Re: [OMPI users] few Problems
On Apr 23, 2009, at 3:59 PM, Luis Vitorio Cargnini wrote:

> I'm using NFS, so my home dir is the same on all nodes. The problem is that when the key is generated, it is generated for a specific machine (the end of the key is user@host), and the system consults id_dsa on each machine.

That's OK. I have a similar setup: svbu-mpi is my cluster "head node" and that's where I generated my DSA key. So my id_dsa.pub file looks like this:

[13:05] svbu-mpi:~/hg % cat ~/.ssh/id_dsa.pub
ssh-dss B3NzaC1kc3MAAACBAPhFvzoDPw1da2aYf2PCW9sQfOT4SYmvI5EYfJvJXyyVLs7C+ETY5Zma7js2PCfk4kgHUVJQgglP5V/Dp9uBjgP/zpNdOWbP+chULEXaz0HKOV3NZM5BH6oBRTSGTZh4DhqnQjotQsp6gi9LZ+GGl00tzc+EzlfqIfSuKHQjSTADFQCM1AbE8Z7+mcCzFpNUAa7eLBFOhQAAAIEAjMEiDNceRdvMjf+Of1nwaMb8ndx/w4ltEH67P0g2xn8PfJP56rYn7ffiEuB5Ndu+iLskII5CkDwLZOmv4nP32gNzxxyo23Qbnd88a+BYe+j9yu35czqvPzxHBKlP5t0zaeZQt/fXr/VKd1P9OhZKMVmGZm1m2Yn5M21d16V1j4QAAACBALe2hbtgzqSMSVyX7ED31MfJsYxW/y01VH9f7Ot+WfJrpTsTRTWMYb6x1jTAozC/DvZlx/KPKiekQH+ApkfL1e6TSlug1Y5Kv9zCvXwEAbgwHEwUoWvTT+IpBwD318AjraZtJXlIb03tkX7l2gZNncwOmzFbwqGwypD3YtHAY3j1 jsquyres@svbu-mpi
[13:05] svbu-mpi:~/hg %

And that same $HOME/.ssh/id_dsa.pub (and corresponding $HOME/.ssh/id_dsa) file is available on all my nodes via NFS. The email address at the end is not really part of the key; it's just there as a human-readable reminder of where it came from. It doesn't affect the authentication at all.

> So to fix the problem, since my applications are launched from node srv0, I just created the keys on node 0, and it started to work for connecting to the other nodes. The problem is the reverse path: I can't access srv0 from srv1, for example.

Why not?
If you copy your id_dsa.pub file to authorized_keys, it should Just Work, assuming the permissions are all set correctly:

- $HOME/.ssh owned by you, 0700
- $HOME/.ssh/authorized_keys owned by you, 0600
- $HOME/.ssh/id_dsa.pub owned by you, 0644
- $HOME/.ssh/id_dsa owned by you, 0600

The SSH setup HOWTOs and recipes sent in this thread (I assume) must talk about such things..?

> The point is that it's working from node0, the connections through ssh. Now the execution starts but does not stop; it keeps running ad infinitum. Any ideas?
>
> mpirun -d -v -hostfile chosts -np 35 ~/mpi/hello
> [cluster-srv0:29466] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/0/0
> [snipped]

Are you able to run non-MPI apps through mpirun? For example:

mpirun -d -v -hostfile chosts hostname | sort

If that works, then did you compile "hello" correctly (e.g., with mpicc)? I assume this is a simple "hello world" kind of MPI program: calls MPI_INIT, maybe MPI_COMM_RANK and MPI_COMM_SIZE, and MPI_FINALIZE? Do you have TCP firewalling disabled on all of your cluster nodes?

-- 
Jeff Squyres
Cisco Systems
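[Editor's note] The permission checklist above can be sketched as the commands below. This is a hedged dry-run version: `SSH_DIR` is a stand-in variable (not from the thread) that defaults to a scratch directory, so it can be tried safely; on the real head node it would be `$HOME/.ssh`.

```shell
# SSH_DIR defaults to a scratch directory for a safe dry run;
# point it at "$HOME/.ssh" on the real cluster head node.
SSH_DIR="${SSH_DIR:-$(mktemp -d)/.ssh}"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys" "$SSH_DIR/id_dsa" "$SSH_DIR/id_dsa.pub"

# sshd quietly falls back to password prompts when these modes are wrong:
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys" "$SSH_DIR/id_dsa"
chmod 644 "$SSH_DIR/id_dsa.pub"

ls -l "$SSH_DIR"
```

Once the modes are right, `ssh cluster-srvN uptime` from the head node must return without a password prompt, and `mpirun -d -v -hostfile chosts hostname | sort` exercises the launcher without involving MPI at all.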
Re: [OMPI users] few Problems
Hi all,

I tried it as described in the documentation (I had done this before), so the problem now is this: I'm using NFS, and my home dir is the same on all nodes. The problem is that when the key is generated, it is generated for a specific machine (the end of the key is user@host), and the system consults id_dsa on each machine. So, to fix the problem, since my applications are launched from node srv0, I just created the keys on node 0, and with that it started to work for connecting to the other nodes. The problem is the reverse path: I can't access srv0 from srv1, for example. But this is not really a problem now. The point is that it's working from node0, the connections through ssh. Now the execution starts but does not stop; it keeps running ad infinitum. Any ideas?

mpirun -d -v -hostfile chosts -np 35 ~/mpi/hello
[cluster-srv0:29466] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/0/0
[cluster-srv0:29466] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/0
[cluster-srv0:29466] top: openmpi-sessions-lvcargnini@cluster-srv0_0
[cluster-srv0:29466] tmp: /tmp
[cluster-srv0:29466] mpirun: reset PATH: /export/cluster/appl/x86_64/llvm/bin:/bin:/sbin:/export/cluster/appl/x86_64/llvm/bin:/usr/local/llvm/bin:/usr/local/bin:/usr/bin:/usr/sbin:/home/GTI420/lvcargnini/oe/bitbake/bin
[cluster-srv0:29466] mpirun: reset LD_LIBRARY_PATH: /export/cluster/appl/x86_64/llvm/lib:/lib64:/lib:/export/cluster/appl/x86_64/llvm/lib:/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib
[cluster-srv1:13531] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv1_0/44411/0/1
[cluster-srv3:20272] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv3_0/44411/0/3
[cluster-srv3:20272] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv3_0/44411/0
[cluster-srv3:20272] top: openmpi-sessions-lvcargnini@cluster-srv3_0
[cluster-srv3:20272] tmp: /tmp
[cluster-srv2:23273] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv2_0/44411/0/2
[cluster-srv4:09057] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv4_0/44411/0/4
[cluster-srv0:29466] [[44411,0],0] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv0:29466] [[44411,0],0] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv0:29466] [[44411,0],0] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv0:29466] [[44411,0],0] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv0:29466] [[44411,0],0] node[4].name cluster-srv4 daemon 4 arch ffc91200
[cluster-srv1:13531] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv1_0/44411/0
[cluster-srv1:13531] top: openmpi-sessions-lvcargnini@cluster-srv1_0
[cluster-srv1:13531] tmp: /tmp
[cluster-srv1:13531] [[44411,0],1] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv1:13531] [[44411,0],1] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv1:13531] [[44411,0],1] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv1:13531] [[44411,0],1] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv1:13531] [[44411,0],1] node[4].name cluster-srv4 daemon 4 arch ffc91200
[cluster-srv2:23273] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv2_0/44411/0
[cluster-srv2:23273] top: openmpi-sessions-lvcargnini@cluster-srv2_0
[cluster-srv2:23273] tmp: /tmp
[cluster-srv2:23273] [[44411,0],2] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv2:23273] [[44411,0],2] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv2:23273] [[44411,0],2] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv2:23273] [[44411,0],2] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv2:23273] [[44411,0],2] node[4].name cluster-srv4 daemon 4 arch ffc91200
[cluster-srv4:09057] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv4_0/44411/0
[cluster-srv4:09057] top: openmpi-sessions-lvcargnini@cluster-srv4_0
[cluster-srv4:09057] tmp: /tmp
[cluster-srv4:09057] [[44411,0],4] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv4:09057] [[44411,0],4] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv4:09057] [[44411,0],4] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv4:09057] [[44411,0],4] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv4:09057] [[44411,0],4] node[4].name cluster-srv4 daemon 4 arch ffc91200
[cluster-srv0:29472] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/1/0
[cluster-srv0:29472] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/1
[cluster-srv0:29472] top: openmpi-sessions-lvcargnini@cluster-srv0_0
[cluster-srv0:29472] tmp: /tmp
[cluster-srv0:29474] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/1/2
[cluster-srv0:29474] jobdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/1
[cluster-srv0:29474] top: openmpi-sessions-lvcargnini@cluster-srv0_0
[cluster-srv0:29474] tmp: /tmp
[cluster-srv0:29475] procdir: /tmp/openmpi-sessions-lvcargnini@cluster-srv0_0/44411/1/3
[c
Re: [OMPI users] few Problems
Thank you all. I'll try to fix this ASAP; after that I'll run a new round of tests and report back. Thanks to everyone so far.

On 09-04-22 at 17:06, Gus Correa wrote:

> Hi Luis, list
>
> To complement Jeff's recommendation, see if this recipe to set up passwordless ssh connections helps. If you use RSA keys instead of DSA, replace all "dsa" by "rsa":
>
> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>
> I hope this helps.
> Gus Correa
>
> Jeff Squyres wrote:
>> It looks like you need to fix your password-less ssh problems first:
>> [remainder of quoted text snipped]
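[Editor's note] The HOWTO recipe Gus points to boils down to something like the sketch below. Hedged assumptions: `KEY_HOME` is a stand-in for the NFS-shared home directory (not a name from the thread), and newer OpenSSH releases disable DSA key generation, in which case you substitute `rsa` exactly as Gus says.

```shell
# KEY_HOME stands in for the NFS-shared home directory. Because every
# node mounts the same home, one keypair plus one authorized_keys entry
# covers every direction (srv1 -> srv0 included).
KEY_HOME="${KEY_HOME:-$HOME}"
mkdir -p "$KEY_HOME/.ssh"
chmod 700 "$KEY_HOME/.ssh"

# Generate one DSA keypair with an empty passphrase (no prompt at use).
# On OpenSSH builds where DSA is disabled, use -t rsa instead.
if [ ! -f "$KEY_HOME/.ssh/id_dsa" ]; then
    ssh-keygen -t dsa -N '' -f "$KEY_HOME/.ssh/id_dsa" ||
        echo "DSA refused by this ssh-keygen; rerun with -t rsa"
fi

# Authorize the key for login on every node (same file everywhere).
if [ -f "$KEY_HOME/.ssh/id_dsa.pub" ]; then
    cat "$KEY_HOME/.ssh/id_dsa.pub" >> "$KEY_HOME/.ssh/authorized_keys"
    chmod 600 "$KEY_HOME/.ssh/authorized_keys"
fi
```

Note this is a single shared key on the shared home, not one key per node: generating a separate key on each node (as described earlier in the thread) is unnecessary when $HOME is the same filesystem everywhere.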
Re: [OMPI users] few Problems
Hi Luis, list

To complement Jeff's recommendation, see if this recipe to set up passwordless ssh connections helps. If you use RSA keys instead of DSA, replace all "dsa" by "rsa":

http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3

I hope this helps.
Gus Correa
- Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Jeff Squyres wrote:
> It looks like you need to fix your password-less ssh problems first:
>
>> Permission denied, please try again.
>> AH72000@cluster-srv2's password:
>
> As I mentioned earlier, you need to be able to run "ssh cluster-srv2 uptime" without being prompted for a password before Open MPI will work properly.
> [remainder of quoted text snipped]
Re: [OMPI users] few Problems
It looks like you need to fix your password-less ssh problems first:

> Permission denied, please try again.
> AH72000@cluster-srv2's password:

As I mentioned earlier, you need to be able to run

ssh cluster-srv2 uptime

without being prompted for a password before Open MPI will work properly. If you're still having problems after fixing this, please send all the information from the "help" URL I sent earlier. Thanks!

On Apr 22, 2009, at 3:24 PM, Luis Vitorio Cargnini wrote:

> OK, this is the debug information, running on 5 nodes (trying to, at least); the process is locked up to now (it doesn't finish), and one node still asked me for the password.
> [remainder of quoted text snipped]
Re: [OMPI users] few Problems
OK, this is the debug information, running on 5 nodes (trying to, at least); the process is locked up to now (it doesn't finish), and one node still asked me for the password. Each node is composed of two quad-core microprocessors. I have the home partition mounted (the same one) on all nodes, so logging in to cluster-srv[0-4] is the same thing. I generated the DSA key on each node, in different files, and added them all to authorized_keys, so it should be working. That's it; all help is welcome.

This is the code being executed:

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int rank, size;
  MPI_Init (&argc, &argv);               /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}

-- debug:
-bash-3.2$ mpirun -v -d -hostfile chosts -np 32 /export/cluster/home/AH72000/mpi/hello
[cluster-srv0:21606] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/0/0
[cluster-srv0:21606] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/0
[cluster-srv0:21606] top: openmpi-sessions-AH72000@cluster-srv0_0
[cluster-srv0:21606] tmp: /tmp
[cluster-srv0:21606] mpirun: reset PATH: /export/cluster/appl/x86_64/llvm/bin:/bin:/sbin:/export/cluster/appl/x86_64/llvm/bin:/usr/local/llvm/bin:/usr/local/bin:/usr/bin:/usr/sbin:/home/GTI420/AH72000/oe/bitbake/bin
[cluster-srv0:21606] mpirun: reset LD_LIBRARY_PATH: /export/cluster/appl/x86_64/llvm/lib:/lib64:/lib:/export/cluster/appl/x86_64/llvm/lib:/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib
AH72000@cluster-srv1's password: AH72000@cluster-srv2's password: AH72000@cluster-srv3's password:
[cluster-srv1:07406] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv1_0/35335/0/1
[cluster-srv1:07406] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv1_0/35335/0
[cluster-srv1:07406] top: openmpi-sessions-AH72000@cluster-srv1_0
[cluster-srv1:07406] tmp: /tmp
Permission denied, please try again.
AH72000@cluster-srv2's password:
[cluster-srv3:14230] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv3_0/35335/0/3
[cluster-srv3:14230] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv3_0/35335/0
[cluster-srv3:14230] top: openmpi-sessions-AH72000@cluster-srv3_0
[cluster-srv3:14230] tmp: /tmp
Permission denied, please try again.
AH72000@cluster-srv2's password:
[cluster-srv2:17092] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv2_0/35335/0/2
[cluster-srv2:17092] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv2_0/35335/0
[cluster-srv2:17092] top: openmpi-sessions-AH72000@cluster-srv2_0
[cluster-srv2:17092] tmp: /tmp
[cluster-srv0:21606] [[35335,0],0] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv0:21606] [[35335,0],0] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv0:21606] [[35335,0],0] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv0:21606] [[35335,0],0] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv0:21606] [[35335,0],0] node[4].name cluster-srv4 daemon INVALID arch ffc91200
[cluster-srv1:07406] [[35335,0],1] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv1:07406] [[35335,0],1] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv1:07406] [[35335,0],1] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv1:07406] [[35335,0],1] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv1:07406] [[35335,0],1] node[4].name cluster-srv4 daemon INVALID arch ffc91200
[cluster-srv2:17092] [[35335,0],2] node[0].name cluster-srv0 daemon 0 arch ffc91200
[cluster-srv2:17092] [[35335,0],2] node[1].name cluster-srv1 daemon 1 arch ffc91200
[cluster-srv2:17092] [[35335,0],2] node[2].name cluster-srv2 daemon 2 arch ffc91200
[cluster-srv2:17092] [[35335,0],2] node[3].name cluster-srv3 daemon 3 arch ffc91200
[cluster-srv2:17092] [[35335,0],2] node[4].name cluster-srv4 daemon INVALID arch ffc91200
[cluster-srv0:21611] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1/0
[cluster-srv0:21611] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1
[cluster-srv0:21611] top: openmpi-sessions-AH72000@cluster-srv0_0
[cluster-srv0:21611] tmp: /tmp
[cluster-srv0:21613] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1/2
[cluster-srv0:21613] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1
[cluster-srv0:21613] top: openmpi-sessions-AH72000@cluster-srv0_0
[cluster-srv0:21613] tmp: /tmp
[cluster-srv0:21612] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1/1
[cluster-srv0:21612] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1
[cluster-srv0:21612] top: openmpi-sessions-AH72000@cluster-srv0_0
[cluster-srv0:21612] tmp: /tmp
[cluster-srv0:21614] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0_0/35335/1/3
[cluster-srv0:21614] jobdir: /tmp/openmpi-sessions-AH72000@c
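[Editor's note] Building and launching the hello-world program above looks like the sketch below. Hedged assumptions: Open MPI's `mpicc`/`mpirun` are on `$PATH`, the source is saved as `hello.c`, and `chosts` is the hostfile used throughout this thread.

```shell
# The wrapper compiler adds the MPI include and library flags itself;
# compiling with plain gcc is a common cause of broken "hello" binaries.
if command -v mpicc >/dev/null && [ -f hello.c ]; then
    mpicc -o hello hello.c
    mpirun -np 4 ./hello                          # smoke-test on one node first
    mpirun -d -v -hostfile chosts -np 32 ./hello  # then across the cluster
else
    echo "mpicc not found: install Open MPI or fix PATH"
fi
```

Running the single-node `mpirun -np 4 ./hello` first separates MPI/compile problems from the ssh and hostfile problems discussed in this thread.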
Re: [OMPI users] few Problems
This isn't really enough information for us to help you. Can you send all the information listed here:

http://www.open-mpi.org/community/help/

Thanks.

On Apr 21, 2009, at 10:34 AM, Luis Vitorio Cargnini wrote:

> Hi, can someone please tell me what this problem could be?
>
> daemon INVALID arch ffc91200
>
> The debug output:
> [[41704,1],14] node[4].name cluster-srv4 daemon INVALID arch ffc91200
> [cluster-srv3:09684] [[41704,1],13] node[0].name cluster-srv0 daemon 0 arch ffc91200
> [cluster-srv3:09684] [[41704,1],13] node[1].name cluster-srv1 daemon 1 arch ffc91200
> [cluster-srv3:09684] [[41704,1],13] node[2].name cluster-srv2 daemon 2 arch ffc91200
> [cluster-srv3:09684] [[41704,1],13] node[3].name cluster-srv3 daemon 3 arch ffc91200
> [cluster-srv3:09684] [[41704,1],13] node[4].name cluster-srv4 daemon INVALID arch ffc91200
> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems