Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 9:34 PM, Reese Faucette wrote:

>> Right, that's the maximum number of open MX channels, i.e. the number of
>> processes that can run on the node using MX. With MX (1.2.0c, I think), I
>> get weird messages if I run a second mpirun quickly after the first one
>> failed. The Myrinet guys, I am quite sure, can explain why and how.
>> Somehow, when an application segfaults while the MX port is open, things
>> are not cleaned up right away. It takes a few seconds (not more than one
>> minute) before everything runs correctly again.
>
> Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could
> take a few seconds after an ungraceful exit. But if you're getting some
> behavior that looks like you ought not be getting, please let us know!

I think what I am seeing makes sense. If I loop in a script starting mpiruns
and one of the runs segfaults, the next one is usually unable to open the MX
endpoints. That happens only if I run 4 processes per node, where 4 is the
number of instances reported by mx_info. If I put a sleep of 30 seconds
between my runs, then everything runs just fine.

  george.

> -reese
> Myricom, Inc.
Re: [OMPI users] Ompi failing on mx only
> Right, that's the maximum number of open MX channels, i.e. the number of
> processes that can run on the node using MX. With MX (1.2.0c, I think), I
> get weird messages if I run a second mpirun quickly after the first one
> failed. The Myrinet guys, I am quite sure, can explain why and how.
> Somehow, when an application segfaults while the MX port is open, things
> are not cleaned up right away. It takes a few seconds (not more than one
> minute) before everything runs correctly again.

Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could
take a few seconds after an ungraceful exit. But if you're getting some
behavior that looks like you ought not be getting, please let us know!

-reese
Myricom, Inc.
Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 9:11 PM, Reese Faucette wrote:

>> Second thing. From one of your previous emails, I see that MX is
>> configured with 4 instances per node. You're running with exactly 4
>> processes on the first 2 nodes. Weird things might happen ...
>
> 4 processes per node will be just fine. This is not like GM, where the 4
> includes some "reserved" ports.

Right, that's the maximum number of open MX channels, i.e. the number of
processes that can run on the node using MX. With MX (1.2.0c, I think), I
get weird messages if I run a second mpirun quickly after the first one
failed. The Myrinet guys, I am quite sure, can explain why and how. Somehow,
when an application segfaults while the MX port is open, things are not
cleaned up right away. It takes a few seconds (not more than one minute)
before everything runs correctly again.

  george.
Re: [OMPI users] Ompi failing on mx only
> Second thing. From one of your previous emails, I see that MX is
> configured with 4 instances per node. You're running with exactly 4
> processes on the first 2 nodes. Weird things might happen ...

4 processes per node will be just fine. This is not like GM, where the 4
includes some "reserved" ports.

-reese
Re: [OMPI users] Ompi failing on mx only
Not really. This is the backtrace of the process that gets killed because
mpirun detects that the other one died. What I need is the backtrace of the
process that generates the segfault. Second, in order to understand the
backtrace, it's better to have run a debug version of Open MPI. Without the
debug version we only see the address where the fault occurs, without access
to the line numbers.

  Thanks,
    george.

On Mon, 8 Jan 2007, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

>> PS: Is there any way you can attach to the processes with gdb? I would
>> like to see the backtrace as shown by gdb in order to be able to figure
>> out what's wrong there.
>
> I found out that all processes on the 2nd node crash, so I just put a 30
> second wait before MPI_Init in order to attach gdb and go from there.
>
> [...]
>
> Program received signal SIGTERM, Terminated.
> 0x2ab906643f47 in ioctl () from /lib/libc.so.6
> (gdb) backtrace
> #0 0x2ab906643f47 in ioctl () from /lib/libc.so.6
> Cannot access memory at address 0x7fffa50102f8
>
> Does this help in any way?

"We must accept finite disappointment, but we must never lose infinite
hope."  Martin Luther King
Re: [OMPI users] Ompi failing on mx only
> PS: Is there any way you can attach to the processes with gdb? I would
> like to see the backtrace as shown by gdb in order to be able to figure
> out what's wrong there.

I found out that all processes on the 2nd node crash, so I just put a 30
second wait before MPI_Init in order to attach gdb and go from there. The
code in cpi starts off as follows (in order to show where the SIGTERM below
is coming from).

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);

---
Attaching to process 11856
Reading symbols from /home/ggrobe/Projects/ompi/cpi/cpi...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46974166086512 (LWP 11856)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x2ab90661e880 in nanosleep () from /lib/libc.so.6
(gdb) break MPI_Init
Breakpoint 1 at 0x2ab905c0c880
(gdb) break MPI_Comm_size
Breakpoint 2 at 0x2ab905c01af0
(gdb) continue
Continuing.
[Switching to Thread 46974166086512 (LWP 11856)]

Breakpoint 1, 0x2ab905c0c880 in PMPI_Init () from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
(gdb) n
Single stepping until exit from function PMPI_Init, which has no line number information.
[New Thread 1082132816 (LWP 11862)]

Program received signal SIGTERM, Terminated.
0x2ab906643f47 in ioctl () from /lib/libc.so.6
(gdb) backtrace
#0 0x2ab906643f47 in ioctl () from /lib/libc.so.6
Cannot access memory at address 0x7fffa50102f8
---

Does this help in any way?
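For reference, a minimal sketch of the kind of delayed startup described
above (this is not the poster's actual cpi source; the variable names follow
the standard cpi example and the 30-second delay is arbitrary):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int numprocs, myid, namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        /* Pause before MPI_Init so gdb can be attached to the remote
         * process first (sketch only; the delay length is arbitrary). */
        printf("pid %d waiting for debugger\n", (int)getpid());
        fflush(stdout);
        sleep(30);

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Get_processor_name(processor_name, &namelen);
        /* ... rest of the cpi computation ... */
        MPI_Finalize();
        return 0;
    }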
Re: [OMPI users] external32 i/o not implemented?
Rainer,

Thank you for taking the time to reply to my query. Do I understand
correctly that the external32 data representation for I/O is not
implemented? I am puzzled, since the MPI-2 standard clearly indicates the
existence of external32 and has lots of words regarding how nice this
feature is for file interoperability. So do both Open MPI and MPICH2 not
adhere to the standard in this regard?

If this is really the case, how difficult is it to define a custom data
representation that is 32-bit big endian on all platforms? Do you know of
any documentation that explains how to do this?

Thanks again.

---Tom

Rainer Keller wrote:
> Hello Tom,
> like MPICH2, Open MPI also uses ROMIO as the underlying MPI-IO
> implementation (as an MCA component). ROMIO implements the native datarep.
> With best regards,
> Rainer
>
> On Friday 05 January 2007 20:38, l...@cora.nwra.com wrote:
>> Hi,
>> I am attempting to use the 'external32' data representation in order to
>> read and write portable data files. I believe I understand how to do
>> this, but I receive the following run-time error from the
>> mpi_file_set_view call:
>>
>>   MPI_FILE_SET_VIEW (line 118): **unsupporteddatarep
>>
>> If I replace 'external32' with 'native' in the mpi_file_set_view call,
>> then everything works, but the data file is written in little endian
>> order on my Opteron cluster. Just for grins I also tried 'internal', but
>> this produces the unsupporteddatarep error as well.
>>
>> Is the 'external32' data representation implemented? Do I need to do
>> something else to access it? I looked in the FAQs as well as the mailing
>> list archives, but I cannot seem to find any threads discussing this
>> issue. I would greatly appreciate any advice.
>>
>> I have attached my sample Fortran codes (explicit_write.f,
>> explicit_read.f, Makefile) as well as the config.log, output of
>> ompi_info, and my environment variable settings. I am running Fedora
>> Core 4 with the 2.6.17-1.2142_FC4smp kernel.
>>
>> Thanks,
>> ---Tom

--
===================================================================
Thomas S. Lund
Sr. Research Scientist
Colorado Research Associates, a division of NorthWest Research Associates
3380 Mitchell Ln.
Boulder, CO 80301
(303) 415-9701 x209 (voice)
(303) 415-9702 (fax)
l...@cora.nwra.com
===================================================================
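For reference, a minimal C sketch of the kind of view setup being discussed
(this is not the attached Fortran test case; the file name and datatypes are
made up for illustration):

    #include <mpi.h>

    /* Request the portable external32 representation for a file view.
     * With a ROMIO-based MPI-IO layer this set_view call is where the
     * "unsupported datarep" error described above is reported; replacing
     * "external32" with "native" makes the call succeed. */
    void set_portable_view(MPI_Comm comm)
    {
        MPI_File fh;
        MPI_File_open(comm, "data.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE,
                          "external32", MPI_INFO_NULL);
        MPI_File_close(&fh);
    }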
Re: [OMPI users] Ompi failing on mx only
On Mon, Jan 08, 2007 at 03:07:57PM -0500, Jeff Squyres wrote:
> if you're running in an ssh environment, you generally have 2 choices to
> attach serial debuggers:
>
> 1. Put a loop in your app that pauses until you can attach a debugger.
>    Perhaps something like this:
>
>    { int i = 0; printf("pid %d ready\n", getpid()); while (0 == i) sleep(5); }
>
>    Kludgey and horrible, but it works.
>
> 2. mpirun an xterm with gdb.

If one of the participating hosts is the localhost and it's sufficient to
debug only one process, it's even possible to call gdb directly:

adi@ipc654~$ mpirun -np 2 -host ipc654,dana \
    sh -c 'if [[ $(hostname) == "ipc654" ]]; then gdb test/vm/ring; \
           else test/vm/ring; fi'

(also works great with ddd).

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de
Re: [OMPI users] Ompi failing on mx only
>>> PS: Is there any way you can attach to the processes with gdb? I would
>>> like to see the backtrace as shown by gdb in order to be able to figure
>>> out what's wrong there.
>>
>> When I can get more detailed dbg, I'll send. Though I'm not clear on what
>> executable is being searched for below.
>>
>> $ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
>> LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
>> --mca mtl mx ./cpi
>
> FWIW, note that "-dbg" is not a recognized Open MPI mpirun command line
> switch -- after all the debugging information, Open MPI finally gets to
> telling you:

Sorry, wrong MPI, ok ...

FWIW, here's a working crash with just the -d option. The problem I'm trying
to get to right now is how to debug the 2nd process on the 2nd node, since
that's where the crash is always happening. One process past the 1st node
works fine (5 procs w/ 4 per node), but when a second process on the 2nd
node starts, or anything more than that, the crashes will occur.

$ mpirun -d --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 6 --mca pml cm
--mca mtl mx ./cpi > dbg.out 2>&1

[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] [0,0,0] setting up session dir with
[juggernaut:15087]  universe default-universe-15087
[juggernaut:15087]  user ggrobe
[juggernaut:15087]  host juggernaut
[juggernaut:15087]  jobid 0
[juggernaut:15087]  procid 0
[juggernaut:15087] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/0/0
[juggernaut:15087] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/0
[juggernaut:15087] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087
[juggernaut:15087] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:15087] tmp: /tmp
[juggernaut:15087] [0,0,0] contact_file /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/universe-setup.txt
[juggernaut:15087] [0,0,0] wrote setup file
[juggernaut:15087] pls:rsh: local csh: 0, local sh: 1
[juggernaut:15087] pls:rsh: assuming same remote shell as local shell
[juggernaut:15087] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:15087] pls:rsh: final template argv:
[juggernaut:15087] pls:rsh:     /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 3 --vpid_start 0 --nodename --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[juggernaut:15087] pls:rsh: launching on node node-1
[juggernaut:15087] pls:rsh: node-1 is a REMOTE node
[juggernaut:15087] pls:rsh: executing: /usr/bin/ssh node-1 PATH=/usr/local/openmpi-1.2b3r13030/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi-1.2b3r13030/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi-1.2b3r13030/bin/orted --debug --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename node-1 --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[juggernaut:15087] pls:rsh: launching on node node-2
[juggernaut:15087] pls:rsh: node-2 is a REMOTE node
[juggernaut:15087] pls:rsh: executing: /usr/bin/ssh node-2 PATH=/usr/local/openmpi-1.2b3r13030/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi-1.2b3r13030/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi-1.2b3r13030/bin/orted --debug --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node-2 --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[node-2:11499] [0,0,2] setting up session dir with
[node-2:11499]  universe default-universe-15087
[node-2:11499]  user ggrobe
[node-2:11499]  host node-2
[node-2:11499]  jobid 0
[node-2:11499]  procid 2
[node-1:10307] procdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087/0/1
[node-1:10307] jobdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087/0
[node-1:10307] unidir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087
[node-1:10307] top: openmpi-sessions-ggrobe@node-1_0
[node-2:11499] procdir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087/0/2
[node-2:11499] jobdir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087/0
[node-2:11499] unidir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087
[node-2:11499] top: openmpi-sessions-ggrobe@node-2_0
[node-2:11499] tmp:
Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

> I was wondering if someone could send me the HACKING file so I can do a
> bit more with debugging on the snapshots. Our web proxy has webdav methods
> turned off (request methods fail) so that I can't get to the latest of the
> svn repos.

Bummer. :-( You are definitely falling victim to the fact that our nightly
snapshots have been less-than-stable recently. Sorry [again] about that!

FWIW, there are two ways to browse the source in the repository without an
SVN checkout:

- you can just point a normal web browser at our SVN repository (I'm pretty
  sure that doesn't use DAV, but I'm not 100% sure...), e.g.:
  https://svn.open-mpi.org/svn/ompi/trunk/HACKING

- you can use our Trac SVN browser, e.g.:
  https://svn.open-mpi.org/trac/ompi/browser/trunk/HACKING
  (there's a link at the bottom to download each file without all the HTML
  markup).

>> Second thing. From one of your previous emails, I see that MX is
>> configured with 4 instances per node. You're running with exactly 4
>> processes on the first 2 nodes. Weird things might happen ...
>
> Just curious about this comment. Are you referring to oversubscribing? We
> run 4 processes on each node because we have 2 dual-core CPUs on each
> node. Am I not understanding processor counts correctly?

I'll have to defer to Reese on this one...

>> PS: Is there any way you can attach to the processes with gdb? I would
>> like to see the backtrace as shown by gdb in order to be able to figure
>> out what's wrong there.
>
> When I can get more detailed dbg, I'll send. Though I'm not clear on what
> executable is being searched for below.
>
> $ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
> LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
> --mca mtl mx ./cpi

FWIW, note that "-dbg" is not a recognized Open MPI mpirun command line
switch -- after all the debugging information, Open MPI finally gets to
telling you:

--------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
--------------------------------------------------------------------------

So nothing actually ran in this instance. Our debugging entries on the FAQ
(http://www.open-mpi.org/faq/?category=debugging) are fairly inadequate at
the moment, but if you're running in an ssh environment, you generally have
2 choices to attach serial debuggers:

1. Put a loop in your app that pauses until you can attach a debugger.
   Perhaps something like this:

   { int i = 0; printf("pid %d ready\n", getpid()); while (0 == i) sleep(5); }

   Kludgey and horrible, but it works.

2. mpirun an xterm with gdb. You'll need to specifically use the -d option
   to mpirun in order to keep the ssh sessions alive to relay back your X
   information, or separately set up your X channels yourself (e.g., if
   you're on a closed network, it may be acceptable to "xhost +" the nodes
   that you're running on and just manually set up the DISPLAY variable for
   the target nodes, perhaps via the -x option to mpirun) -- in which case
   you would not need to use the -d option to mpirun.

Make sense?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
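For illustration only (this exact command is not in the thread; the host
file, process count, and executable are reused from earlier messages),
option 2 might look something like:

    $ mpirun -d -np 4 --hostfile ./h1-3 xterm -e gdb ./cpi

which starts one xterm per rank, each running gdb on the local copy of
./cpi.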
Re: [OMPI users] Ompi failing on mx only
I was wondering if someone could send me the HACKING file so I can do a bit
more with debugging on the snapshots. Our web proxy has webdav methods
turned off (request methods fail) so that I can't get to the latest of the
svn repos.

> Second thing. From one of your previous emails, I see that MX is
> configured with 4 instances per node. You're running with exactly 4
> processes on the first 2 nodes. Weird things might happen ...

Just curious about this comment. Are you referring to oversubscribing? We
run 4 processes on each node because we have 2 dual-core CPUs on each node.
Am I not understanding processor counts correctly?

> PS: Is there any way you can attach to the processes with gdb? I would
> like to see the backtrace as shown by gdb in order to be able to figure
> out what's wrong there.

When I can get more detailed dbg, I'll send. Though I'm not clear on what
executable is being searched for below.

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
--mca mtl mx ./cpi

[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] [0,0,0] setting up session dir with
[juggernaut:14949]  universe default-universe-14949
[juggernaut:14949]  user ggrobe
[juggernaut:14949]  host juggernaut
[juggernaut:14949]  jobid 0
[juggernaut:14949]  procid 0
[juggernaut:14949] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/0
[juggernaut:14949] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14949] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14949] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14949] tmp: /tmp
[juggernaut:14949] [0,0,0] contact_file /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/universe-setup.txt
[juggernaut:14949] [0,0,0] wrote setup file
[juggernaut:14949] pls:rsh: local csh: 0, local sh: 1
[juggernaut:14949] pls:rsh: assuming same remote shell as local shell
[juggernaut:14949] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:14949] pls:rsh: final template argv:
[juggernaut:14949] pls:rsh:     /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 2 --vpid_start 0 --nodename --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14949] pls:rsh: launching on node juggernaut
[juggernaut:14949] pls:rsh: juggernaut is a LOCAL node
[juggernaut:14949] pls:rsh: changing to directory /home/ggrobe
[juggernaut:14949] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename juggernaut --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14950] [0,0,1] setting up session dir with
[juggernaut:14950]  universe default-universe-14949
[juggernaut:14950]  user ggrobe
[juggernaut:14950]  host juggernaut
[juggernaut:14950]  jobid 0
[juggernaut:14950]  procid 1
[juggernaut:14950] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/1
[juggernaut:14950] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14950] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14950] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14950] tmp: /tmp
--------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
--------------------------------------------------------------------------
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file odls_default_module.c at line 1193
[juggernaut:14949] spawn: in job_state_callback(jobid = 1, state = 0x80)
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file orted.c at line 575
[juggernaut:14950] sess_dir_finalize: job session dir not empty - leaving
[juggernaut:14950] sess_dir_finalize: proc session dir not empty - leaving
[juggernaut:14949] sess_dir_finalize: proc session dir not empty - leaving