Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 9:34 PM, Reese Faucette wrote:

>> Right, that's the maximum number of open MX channels, i.e. the number of
>> processes that can run on the node using MX. With MX (1.2.0c, I think), I
>> get weird messages if I run a second mpirun quickly after the first one
>> failed. The Myrinet guys, I am quite sure, can explain why and how.
>> Somehow, when an application segfaults while the MX port is open, things
>> are not cleaned up right away. It takes a few seconds (not more than one
>> minute) before everything runs correctly again.
>
> Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could
> take a few seconds after an ungraceful exit. But if you're getting some
> behavior that looks like you ought not be getting, please let us know!

I think what I am seeing makes sense. If I loop in a script starting mpiruns
and one of the runs segfaults, the next one is usually unable to open the MX
endpoints. That happens only if I run 4 processes per node, where 4 is the
number of instances reported by mx_info. If I put a sleep of 30 seconds
between my runs, then everything runs just fine.

  george.

> -reese
> Myricom, Inc.
Re: [OMPI users] Ompi failing on mx only
> Right, that's the maximum number of open MX channels, i.e. the number of
> processes that can run on the node using MX. With MX (1.2.0c, I think), I
> get weird messages if I run a second mpirun quickly after the first one
> failed. The Myrinet guys, I am quite sure, can explain why and how.
> Somehow, when an application segfaults while the MX port is open, things
> are not cleaned up right away. It takes a few seconds (not more than one
> minute) before everything runs correctly again.

Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could
take a few seconds after an ungraceful exit. But if you're getting some
behavior that looks like you ought not be getting, please let us know!

-reese
Myricom, Inc.
Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 9:11 PM, Reese Faucette wrote:

>> Second thing. From one of your previous emails, I see that MX is
>> configured with 4 instances per node. You're running with exactly 4
>> processes on the first 2 nodes. Weird things might happen ...
>
> 4 processes per node will be just fine. This is not like GM, where the 4
> includes some "reserved" ports.

Right, that's the maximum number of open MX channels, i.e. the number of
processes that can run on the node using MX. With MX (1.2.0c, I think), I
get weird messages if I run a second mpirun quickly after the first one
failed. The Myrinet guys, I am quite sure, can explain why and how. Somehow,
when an application segfaults while the MX port is open, things are not
cleaned up right away. It takes a few seconds (not more than one minute)
before everything runs correctly again.

  george.
Re: [OMPI users] Ompi failing on mx only
> Second thing. From one of your previous emails, I see that MX is
> configured with 4 instances per node. You're running with exactly 4
> processes on the first 2 nodes. Weird things might happen ...

4 processes per node will be just fine. This is not like GM, where the 4
includes some "reserved" ports.

-reese
Re: [OMPI users] Ompi failing on mx only
Not really. This is the backtrace of the process that gets killed because
mpirun detects that the other one died. What I need is the backtrace of the
process that generates the segfault. Second, in order to understand the
backtrace, it's better to have run a debug version of Open MPI. Without the
debug version we only see the address where the fault occurs, without access
to the line numbers.

  Thanks,
    george.

On Mon, 8 Jan 2007, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

>> PS: Is there any way you can attach to the processes with gdb? I would
>> like to see the backtrace as shown by gdb in order to be able to figure
>> out what's wrong there.
>
> I found out that all processes on the 2nd node crash, so I just put a 30
> second wait before MPI_Init in order to attach gdb and go from there.
>
> [...]
>
> Program received signal SIGTERM, Terminated.
> 0x2ab906643f47 in ioctl () from /lib/libc.so.6
> (gdb) backtrace
> #0 0x2ab906643f47 in ioctl () from /lib/libc.so.6
> Cannot access memory at address 0x7fffa50102f8
>
> Does this help in any way?

"We must accept finite disappointment, but we must never lose infinite
hope."  Martin Luther King
Re: [OMPI users] Ompi failing on mx only
> PS: Is there any way you can attach to the processes with gdb? I would
> like to see the backtrace as shown by gdb in order to be able to figure
> out what's wrong there.

I found out that all processes on the 2nd node crash, so I just put a 30
second wait before MPI_Init in order to attach gdb and go from there. The
code in cpi starts off as follows (in order to show where the SIGTERM below
is coming from).

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);

---
Attaching to process 11856
Reading symbols from /home/ggrobe/Projects/ompi/cpi/cpi...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46974166086512 (LWP 11856)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x2ab90661e880 in nanosleep () from /lib/libc.so.6
(gdb) break MPI_Init
Breakpoint 1 at 0x2ab905c0c880
(gdb) break MPI_Comm_size
Breakpoint 2 at 0x2ab905c01af0
(gdb) continue
Continuing.
[Switching to Thread 46974166086512 (LWP 11856)]

Breakpoint 1, 0x2ab905c0c880 in PMPI_Init () from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
(gdb) n
Single stepping until exit from function PMPI_Init, which has no line number information.
[New Thread 1082132816 (LWP 11862)]

Program received signal SIGTERM, Terminated.
0x2ab906643f47 in ioctl () from /lib/libc.so.6
(gdb) backtrace
#0 0x2ab906643f47 in ioctl () from /lib/libc.so.6
Cannot access memory at address 0x7fffa50102f8
---

Does this help in any way?
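For reference, a minimal sketch of the kind of delayed startup described
above (this is not the poster's actual cpi source; the variable names follow
the standard cpi example and the 30-second delay is arbitrary):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int numprocs, myid, namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        /* Pause before MPI_Init so gdb can be attached to the remote
         * process first (sketch only; the delay length is arbitrary). */
        printf("pid %d waiting for debugger\n", (int)getpid());
        fflush(stdout);
        sleep(30);

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Get_processor_name(processor_name, &namelen);
        /* ... rest of the cpi computation ... */
        MPI_Finalize();
        return 0;
    }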
Re: [OMPI users] external32 i/o not implemented?
Rainer,

Thank you for taking the time to reply to my query. Do I understand
correctly that the external32 data representation for I/O is not
implemented? I am puzzled, since the MPI-2 standard clearly indicates the
existence of external32 and has lots of words regarding how nice this
feature is for file interoperability. So do both Open MPI and MPICH2 not
adhere to the standard in this regard?

If this is really the case, how difficult is it to define a custom data
representation that is 32-bit big endian on all platforms? Do you know of
any documentation that explains how to do this?

Thanks again.

---Tom

Rainer Keller wrote:
> Hello Tom,
> like MPICH2, Open MPI also uses ROMIO as the underlying MPI-IO
> implementation (as an MCA component). ROMIO implements the native datarep.
> With best regards,
> Rainer
>
> On Friday 05 January 2007 20:38, l...@cora.nwra.com wrote:
>> Hi,
>> I am attempting to use the 'external32' data representation in order to
>> read and write portable data files. I believe I understand how to do
>> this, but I receive the following run-time error from the
>> mpi_file_set_view call:
>>
>>   MPI_FILE_SET_VIEW (line 118): **unsupporteddatarep
>>
>> If I replace 'external32' with 'native' in the mpi_file_set_view call,
>> then everything works, but the data file is written in little endian
>> order on my Opteron cluster. Just for grins I also tried 'internal', but
>> this produces the unsupporteddatarep error as well.
>>
>> Is the 'external32' data representation implemented? Do I need to do
>> something else to access it? I looked in the FAQs as well as the mailing
>> list archives, but I cannot seem to find any threads discussing this
>> issue. I would greatly appreciate any advice.
>>
>> I have attached my sample Fortran codes (explicit_write.f,
>> explicit_read.f, Makefile) as well as the config.log, output of
>> ompi_info, and my environment variable settings. I am running Fedora
>> Core 4 with the 2.6.17-1.2142_FC4smp kernel.
>>
>> Thanks,
>> ---Tom

--
===================================================================
Thomas S. Lund
Sr. Research Scientist
Colorado Research Associates, a division of NorthWest Research Associates
3380 Mitchell Ln.
Boulder, CO 80301
(303) 415-9701 x209 (voice)
(303) 415-9702 (fax)
l...@cora.nwra.com
===================================================================
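For reference, a minimal C sketch of the kind of view setup being discussed
(this is not the attached Fortran test case; the file name and datatypes are
made up for illustration):

    #include <mpi.h>

    /* Request the portable external32 representation for a file view.
     * With a ROMIO-based MPI-IO layer this set_view call is where the
     * "unsupported datarep" error described above is reported; replacing
     * "external32" with "native" makes the call succeed. */
    void set_portable_view(MPI_Comm comm)
    {
        MPI_File fh;
        MPI_File_open(comm, "data.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE,
                          "external32", MPI_INFO_NULL);
        MPI_File_close(&fh);
    }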
Re: [OMPI users] Ompi failing on mx only
On Mon, Jan 08, 2007 at 03:07:57PM -0500, Jeff Squyres wrote:
> if you're running in an ssh environment, you generally have 2 choices to
> attach serial debuggers:
>
> 1. Put a loop in your app that pauses until you can attach a debugger.
>    Perhaps something like this:
>
>    { int i = 0; printf("pid %d ready\n", getpid()); while (0 == i) sleep(5); }
>
>    Kludgey and horrible, but it works.
>
> 2. mpirun an xterm with gdb.

If one of the participating hosts is the localhost and it's sufficient to
debug only one process, it's even possible to call gdb directly:

adi@ipc654~$ mpirun -np 2 -host ipc654,dana \
    sh -c 'if [[ $(hostname) == "ipc654" ]]; then gdb test/vm/ring; \
           else test/vm/ring; fi'

(also works great with ddd).

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de
Re: [OMPI users] Ompi failing on mx only
>>> PS: Is there any way you can attach to the processes with gdb? I would
>>> like to see the backtrace as shown by gdb in order to be able to figure
>>> out what's wrong there.
>>
>> When I can get more detailed dbg, I'll send. Though I'm not clear on what
>> executable is being searched for below.
>>
>> $ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
>> LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
>> --mca mtl mx ./cpi
>
> FWIW, note that "-dbg" is not a recognized Open MPI mpirun command line
> switch -- after all the debugging information, Open MPI finally gets to
> telling you:

Sorry, wrong MPI, ok ...

FWIW, here's a working crash with just the -d option. The problem I'm trying
to get to right now is how to debug the 2nd process on the 2nd node, since
that's where the crash is always happening. One process past the 1st node
works fine (5 procs w/ 4 per node), but when a second process on the 2nd
node starts, or anything more than that, the crashes will occur.

$ mpirun -d --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 6 --mca pml cm
--mca mtl mx ./cpi > dbg.out 2>&1

[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] connect_uni: connection not allowed
[juggernaut:15087] [0,0,0] setting up session dir with
[juggernaut:15087]  universe default-universe-15087
[juggernaut:15087]  user ggrobe
[juggernaut:15087]  host juggernaut
[juggernaut:15087]  jobid 0
[juggernaut:15087]  procid 0
[juggernaut:15087] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/0/0
[juggernaut:15087] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/0
[juggernaut:15087] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087
[juggernaut:15087] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:15087] tmp: /tmp
[juggernaut:15087] [0,0,0] contact_file /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-15087/universe-setup.txt
[juggernaut:15087] [0,0,0] wrote setup file
[juggernaut:15087] pls:rsh: local csh: 0, local sh: 1
[juggernaut:15087] pls:rsh: assuming same remote shell as local shell
[juggernaut:15087] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:15087] pls:rsh: final template argv:
[juggernaut:15087] pls:rsh:     /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 3 --vpid_start 0 --nodename --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[juggernaut:15087] pls:rsh: launching on node node-1
[juggernaut:15087] pls:rsh: node-1 is a REMOTE node
[juggernaut:15087] pls:rsh: executing: /usr/bin/ssh node-1 PATH=/usr/local/openmpi-1.2b3r13030/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi-1.2b3r13030/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi-1.2b3r13030/bin/orted --debug --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename node-1 --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[juggernaut:15087] pls:rsh: launching on node node-2
[juggernaut:15087] pls:rsh: node-2 is a REMOTE node
[juggernaut:15087] pls:rsh: executing: /usr/bin/ssh node-2 PATH=/usr/local/openmpi-1.2b3r13030/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi-1.2b3r13030/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi-1.2b3r13030/bin/orted --debug --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node-2 --universe ggrobe@juggernaut:default-universe-15087 --nsreplica "0.0.0;tcp://192.168.2.10:52099" --gprreplica "0.0.0;tcp://192.168.2.10:52099"
[node-2:11499] [0,0,2] setting up session dir with
[node-2:11499]  universe default-universe-15087
[node-2:11499]  user ggrobe
[node-2:11499]  host node-2
[node-2:11499]  jobid 0
[node-2:11499]  procid 2
[node-1:10307] procdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087/0/1
[node-1:10307] jobdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087/0
[node-1:10307] unidir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-15087
[node-1:10307] top: openmpi-sessions-ggrobe@node-1_0
[node-2:11499] procdir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087/0/2
[node-2:11499] jobdir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087/0
[node-2:11499] unidir: /tmp/openmpi-sessions-ggrobe@node-2_0/default-universe-15087
[node-2:11499] top: openmpi-sessions-ggrobe@node-2_0
[node-2:11499] tmp:
Re: [OMPI users] Ompi failing on mx only
On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

> I was wondering if someone could send me the HACKING file so I can do a
> bit more with debugging on the snapshots. Our web proxy has webdav methods
> turned off (request methods fail) so that I can't get to the latest of the
> svn repos.

Bummer. :-( You are definitely falling victim to the fact that our nightly
snapshots have been less-than-stable recently. Sorry [again] about that!

FWIW, there are two ways to browse the source in the repository without an
SVN checkout:

- you can just point a normal web browser at our SVN repository (I'm pretty
  sure that doesn't use DAV, but I'm not 100% sure...), e.g.:
  https://svn.open-mpi.org/svn/ompi/trunk/HACKING

- you can use our Trac SVN browser, e.g.:
  https://svn.open-mpi.org/trac/ompi/browser/trunk/HACKING
  (there's a link at the bottom to download each file without all the HTML
  markup).

>> Second thing. From one of your previous emails, I see that MX is
>> configured with 4 instances per node. You're running with exactly 4
>> processes on the first 2 nodes. Weird things might happen ...
>
> Just curious about this comment. Are you referring to oversubscribing? We
> run 4 processes on each node because we have 2 dual-core CPUs on each
> node. Am I not understanding processor counts correctly?

I'll have to defer to Reese on this one...

>> PS: Is there any way you can attach to the processes with gdb? I would
>> like to see the backtrace as shown by gdb in order to be able to figure
>> out what's wrong there.
>
> When I can get more detailed dbg, I'll send. Though I'm not clear on what
> executable is being searched for below.
>
> $ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
> LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
> --mca mtl mx ./cpi

FWIW, note that "-dbg" is not a recognized Open MPI mpirun command line
switch -- after all the debugging information, Open MPI finally gets to
telling you:

--------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
--------------------------------------------------------------------------

So nothing actually ran in this instance. Our debugging entries on the FAQ
(http://www.open-mpi.org/faq/?category=debugging) are fairly inadequate at
the moment, but if you're running in an ssh environment, you generally have
2 choices to attach serial debuggers:

1. Put a loop in your app that pauses until you can attach a debugger.
   Perhaps something like this:

   { int i = 0; printf("pid %d ready\n", getpid()); while (0 == i) sleep(5); }

   Kludgey and horrible, but it works.

2. mpirun an xterm with gdb. You'll need to specifically use the -d option
   to mpirun in order to keep the ssh sessions alive to relay back your X
   information, or separately set up your X channels yourself (e.g., if
   you're on a closed network, it may be acceptable to "xhost +" the nodes
   that you're running on and just manually set up the DISPLAY variable for
   the target nodes, perhaps via the -x option to mpirun) -- in which case
   you would not need to use the -d option to mpirun.

Make sense?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
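For illustration only (this exact command is not in the thread; the host
file, process count, and executable are reused from earlier messages),
option 2 might look something like:

    $ mpirun -d -np 4 --hostfile ./h1-3 xterm -e gdb ./cpi

which starts one xterm per rank, each running gdb on the local copy of
./cpi.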
Re: [OMPI users] Ompi failing on mx only
I was wondering if someone could send me the HACKING file so I can do a bit
more with debugging on the snapshots. Our web proxy has webdav methods
turned off (request methods fail) so that I can't get to the latest of the
svn repos.

> Second thing. From one of your previous emails, I see that MX is
> configured with 4 instances per node. You're running with exactly 4
> processes on the first 2 nodes. Weird things might happen ...

Just curious about this comment. Are you referring to oversubscribing? We
run 4 processes on each node because we have 2 dual-core CPUs on each node.
Am I not understanding processor counts correctly?

> PS: Is there any way you can attach to the processes with gdb? I would
> like to see the backtrace as shown by gdb in order to be able to figure
> out what's wrong there.

When I can get more detailed dbg, I'll send. Though I'm not clear on what
executable is being searched for below.

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
--mca mtl mx ./cpi

[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] [0,0,0] setting up session dir with
[juggernaut:14949]  universe default-universe-14949
[juggernaut:14949]  user ggrobe
[juggernaut:14949]  host juggernaut
[juggernaut:14949]  jobid 0
[juggernaut:14949]  procid 0
[juggernaut:14949] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/0
[juggernaut:14949] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14949] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14949] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14949] tmp: /tmp
[juggernaut:14949] [0,0,0] contact_file /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/universe-setup.txt
[juggernaut:14949] [0,0,0] wrote setup file
[juggernaut:14949] pls:rsh: local csh: 0, local sh: 1
[juggernaut:14949] pls:rsh: assuming same remote shell as local shell
[juggernaut:14949] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:14949] pls:rsh: final template argv:
[juggernaut:14949] pls:rsh:     /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 2 --vpid_start 0 --nodename --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14949] pls:rsh: launching on node juggernaut
[juggernaut:14949] pls:rsh: juggernaut is a LOCAL node
[juggernaut:14949] pls:rsh: changing to directory /home/ggrobe
[juggernaut:14949] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename juggernaut --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14950] [0,0,1] setting up session dir with
[juggernaut:14950]  universe default-universe-14949
[juggernaut:14950]  user ggrobe
[juggernaut:14950]  host juggernaut
[juggernaut:14950]  jobid 0
[juggernaut:14950]  procid 1
[juggernaut:14950] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/1
[juggernaut:14950] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14950] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14950] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14950] tmp: /tmp
--------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
--------------------------------------------------------------------------
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file odls_default_module.c at line 1193
[juggernaut:14949] spawn: in job_state_callback(jobid = 1, state = 0x80)
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file orted.c at line 575
[juggernaut:14950] sess_dir_finalize: job session dir not empty - leaving
[juggernaut:14950] sess_dir_finalize: proc session dir not empty - leaving
[juggernaut:14949] sess_dir_finalize: proc session dir not empty - leaving