Re: [OMPI users] job runs with mpirun on a node but not if submitted via Torque.

2009-04-01 Thread Ralph Castain
The difference you are seeing here indicates that the "direct" run is  
using the rsh launcher, while the other run is using the Torque  
launcher.


So I gather that by "direct" you mean that you don't get an allocation  
from Maui before running the job, but for the other you do? Otherwise,  
OMPI should detect that it is running under Torque and
automatically use the Torque launcher unless directed to do otherwise.


The --set-sid option causes the orteds to separate from mpirun's  
process group. This was done on the rsh launcher to avoid having  
signals directly propagate to local processes so that mpirun could  
properly deal with them.


The --no-daemonize option on the Torque launch keeps the daemons in  
the PBS job so that Torque can properly terminate them all when you  
reach your time limit. We let the rsh-launched daemons daemonize so  
that they terminate the ssh session as there are system limits to the  
number of ssh sessions you can have concurrently open.


Once the daemon gets running on the node, there isn't anything  
different about how it starts a process that depends upon how the  
daemon was started. The environment seen by the processes will be the  
same either way, with the exception of the process group. Is there  
something about that application which is sensitive to the process  
group?


If so, what you could do is simply add -mca pls rsh to your command  
line when launching it. This will direct OMPI to use the rsh launcher,  
which will give you the same behavior as your "direct" scenario (we  
will still read the PBS_NODEFILE to get the allocation).


You might also want to upgrade to the 1.3 series - the launch system  
there is simpler and scales better. If your application cares about  
process group, you might still need to specify the rsh launcher (in  
1.3, you would use -mca plm rsh to do this - slight syntax change),  
but it would be interesting to see if it has any impact...and would  
definitely run better either way.
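
For example (a sketch; the process count and executable name are placeholders
of mine, not from this thread):

  mpirun -mca pls rsh -np 16 ./my_app    # 1.2 series
  mpirun -mca plm rsh -np 16 ./my_app    # 1.3 series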


Ralph



On Mar 31, 2009, at 8:36 PM, Rahul Nabar wrote:


2009/3/31 Ralph Castain :

It is very hard to debug the problem with so little information. We
regularly run OMPI jobs on Torque without issue.


Another small thing that I noticed. Not sure if it is relevant.

When the job starts running there is an orte process. The args to this
process are slightly different depending on whether the job was
submitted with Torque or directly on a node. Could this be an issue?
Just a thought.

The essential difference seems to be that the Torque run has the
--no-daemonize option whereas the direct run has a --set-sid option. I
got these via ps after I submitted an interactive Torque job.

Do these matter at all? Full ps output snippets reproduced below. Some
other numbers also seem different on closer inspection but that might
be by design.

###via Torque; segfaults. ##
rpnabar  11287  0.1  0.0  24680  1828 ?Ss   21:04   0:00 orted
--no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0
--nodename node17 --universe rpnabar@node17:default-universe-11286
--nsreplica "0.0.0;tcp://10.0.0.17:45839" --gprreplica
"0.0.0;tcp://10.0.0.17:45839"
##


##direct MPI run; this works OK
rpnabar  11026  0.0  0.0  24676  1712 ?Ss   20:52   0:00 orted
--bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename
node17 --universe rpnabar@node17:default-universe-11024 --nsreplica
"0.0.0;tcp://10.0.0.17:34716" --gprreplica
"0.0.0;tcp://10.0.0.17:34716" --set-sid
##




Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread M C

Hi Josh,

Yep, adding that "--with-ft=cr" flag did the trick. Thanks.

Cheers,
m
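
For anyone who hits the same error: the flags quoted below combine into a
single configure line like this (a sketch using the same paths as in the
thread; adjust them to your install):

  ./configure --with-ft=cr --with-blcr=/opt/blcr --with-blcr-libdir=/opt/blcr/lib \
      --enable-mpi-threads --enable-ft-thread --prefix=/opt/openmpi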

> From: jjhur...@open-mpi.org
> To: us...@open-mpi.org
> Date: Tue, 31 Mar 2009 15:48:05 -0400
> Subject: Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem
> 
> I think that the missing configure option might be the problem as  
> well. The BLCR configure logic checks to see if you have enabled  
> checkpoint/restart in Open MPI. If you haven't then it fails out of  
> configure (probably should print a better error message - I'll put  
> that on my todo list).
> 
> The configure flag that you are looking for is:
>   --with-ft=cr
> 
> So try the following and let me know if that fixes the problem for you:
>./configure --with-ft=cr --with-blcr=/opt/blcr --with-blcr-libdir=/ 
> opt/blcr/lib --prefix=/opt/openmpi
> 
> Some of the configure options and runtime options are discussed in the  
> Checkpoint/Restart in Open MPI User's Guide which you can find linked  
> at the bottom of the following wiki page:
>https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR
> 
> You may also want to consider using the thread options too for  
> improved C/R response:
>--enable-mpi-threads --enable-ft-thread
> 
> Best,
> Josh
> 
> On Mar 31, 2009, at 2:49 PM, Dave Love wrote:
> 
> > M C  writes:
> >
> >> --- MCA component crs:blcr (m4 configuration macro)
> >> checking for MCA component crs:blcr compile mode... dso
> >> checking --with-blcr value... sanity check ok (/opt/blcr)
> >> checking --with-blcr-libdir value... sanity check ok (/opt/blcr/lib)
> >> configure: WARNING: BLCR support requested but not found.  Perhaps  
> >> you need to specify the location of the BLCR libraries.
> >> configure: error: Aborting.
> >>
> >> This is strange, as both /opt/blcr and /opt/blcr/lib are sensibly  
> >> populated:
> >
> > I ran into this recently.  You need an extra flag which I forget, but
> > ./configure --help will show it can take `LAM' as an argument.  It  
> > seems
> > pretty obscure and probably deserves a report I haven't had time to
> > make.
> >


[OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?

2009-04-01 Thread Guanyinzhu

Hi! 
  I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on Red Hat 
Linux x86_64. 

I ran a test like this: I just killed the orted process, and the job hung for a 
long time (it hung for 2~3 hours before I killed the job).

I have the following questions:

 When the network fails, a host fails, or the orted daemon is killed by accident, 
how long will it take the running MPI job to notice and exit?  

 Does OpenMPI support a heartbeat mechanism, or how can I quickly detect the 
failure to avoid the MPI job hanging?





thanks a lot!




[OMPI users] Strange Net problem

2009-04-01 Thread Gabriele Fatigati
Dear OpenMPI developers,
I have a strange problem when running my application (2000
processors). I'm using OpenMPI 1.2.22 over InfiniBand. The following is
the mca-params.conf:


btl = ^tcp
btl_tcp_if_exclude = eth0,ib0,ib1
oob_tcp_include = eth1,lo,eth0
btl_openib_warn_default_gid_prefix = 0
btl_openib_ib_timeout   = 20

At certain point of my run, the application died with this message:

[node265:05593] [0,1,1679]-[0,1,1680] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36645 failed: Software caused connection abort
(103)
[node484:06545] [0,1,1617]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node295:05394] [0,1,1649]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed, connecting over all interfaces
failed!

My question is: does this error depend on some timeout? How can I solve it?
Thanks in advance.





-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?

2009-04-01 Thread Jerome BENOIT

Is there a firewall somewhere ?

Jerome

Guanyinzhu wrote:
Hi! 
  I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on 
Redhat Linux x86_64. 
 
I run a test like this: just killed the orted process and the job hung 
for a long time (hang for 2~3 hours then I killed the job).
 
I have the follow questions:
   
 when network failed or host failed or orted deamon was killed by 
accident, How long would the running mpi job notice and exit? 

 Does OpenMPI support a heartbeat mechanism or how could I fast 
detect the failture to avoid the mpi job hang?
 
 
thanks a lot!
 








[OMPI users] Can't find libsvml in the execution

2009-04-01 Thread Marce
Hi all,

I have compiled OpenMPI 1.2.7 with the Intel compilers (icc and ifort) on
a cluster running CentOS 4.7. The build was OK, but when I try to launch an
execution, mpirun can't find some libraries.

When I check the linked libraries on the nodes, the output is:

[marce@nodo1 ~]$ ldd /home/aplicaciones/openmpi-1.2.7/bin/mpirun
libopen-rte.so.0 =>
/home/aplicaciones/openmpi-1.2.7//lib/libopen-rte.so.0
(0x002a95557000)
libopen-pal.so.0 =>
/home/aplicaciones/openmpi-1.2.7//lib/libopen-pal.so.0
(0x002a956d6000)
libdl.so.2 => /lib64/libdl.so.2 (0x0033f690)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0038c890)
libutil.so.1 => /lib64/libutil.so.1 (0x0038c8b0)
libm.so.6 => /lib64/tls/libm.so.6 (0x0038c810)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x002a95852000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0038c8d0)
libc.so.6 => /lib64/tls/libc.so.6 (0x0038c7c0)
libsvml.so => /opt/intel/fce/9.1.039/lib/libsvml.so (0x002a9596)
libimf.so => /opt/intel/fce/9.1.039/lib/libimf.so (0x002a95aa4000)
libirc.so => /opt/intel/fce/9.1.039/lib/libirc.so (0x002a95e0d000)
/lib64/ld-linux-x86-64.so.2 (0x0038c7a0)

(The same output in node2).

But when I do the same operation, checking the linked libraries on node2
from node1 over ssh:

[marce@nodo1 ~]$ ssh nodo2 "ldd /home/aplicaciones/openmpi-1.2.7/bin/mpirun"
libopen-rte.so.0 =>
/home/aplicaciones/openmpi-1.2.7//lib/libopen-rte.so.0
(0x002a95557000)
libopen-pal.so.0 =>
/home/aplicaciones/openmpi-1.2.7//lib/libopen-pal.so.0
(0x002a956d6000)
libdl.so.2 => /lib64/libdl.so.2 (0x003ddb50)
libnsl.so.1 => /lib64/libnsl.so.1 (0x003d83b0)
libutil.so.1 => /lib64/libutil.so.1 (0x003d8390)
libm.so.6 => /lib64/tls/libm.so.6 (0x003d8310)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x003d84a0)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x003d8330)
libc.so.6 => /lib64/tls/libc.so.6 (0x003d82c0)
/lib64/ld-linux-x86-64.so.2 (0x003d8280)
libsvml.so => not found
libimf.so => not found
libirc.so => not found
libsvml.so => not found
libimf.so => not found
libirc.so => not found

OpenMPI is installed on a shared filesystem (/home) and Intel is
installed in the same path on all nodes (/opt/intel..).

When I check LD_LIBRARY_PATH on the nodes (locally and over
ssh), everything seems to be OK; it has the correct paths pointing to /opt/intel
and openmpi-1.2.7.

How can I solve this issue? Where do I have to set LD_LIBRARY_PATH?

Thanks for all!

Regards
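
A couple of things often suggested for this symptom (a sketch, not something
confirmed in this thread): set the Intel library path in a startup file that
non-interactive remote shells actually read, e.g. in ~/.bashrc on each node:

  export LD_LIBRARY_PATH=/opt/intel/fce/9.1.039/lib:$LD_LIBRARY_PATH

and/or have mpirun forward the variable to the launched processes:

  mpirun -x LD_LIBRARY_PATH -np 2 --host nodo1,nodo2 ./my_app

(./my_app is a placeholder; nodo1 and nodo2 are the node names from the ldd
output above).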


Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?

2009-04-01 Thread Guanyinzhu

I mean I killed the orted daemon process while the MPI job was running, but the MPI 
job hung and couldn't notice that one of its ranks had failed.







> Date: Wed, 1 Apr 2009 19:09:34 +0800
> From: ml.jgmben...@mailsnare.net
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job 
> hang if host or network failed or orted deamon killed?
> 
> Is there a firewall somewhere ?
> 
> Jerome
> 
> Guanyinzhu wrote:
> > Hi! 
> > I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on 
> > Redhat Linux x86_64. 
> > 
> > I run a test like this: just killed the orted process and the job hung 
> > for a long time (hang for 2~3 hours then I killed the job).
> > 
> > I have the follow questions:
> > 
> > when network failed or host failed or orted deamon was killed by 
> > accident, How long would the running mpi job notice and exit? 
> > 
> > Does OpenMPI support a heartbeat mechanism or how could I fast 
> > detect the failture to avoid the mpi job hang?
> > 
> > 
> > thanks a lot!

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Rolf Vandevaart
It turns out that the use of --host and --hostfile act as a filter of 
which nodes to run on when you are running under SGE.  So, listing them 
several times does not affect where the processes land.  However, this 
still does not explain why you are seeing what you are seeing.  One 
thing you can try is to add this to the mpirun command.


 -mca ras_gridengine_verbose 100

This will provide some additional information as to what Open MPI is 
seeing as nodes and slots from SGE.  (Is there any chance that node0002 
actually has 8 slots?)


I just retried on my cluster of 2 CPU sparc solaris nodes.  When I run 
with np=2, the two MPI processes will all land on a single node, because 
that node has two slots.  When I go up to np=4, then they move on to the 
other node.  The --host acts as a filter to where they should run.


In terms of the using "IB bonding", I do not know what that means 
exactly.  Open MPI does stripe over multiple IB interfaces, so I think 
the answer is yes.


Rolf

PS:  Here is what my np=4 job script looked like.  (I just changed np=2 
for the other run)


 burl-ct-280r-0 148 =>more run.sh
#! /bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -N Job1
#$ -pe orte 200
#$ -j y
#$ -l h_rt=00:20:00  # Run time (hh:mm:ss) - 10 min

echo $NSLOTS
/opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 -v 
-np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp hostname


Here is the output (somewhat truncated)
 burl-ct-280r-0 150 =>more Job1.o199
200
[burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
[burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE: 
/ws/ompi-tools/orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/pe_hostfile

[..snip..]
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE shows 
slots=2
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE shows 
slots=2

[..snip..]
burl-ct-280r-1
burl-ct-280r-1
burl-ct-280r-0
burl-ct-280r-0
 burl-ct-280r-0 151 =>


On 03/31/09 22:39, PN wrote:

Dear Rolf,

Thanks for your reply.
I've created another PE and changed the submission script, explicitly 
specify the hostname with "--host".

However the result is the same.

# qconf -sp orte
pe_nameorte
slots  8
user_lists NONE
xuser_listsNONE
start_proc_args/bin/true
stop_proc_args /bin/true
allocation_rule$fill_up
control_slaves TRUE
job_is_first_task  FALSE
urgency_slots  min
accounting_summary TRUE

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host 
node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 
./bin/goto-openmpi-gcc/xhpl



# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ?S  0:00 /opt/openmpi-gcc/bin/mpirun -v -np 
8 --host 
node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 
./bin/goto-openmpi-gcc/xhpl

node0002: 18902 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl

Any hint to debug this situation?

Also, if I have 2 IB ports in each node, which IB bonding was done, will 
Open MPI automatically benefit from the double bandwidth?


Thanks a lot.

Best Regards,
PN

2009/4/1 Rolf Vandevaart >


On 03/31/09 11:43, PN wrote:

Dear all,

I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
I have 2 compute nodes for testing, each node has a single quad
core CPU.

Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
# For IB
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile
$TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl

I've tested the mpirun command can be run correctly in command line.

$ qconf -sp mpi-fu
pe_namempi-fu
slots  8
user_lists NONE
xuser_listsNONE
start_proc_args/opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args /opt/sge/mpi/stopmpi.sh
allocation_rule$fill_up
control_slaves TRUE
job_is_first_task  FALSE
urgency_slots  min
accounting_summary TRUE


I've checked the $TMPDIR/machines after submit, it was correct.
node0002
nod

Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?

2009-04-01 Thread Ralph Castain
There is indeed a heartbeat mechanism you can use - it is "off" by  
default. You can set it to check every N seconds with:


-mca orte_heartbeat_rate N

on your command line. Or if you want it to always run, add  
"orte_heartbeat_rate = N" to your default MCA param file. OMPI will  
declare the orted "dead" if two consecutive heartbeats are not seen.
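
For example (a sketch; the 5-second interval and the process count are
placeholder values of mine):

  mpirun -mca orte_heartbeat_rate 5 -np 16 ./my_app

or, to make it the default, in $HOME/.openmpi/mca-params.conf:

  orte_heartbeat_rate = 5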


Let me know how it works for you - it hasn't been extensively tested,  
but has worked so far.

Ralph

On Apr 1, 2009, at 6:07 AM, Guanyinzhu wrote:

I mean killed the orted deamon process during the mpi job running ,  
but the mpi job hang and could't notice one of it's rank failed.





> Date: Wed, 1 Apr 2009 19:09:34 +0800
> From: ml.jgmben...@mailsnare.net
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Beginner's question: how to avoid a  
running mpi job hang if host or network failed or orted deamon killed?

>
> Is there a firewall somewhere ?
>
> Jerome
>
> Guanyinzhu wrote:
> > Hi!
> > I'm using OpenMPI 1.3 on ten nodes connected with Gigabit  
Ethernet on

> > Redhat Linux x86_64.
> >
> > I run a test like this: just killed the orted process and the  
job hung

> > for a long time (hang for 2~3 hours then I killed the job).
> >
> > I have the follow questions:
> >
> > when network failed or host failed or orted deamon was killed by
> > accident, How long would the running mpi job notice and exit?
> >
> > Does OpenMPI support a heartbeat mechanism or how c! ould I fast
> > detect the failture to avoid the mpi job hang?
> >
> >
> > thanks a lot!
> >
> >




Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain

As an FYI: you can debug allocation issues more easily by:

mpirun --display-allocation --do-not-launch -n 1 foo

This will read the allocation, do whatever host filtering you specify  
with -host and -hostfile options, report out the result, and then  
terminate without trying to launch anything. I found it most useful  
for debugging these situations.


If you want to know where the procs would have gone, then you can do:

mpirun --display-allocation --display-map --do-not-launch -n 8 foo

In this case, the #procs you specify needs to be the number you  
actually wanted so that the mapper will properly run. However, the  
executable can be bogus and nothing will actually launch. It's the  
closest you can come to a dry run of a job.


HTH
Ralph


On Apr 1, 2009, at 7:10 AM, Rolf Vandevaart wrote:

It turns out that the use of --host and --hostfile act as a filter  
of which nodes to run on when you are running under SGE.  So,  
listing them several times does not affect where the processes  
land.  However, this still does not explain why you are seeing what  
you are seeing.  One thing you can try is to add this to the mpirun  
command.


-mca ras_gridengine_verbose 100

This will provide some additional information as to what Open MPI is  
seeing as nodes and slots from SGE.  (Is there any chance that  
node0002 actually has 8 slots?)


I just retried on my cluster of 2 CPU sparc solaris nodes.  When I  
run with np=2, the two MPI processes will all land on a single node,  
because that node has two slots.  When I go up to np=4, then they  
move on to the other node.  The --host acts as a filter to where  
they should run.


In terms of the using "IB bonding", I do not know what that means  
exactly.  Open MPI does stripe over multiple IB interfaces, so I  
think the answer is yes.


Rolf

PS:  Here is what my np=4 job script looked like.  (I just changed  
np=2 for the other run)


burl-ct-280r-0 148 =>more run.sh
#! /bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -N Job1
#$ -pe orte 200
#$ -j y
#$ -l h_rt=00:20:00  # Run time (hh:mm:ss) - 10 min

echo $NSLOTS
/opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 - 
v -np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp  
hostname


Here is the output (somewhat truncated)
burl-ct-280r-0 150 =>more Job1.o199
200
[burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
[burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE: /ws/ompi-tools/ 
orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/ 
pe_hostfile

[..snip..]
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE  
shows slots=2
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE  
shows slots=2

[..snip..]
burl-ct-280r-1
burl-ct-280r-1
burl-ct-280r-0
burl-ct-280r-0
burl-ct-280r-0 151 =>


On 03/31/09 22:39, PN wrote:

Dear Rolf,
Thanks for your reply.
I've created another PE and changed the submission script,  
explicitly specify the hostname with "--host".

However the result is the same.
# qconf -sp orte
pe_nameorte
slots  8
user_lists NONE
xuser_listsNONE
start_proc_args/bin/true
stop_proc_args /bin/true
allocation_rule$fill_up
control_slaves TRUE
job_is_first_task  FALSE
urgency_slots  min
accounting_summary TRUE
$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host  
node0001 
,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./ 
bin/goto-openmpi-gcc/xhpl

# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ?S  0:00 /opt/openmpi-gcc/bin/mpirun -v  
-np 8 --host  
node0001 
,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./ 
bin/goto-openmpi-gcc/xhpl

node0002: 18902 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
Any hint to debug this situation?
Also, if I have 2 IB ports in each node, which IB bonding was done,  
will Open MPI automatically benefit from the double bandwidth?

Thanks a lot.
Best Regards,
PN
2009/4/1 Rolf Vandevaart mailto:rolf.vandeva...@sun.com 
>>

   On 03/31/09 11:43, PN wrote:
   Dear all,
   I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
   I have 2 compute nodes for testing, each node has a single  
quad

   core CPU.
   Here is my submission script and PE config:
   $ cat hpl-8cpu.sge
   #!/bin/bash
   #
   #$ -N HPL_8cpu_IB
   #$ -pe mpi-fu 8
   #$ -cwd
   #$ -j y
   #$ -S /bin/b

Re: [OMPI users] Strange Net problem

2009-04-01 Thread Ralph Castain

Hi Gabriele

I don't think this is a timeout issue. OMPI 1.2.x doesn't scale very  
well to that size due to a requirement that the underlying out-of-band  
system fully connect at the TCP level. Thus, every process in your job  
will be opening 2002 sockets (one to every other process, one to the  
local orted, and one back to mpirun). More than likely, you are simply  
running out of sockets on your nodes.
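
To put rough numbers on that (my arithmetic, assuming 8 processes per node):
each process would hold roughly 2000 descriptors, which by itself exceeds a
common default open-files limit of 1024, and the node as a whole would be
carrying on the order of 8 x 2000 = 16,000 TCP connections.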


For a job this size, I would recommend upgrading to OMPI 1.3.1. This  
uses a routing scheme for the out-of-band system, so each process only  
opens 1 socket to its local daemon. Much more scalable, and I think it  
would solve this problem. It will also start much faster, as a bonus.


HTH
Ralph


On Apr 1, 2009, at 3:58 AM, Gabriele Fatigati wrote:


Dear OpenMPI developers, m
i have a strange problem during running my application ( 2000
processors). I'm using openmpi 1.2.22 over Infiniband. The follow is
the mca-params.conf:


btl = ^tcp
btl_tcp_if_exclude = eth0,ib0,ib1
oob_tcp_include = eth1,lo,eth0
btl_openib_warn_default_gid_prefix = 0
btl_openib_ib_timeout   = 20

At certain point of my run, the application died with this message:

[node265:05593] [0,1,1679]-[0,1,1680] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36645 failed: Software caused connection abort
(103)
[node484:06545] [0,1,1617]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node295:05394] [0,1,1649]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed: Software caused connection abort
(103)
[node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
connect to 10.161.12.14:36647 failed, connecting over all interfaces
failed!

My question is: This error depends by some timeout? How can i solve?
Thanks in advance.

Than




--
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it




Re: [OMPI users] Strange Net problem

2009-04-01 Thread Gabriele Fatigati
Hi Ralph,
unfortunately, on this machine I can't upgrade OpenMPI at the moment.
Is there a way to limit or reduce the probability of this error?

2009/4/1 Ralph Castain :
> Hi Gabriele
>
> I don't think this is a timeout issue. OMPI 1.2.x doesn't scale very well to
> that size due to a requirement that the underlying out-of-band system fully
> connect at the TCP level. Thus, every process in your job will be opening
> 2002 sockets (one to every other process, one to the local orted, and one
> back to mpirun). More than likely, you are simply running out of sockets on
> your nodes.
>
> For a job this size, I would recommend upgrading to OMPI 1.3.1. This uses a
> routing scheme for the out-of-band system, so each process only opens 1
> socket to its local daemon. Much more scalable, and I think it would solve
> this problem. It will also start much faster, as a bonus.
>
> HTH
> Ralph
>
>
> On Apr 1, 2009, at 3:58 AM, Gabriele Fatigati wrote:
>
>> Dear OpenMPI developers, m
>> i have a strange problem during running my application ( 2000
>> processors). I'm using openmpi 1.2.22 over Infiniband. The follow is
>> the mca-params.conf:
>>
>>
>> btl = ^tcp
>> btl_tcp_if_exclude = eth0,ib0,ib1
>> oob_tcp_include = eth1,lo,eth0
>> btl_openib_warn_default_gid_prefix = 0
>> btl_openib_ib_timeout   = 20
>>
>> At certain point of my run, the application died with this message:
>>
>> [node265:05593] [0,1,1679]-[0,1,1680] mca_oob_tcp_peer_try_connect:
>> connect to 10.161.12.14:36645 failed: Software caused connection abort
>> (103)
>> [node484:06545] [0,1,1617]-[0,1,1681] mca_oob_tcp_peer_try_connect:
>> connect to 10.161.12.14:36647 failed: Software caused connection abort
>> (103)
>> [node295:05394] [0,1,1649]-[0,1,1681] mca_oob_tcp_peer_try_connect:
>> connect to 10.161.12.14:36647 failed: Software caused connection abort
>> (103)
>> [node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
>> connect to 10.161.12.14:36647 failed: Software caused connection abort
>> (103)
>> [node182:05579] [0,1,1673]-[0,1,1681] mca_oob_tcp_peer_try_connect:
>> connect to 10.161.12.14:36647 failed, connecting over all interfaces
>> failed!
>>
>> My question is: This error depends by some timeout? How can i solve?
>> Thanks in advance.
>>
>> Than
>>
>>
>>
>>
>> --
>> Ing. Gabriele Fatigati
>>
>> Parallel programmer
>>
>> CINECA Systems & Tecnologies Department
>>
>> Supercomputing Group
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it                    Tel:   +39 051 6171722
>>
>> g.fatigati [AT] cineca.it



-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it



Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain
Rolf has correctly reminded me that display-allocation occurs prior to  
host filtering, so you will see all of the allocated nodes. You'll see  
the impact of the host specifications in display-map.


Sorry for the confusion - thanks to Rolf for pointing it out.
Ralph

On Apr 1, 2009, at 7:40 AM, Ralph Castain wrote:


As an FYI: you can debug allocation issues more easily by:

mpirun --display-allocation --do-not-launch -n 1 foo

This will read the allocation, do whatever host filtering you  
specify with -host and -hostfile options, report out the result, and  
then terminate without trying to launch anything. I found it most  
useful for debugging these situations.


If you want to know where the procs would have gone, then you can do:

mpirun --display-allocation --display-map --do-not-launch -n 8 foo

In this case, the #procs you specify needs to be the number you  
actually wanted so that the mapper will properly run. However, the  
executable can be bogus and nothing will actually launch. It's the  
closest you can come to a dry run of a job.


HTH
Ralph


On Apr 1, 2009, at 7:10 AM, Rolf Vandevaart wrote:

It turns out that the use of --host and --hostfile act as a filter  
of which nodes to run on when you are running under SGE.  So,  
listing them several times does not affect where the processes  
land.  However, this still does not explain why you are seeing what  
you are seeing.  One thing you can try is to add this to the mpirun  
command.


-mca ras_gridengine_verbose 100

This will provide some additional information as to what Open MPI  
is seeing as nodes and slots from SGE.  (Is there any chance that  
node0002 actually has 8 slots?)


I just retried on my cluster of 2 CPU sparc solaris nodes.  When I  
run with np=2, the two MPI processes will all land on a single  
node, because that node has two slots.  When I go up to np=4, then  
they move on to the other node.  The --host acts as a filter to  
where they should run.


In terms of the using "IB bonding", I do not know what that means  
exactly.  Open MPI does stripe over multiple IB interfaces, so I  
think the answer is yes.


Rolf

PS:  Here is what my np=4 job script looked like.  (I just changed  
np=2 for the other run)


burl-ct-280r-0 148 =>more run.sh
#! /bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -N Job1
#$ -pe orte 200
#$ -j y
#$ -l h_rt=00:20:00  # Run time (hh:mm:ss) - 10 min

echo $NSLOTS
/opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 - 
v -np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp  
hostname


Here is the output (somewhat truncated)
burl-ct-280r-0 150 =>more Job1.o199
200
[burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
[burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE: /ws/ompi-tools/ 
orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/ 
pe_hostfile

[..snip..]
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE  
shows slots=2
[burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE  
shows slots=2

[..snip..]
burl-ct-280r-1
burl-ct-280r-1
burl-ct-280r-0
burl-ct-280r-0
burl-ct-280r-0 151 =>


On 03/31/09 22:39, PN wrote:

Dear Rolf,
Thanks for your reply.
I've created another PE and changed the submission script,  
explicitly specify the hostname with "--host".

However the result is the same.
# qconf -sp orte
pe_nameorte
slots  8
user_lists NONE
xuser_listsNONE
start_proc_args/bin/true
stop_proc_args /bin/true
allocation_rule$fill_up
control_slaves TRUE
job_is_first_task  FALSE
urgency_slots  min
accounting_summary TRUE
$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host  
node0001 
,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./ 
bin/goto-openmpi-gcc/xhpl

# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ?S  0:00 /opt/openmpi-gcc/bin/mpirun - 
v -np 8 --host  
node0001 
,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./ 
bin/goto-openmpi-gcc/xhpl

node0002: 18902 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ?RLl0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ?RLl0:28 ./bin/goto-openmpi-gcc/xhpl
Any hint to debug this situation?
Also, if I have 2 IB ports in each node, which IB bonding was  
done, will Open MPI automatically benefit from the double bandwidth?

Thanks a lot.
Best Regards,
PN
2009/4/1 Rolf Vandevaart mailto:rolf.vandeva...@sun.com 
>>

  On 03/31/09 11:43, PN wrote:
  Dear all,
  I'm using Open MPI 1.3.1 and

[OMPI users] mpirun interaction with pbsdsh

2009-04-01 Thread Brock Palen

Ok this is weird, and the correct answer is probably "don't do that",
Anyway:

A user wants to run many, many small jobs, faster than our scheduler
+ Torque can start them, so he uses pbsdsh to start them in parallel, under TM.


pbsdsh bash -c 'cd $PBS_O_WORKDIR/$PBS_VNODENUM; mpirun  -np 1  
application'


This is kinda silly because the code, while MPI based, does not require
mpirun to start when run on a single rank, and runs just fine if
you leave off mpirun.


What happens, though, if you do leave it on (this is with ompi-1.2.x) is
that you get errors like:


[nyx428.engin.umich.edu:01929] pls:tm: failed to poll for a spawned  
proc, return status = 17002
[nyx428.engin.umich.edu:01929] [0,0,0] ORTE_ERROR_LOG: In errno in  
file rmgr_urm.c at line 462



Kinda makes sense: pbsdsh has already started 'mpirun' under TM, and
now mpirun is trying to start a process under TM as well. In fact, with
older versions (1.2.0), the above will work fine only for the first
TM node; any second node will hang at 'poll()' if you strace it.


So we can solve the above by not using mpirun to start single
processes under TM that were spawned by TM in the first place. Just
thought you would like to know.
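
In other words, the same pbsdsh line as above with mpirun simply dropped:

pbsdsh bash -c 'cd $PBS_O_WORKDIR/$PBS_VNODENUM; application'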


Is there a way to have mpirun spawn all the processes the way pbsdsh does?
The problem is that the code is MPI based, so if you say 'run 4' it is going to
do the normal COMM_SIZE=4, only read the first input, etc. Also, we have
to change the CWD of each rank. So, can you make mpirun farm?



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985





Re: [OMPI users] mpirun interaction with pbsdsh

2009-04-01 Thread Ralph Castain

Ick is the proper response. :-)

The old 1.2 series would attempt to spawn a local orted on each of  
those nodes, and that is what is failing. Best guess is that it is  
because pbsdsh doesn't fully replicate a key part of the environment  
that is expected.


One thing you could try is do this with 1.3.1. It will just fork/exec  
that local application instead of trying to start a daemon, so the  
odds are much better that it will work.


I don't know of any native way to get mpirun to launch a farm - it  
will always set the comm_size to the total #procs. I suppose we could  
add that option, if people want it - wouldn't be very hard to implement.


Ralph
On Apr 1, 2009, at 8:49 AM, Brock Palen wrote:


Ok this is weird, and the correct answer is probably "don't do that",
Anyway:

User wants to run many many small jobs, faster than our scheduler 
+torque can start, he uses pbsdsh to start them in parallel, under tm.


pbsdsh bash -c 'cd $PBS_O_WORKDIR/$PBS_VNODENUM; mpirun  -np 1  
application'


This is kinda silly because the code while MPI based, when ran on  
single rank does not require mpirun to start, and just just fine if  
you leave off mpirun.


What happens though if you do leave it on (this is with ompi-1.2.x)   
you get errors about


[nyx428.engin.umich.edu:01929] pls:tm: failed to poll for a spawned  
proc, return status = 17002
[nyx428.engin.umich.edu:01929] [0,0,0] ORTE_ERROR_LOG: In errno in  
file rmgr_urm.c at line 462



Kinda makes sense, pbsdsh has already started 'mpirun' under tm, and  
now mpirun is trying to start a process also under tm. In fact with  
older versions (1.2.0).  The above will work fine only for the first  
TMNODE, any second node, will hang, at 'poll()' if you strace it.


To we can solve the above by not using mpirun to start single  
processes under tm that were spawned by tm in the first place.  Just  
thought you would like to know.


Is there a way to have mpirun spawn all the processes like pbsdsh?   
Problem is the code is MPI based, so if you say 'run 4'  its going  
to do the noraml COMM_SIZE=4, only read first input, etc.  Also we  
have to change the CWD of each rank.  Thus can you make mpirun farm?



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985







[OMPI users] mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init

2009-04-01 Thread Alessandro Surace
Hi guys, I'll try to repost my question...
I have a problem with the latest stable build and the latest nightly snapshot.

When I run a job directly with mpirun, there is no problem.
If I try to submit it with lsf:
bsub -a openmpi -m grid01 mpirun.lsf /mnt/ewd/mpi/fibonacci/fibonacci_mpi

I get the following error:
mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so:
undefined symbol: lsb_init
Job  /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/openmpi_wrapper
/mnt/ewd/mpi/fibonacci/fibonacci_mpi

I've verified that the lsb_init symbol is present in the library:
[root@grid01 lib]# strings libbat.* |grep lsb_init
lsb_init
sch_lsb_init
lsb_init()
lsb_init
sch_lsb_init
sch_lsb_init
sch_lsb_init
sch_lsb_init
lsb_init()
sch_lsb_init
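
Two quick checks that might narrow this down (a sketch on my part, using the
plugin path from the error message above):

  ldd /usr/local/lib/openmpi/mca_plm_lsf.so | grep -i bat
  nm -D /usr/local/lib/openmpi/mca_plm_lsf.so | grep lsb_init

If libbat does not show up in the ldd output, the plugin was built without
being linked against the LSF batch library, which would explain lsb_init
being unresolved at load time.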

My lsf version is:
Platform LSF 7.0.4.115872, Sep 24 2008
Copyright 1992-2008 Platform Computing Corporation

  binary type: linux2.6-glibc2.3-x86

There is no problem with version 1.2.9.

Attached is the info about Open MPI.

Thanks
Alex
 Package: Open MPI r...@grid01.ags.wan Distribution
Open MPI: 1.3.2a1r20880
   Open MPI SVN revision: r20880
   Open MPI release date: Unreleased developer copy
Open RTE: 1.3.2a1r20880
   Open RTE SVN revision: r20880
   Open RTE release date: Unreleased developer copy
OPAL: 1.3.2a1r20880
   OPAL SVN revision: r20880
   OPAL release date: Unreleased developer copy
Ident string: 1.3.2a1r20880
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
  MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
   MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
   MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
   MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
   MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
   MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
   MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
  MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
   MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
   MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
   MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
 MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
 MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
  MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
 MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
 MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.2)
 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.2)
 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.2)
 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.2)
 MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.2)
 MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.2)
 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.3.2)
 MCA ras: lsf (MCA v2.0, API v2.0, Component v1.3.2)
 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.2)
   MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.2)
   MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.2)
   MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.2)
 MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.2)
  MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.2)
  MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.2)
  MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.2)
 MCA plm: lsf (MCA v2.0, API v2.0, Component v1.3.2)
 MCA plm

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread PN
Thanks. I've tried your suggestion.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun -mca ras_gridengine_verbose 100 -v -np $NSLOTS
--host node0001,node0002 hostname


It allocated 2 nodes to run on; however, all the processes were spawned on
node0001.

$ qstat -f
queuename  qtype resv/used/tot. load_avg arch
states
-
al...@node0001.v5cluster.com   BIPC  0/4/4  4.79 lx24-amd64
 45 0.55500 HPL_8cpu_G adminr 04/02/2009 00:26:49 4
-
al...@node0002.v5cluster.com   BIPC  0/4/4  0.00 lx24-amd64
 45 0.55500 HPL_8cpu_G adminr 04/02/2009 00:26:49 4


$ cat HPL_8cpu_GB.o45
[node0001:03194] ras:gridengine: JOB_ID: 45
[node0001:03194] ras:gridengine: node0001.v5cluster.com: PE_HOSTFILE shows
slots=4
[node0001:03194] ras:gridengine: node0002.v5cluster.com: PE_HOSTFILE shows
slots=4
node0001
node0001
node0001
node0001
node0001
node0001
node0001
node0001

$ qconf -sq all.q
qname all.q
hostlist  @allhosts
seq_no0
load_thresholds   np_load_avg=1.75
suspend_thresholdsNONE
nsuspend  1
suspend_interval  00:05:00
priority  0
min_cpu_interval  00:01:00
processorsUNDEFINED
qtype BATCH INTERACTIVE
ckpt_list blcr
pe_list   make mpi-rr mpi-fu orte
rerun FALSE
slots 4,[node0001=4],[node0002=4]
tmpdir/tmp
shell /bin/sh
prologNONE
epilogNONE
shell_start_mode  posix_compliant
starter_methodNONE
suspend_methodNONE
resume_method NONE
terminate_method  NONE
notify00:00:60
owner_listNONE
user_listsNONE
xuser_lists   NONE
subordinate_list  NONE
complex_valuesNONE
projects  NONE
xprojects NONE
calendar  NONE
initial_state default
s_rt  INFINITY
h_rt  INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize   INFINITY
h_fsize   INFINITY
s_dataINFINITY
h_dataINFINITY
s_stack   INFINITY
h_stack   INFINITY
s_coreINFINITY
h_coreINFINITY
s_rss INFINITY
h_rss INFINITY
s_vmemINFINITY
h_vmemINFINITY

$ qconf -se node0001
hostname  node0001.v5cluster.com
load_scaling  NONE
complex_valuesslots=4
load_values   arch=lx24-amd64,num_proc=4,mem_total=3949.597656M, \
  swap_total=0.00M,virtual_total=3949.597656M, \
  load_avg=2.80,load_short=0.22, \
  load_medium=2.80,load_long=2.32, \
  mem_free=3818.746094M,swap_free=0.00M, \
  virtual_free=3818.746094M,mem_used=130.851562M, \
  swap_used=0.00M,virtual_used=130.851562M, \
  cpu=0.00,np_load_avg=0.70, \
  np_load_short=0.055000,np_load_medium=0.70, \
  np_load_long=0.58
processors4
user_listsNONE
xuser_lists   NONE
projects  NONE
xprojects NONE
usage_scaling NONE
report_variables  NONE

$ qconf -se node0002
hostname  node0002.v5cluster.com
load_scaling  NONE
complex_valuesslots=4
load_values   arch=lx24-amd64,num_proc=4,mem_total=3949.597656M, \
  swap_total=0.00M,virtual_total=3949.597656M, \
  load_avg=0.00,load_short=0.00, \
  load_medium=0.00,load_long=0.00, \
  mem_free=3843.074219M,swap_free=0.00M, \
  virtual_free=3843.074219M,mem_used=106.523438M, \
  swap_used=0.00M,virtual_used=106.523438M, \
  cpu=0.00,np_load_avg=0.00, \
  np_load_short=0.00,np_load_medium=0.00, \
  np_load_long=0.00
processors4
user_listsNONE
xuser_lists   NONE
projects  NONE
xprojects NONE
usage_scaling NONE
report_variables  NONE



2009/4/1 Rolf Vandevaart 

> It turns out that the use of --host and --hostfile act as a filter of which
> nodes to run on when you are running under SGE.  So, listing them several
> times does not affect where the processes land.  However, this still does
> not explain why you are seeing what you are seeing.  One thing you can try
> is to a

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread PN
Thanks.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v -np
$NSLOTS --host node0001,node0002 hostname


$ cat HPL_8cpu_GB.o46

==   ALLOCATED NODES   ==

 Data for node: Name: node0001  Num slots: 4Max slots: 0
 Data for node: Name: node0002.v5cluster.comNum slots: 4Max slots: 0

=

    JOB MAP   

 Data for node: Name: node0001  Num procs: 8
Process OMPI jobid: [10982,1] Process rank: 0
Process OMPI jobid: [10982,1] Process rank: 1
Process OMPI jobid: [10982,1] Process rank: 2
Process OMPI jobid: [10982,1] Process rank: 3
Process OMPI jobid: [10982,1] Process rank: 4
Process OMPI jobid: [10982,1] Process rank: 5
Process OMPI jobid: [10982,1] Process rank: 6
Process OMPI jobid: [10982,1] Process rank: 7

 =
node0001
node0001
node0001
node0001
node0001
node0001
node0001
node0001

I'm not sure why node0001 is missing the domain name; is this related?
However, the result is correct when I run "qconf -sel":

$ qconf -sel
node0001.v5cluster.com
node0002.v5cluster.com





2009/4/1 Ralph Castain 

> Rolf has correctly reminded me that display-allocation occurs prior to host
> filtering, so you will see all of the allocated nodes. You'll see the impact
> of the host specifications in display-map,
>
> Sorry for the confusion - thanks to Rolf for pointing it out.
> Ralph
>
>
> On Apr 1, 2009, at 7:40 AM, Ralph Castain wrote:
>
>  As an FYI: you can debug allocation issues more easily by:
>>
>> mpirun --display-allocation --do-not-launch -n 1 foo
>>
>> This will read the allocation, do whatever host filtering you specify with
>> -host and -hostfile options, report out the result, and then terminate
>> without trying to launch anything. I found it most useful for debugging
>> these situations.
>>
>> If you want to know where the procs would have gone, then you can do:
>>
>> mpirun --display-allocation --display-map --do-not-launch -n 8 foo
>>
>> In this case, the #procs you specify needs to be the number you actually
>> wanted so that the mapper will properly run. However, the executable can be
>> bogus and nothing will actually launch. It's the closest you can come to a
>> dry run of a job.
>>
>> HTH
>> Ralph
>>
>>
>> On Apr 1, 2009, at 7:10 AM, Rolf Vandevaart wrote:
>>
>>  It turns out that the use of --host and --hostfile act as a filter of
>>> which nodes to run on when you are running under SGE.  So, listing them
>>> several times does not affect where the processes land.  However, this still
>>> does not explain why you are seeing what you are seeing.  One thing you can
>>> try is to add this to the mpirun command.
>>>
>>> -mca ras_gridengine_verbose 100
>>>
>>> This will provide some additional information as to what Open MPI is
>>> seeing as nodes and slots from SGE.  (Is there any chance that node0002
>>> actually has 8 slots?)
>>>
>>> I just retried on my cluster of 2 CPU sparc solaris nodes.  When I run
>>> with np=2, the two MPI processes will all land on a single node, because
>>> that node has two slots.  When I go up to np=4, then they move on to the
>>> other node.  The --host acts as a filter to where they should run.
>>>
>>> In terms of the using "IB bonding", I do not know what that means
>>> exactly.  Open MPI does stripe over multiple IB interfaces, so I think the
>>> answer is yes.
>>>
>>> Rolf
>>>
>>> PS:  Here is what my np=4 job script looked like.  (I just changed np=2
>>> for the other run)
>>>
>>> burl-ct-280r-0 148 =>more run.sh
>>> #! /bin/bash
>>> #$ -S /bin/bash
>>> #$ -V
>>> #$ -cwd
>>> #$ -N Job1
>>> #$ -pe orte 200
>>> #$ -j y
>>> #$ -l h_rt=00:20:00  # Run time (hh:mm:ss) - 10 min
>>>
>>> echo $NSLOTS
>>> /opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 -v -np
>>> 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp hostname
>>>
>>> Here is the output (somewhat truncated)
>>> burl-ct-280r-0 150 =>more Job1.o199
>>> 200
>>> [burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
>>> [burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE:
>>> /ws/ompi-tools/orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/pe_hostfile
>>> [..snip..]
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE shows
>>> slots=2
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE shows
>>> slots=2
>>> [..snip..]
>>> burl-ct-280r-1
>>> burl-ct-280r-1
>>> burl-ct-280r-0
>>> burl-ct-280r-0
>>> burl-ct-280r-0 151 =>
>>>
>>>
>>> On 03/31/09 22:39, PN wrote:
>>>
 Dear Rolf,
 Thanks for your reply.
 I've created another PE and changed the submission script, explicitly
 specify the hostname with "--host".
 However the result i

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread Dave Love
Josh Hursey  writes:

> The configure flag that you are looking for is:
>  --with-ft=cr

Is there a good reason why --with-blcr doesn't imply it?

> You may also want to consider using the thread options too for
> improved C/R response:
>   --enable-mpi-threads --enable-ft-thread

Incidentally, the draft document linked from the FAQ has a typo:
`--enable-mpi-thread' (with a missing `s').



Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Dave Love
Rolf Vandevaart  writes:

> No, orte_leave_session_attached is needed to avoid the errno=2 errors
> from the sm btl. (It is fixed in 1.3.2 and trunk)

[It does cause other trouble, but I forget what the exact behaviour was
when I lost it as a default.]

>> Yes, but there's a problem with the recommended (as far as I remember)
>> setup, with one slot per node to ensure a single job per node.  In that
>> case, you have no control over allocation -- -bynode and -byslot are
>> equivalent, which apparently can badly affect some codes.  We're
>> currently using a starter to generate a hosts file for that reason
 ^^^

I meant queue prologue, not pe starter method.

>> (complicated by having dual- and quad-core nodes) and would welcome a
>> better idea.
>>
> I am not sure what you are asking here.  Are you trying to get a
> single MPI process per node?  You could use -npernode 1.  Sorry for my
> confusion.

No.  It's an SGE issue, not an Open MPI one, but to try to explain
anyhow:  People normally want to ensure that a partially-full node
running an MPI job doesn't get anything else scheduled on it.  E.g. on
8-core nodes, if you submit a 16-process job, there are four cores left
over on the relevant nodes which might get something else scheduled on
them.  Using one slot per node avoids that, but means generating your
own hosts file if you want -bynode and -byslot not to be equivalent.



Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread Josh Hursey


On Apr 1, 2009, at 12:42 PM, Dave Love wrote:


Josh Hursey  writes:


The configure flag that you are looking for is:
--with-ft=cr


Is there a good reason why --with-blcr doesn't imply it?


Not really. Though it is most likely difficult to make it happen given  
the configure logic in Open MPI (at least the way I understand it). It  
has to do with BLCR being a component, but the --with-ft flag being a  
fundamental flag and the ordering between when the two would be  
evaluated in the configure script.






You may also want to consider using the thread options too for
improved C/R response:
 --enable-mpi-threads --enable-ft-thread


Incidentally, the draft document linked from the FAQ has a typo:
`--enable-mpi-thread' (with a missing `s').


Thanks. I'll fix this and post a new draft soon (I have a few other  
items to put in there anyway).


Cheers,
Josh









Re: [OMPI users] job runs with mpirun on a node but not if submitted via Torque.

2009-04-01 Thread Rahul Nabar
On Wed, Apr 1, 2009 at 1:13 AM, Ralph Castain  wrote:
> So I gather that by "direct" you mean that you don't get an allocation from
> Maui before running the job, but for the other you do? Otherwise, OMPI
> should detect the that it is running under Torque and automatically use the
> Torque launcher unless directed to do otherwise.
>

I think I've figured out the sore point. It seems "ulimit" is needed.
Things seem sensitive to where exactly I put the ulimit directive
though. Funnily, the nodes reported an unlimited stack before too but
putting this extra directive in there seems to have helped!
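
For reference, a sketch of what that looks like in practice (which limit was
actually needed isn't stated here, so the stack limit below is an assumption
on my part): put the directive in the job script before the mpirun line, e.g.

  ulimit -s unlimited   # raise the stack limit for everything launched below
  mpirun ./my_app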

I'm doing more testing to be sure that the problem has been solved!

Thanks for the leads guys!

-- 
Rahul


Re: [OMPI users] Cannot build OpenMPI 1.3 with PGI pgf90 and Gnu gcc/g++.

2009-04-01 Thread Jeff Squyres

On Mar 31, 2009, at 4:21 PM, Gus Correa wrote:


Please, correct my argument below if I am wrong.
I am not sure yet if the problem is caused by libtool,
because somehow it was not present in OpenMPI 1.2.8.

Just as a comparison, the libtool commands on 1.2.8 and 1.3 are very
similar, although 1.2.8 builds right, and 1.3 breaks.
The libtool commands differ in that 1.3 inserts ../../../ompi/libmpi.la
on the list of libraries to be linked to, whereas in 1.2.8
../../../ompi/libmpi.la is not there.



We did specifically add the following in the build process in v1.3:

libmpi_f90_la_LIBADD = $(top_builddir)/ompi/libmpi.la

which means that libmpi_f90 will link against libmpi.  That is the  
source of this ickyness.  Here's the commit where I put this change in  
to Open MPI:


https://svn.open-mpi.org/trac/ompi/changeset/19040

IIRC, the reason is that libmpi_f90 depends on libmpi (i.e., it calls  
functions in libmpi).  If nothing else, it's the Right Thing To Do to  
link in a dependent library.  But also to setup automatic rpath's  
properly, it is best to actually create libmpi_f90 with an explicit  
dependency (which turns into an implicit dependency later).  Rpath  
isn't a huge deal here because both libmpi and libmpi_f90 should be  
installed in the same directory, but it's the principle of the  
thing...  Additionally, it allows people to be lazy and do something  
like:


gfortran my_mpi_application.f90 -lmpi_f90

and that pulls in the rest of the libraries because of the implicit  
dependencies.  I think there were other reasons to do the explicit/ 
implicit dependencies (e.g., look at ldd output and you can tell what  
libs will be pulled in, etc.), but I don't remember them all off the  
top of my head.  :-(
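
A quick way to see that last point for yourself (the path below is
illustrative and depends on your --prefix):

  # libmpi should now show up as a direct dependency of libmpi_f90
  ldd /path/to/openmpi-1.3/lib/libmpi_f90.so | grep libmpi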


> I can think of two workarounds for you (one significantly less icky
> than the other):
>
> 1. Use pgcc, pgCC, pgf77, and pgf90 to build Open MPI.  If you have no
> C++ MPI code, the resulting Open MPI build *should* be compatible with
> your C + Fortran code.

Yes, that build, using only the PGI compilers, was already done.

Our concern is that some codes seem to rely on gcc as the underlying
C compiler, hence the need for the hybrid libraries.
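
For reference, the hybrid build being discussed is configured roughly
along these lines (only the compiler selection is shown; other flags
are omitted, and the prefix is a placeholder):

  ./configure CC=gcc CXX=g++ F77=pgf77 FC=pgf90 \
      --prefix=/path/to/openmpi-1.3-gcc-pgi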



Ok, bummer.


> 2. Instead of using the "real" pgf77/pgf90, put pgf77/pgf90 scripts
> early in your PATH that simply strip out -pthread from the argv and
> then invoke the real/underlying pgf77/pgf90 compilers.  This is
> pretty icky, but it should work...

Here is the "fake pgf90" script:

#! /bin/bash
# Strip the -pthread flag (which pgf90 does not accept) from the
# arguments, then hand everything else to the real compiler.
newargs=`echo "$@" | sed 's/-pthread//g'`
echo "/real/path/to/bin/pgf90 $newargs"
/real/path/to/bin/pgf90 $newargs
exit

Then I changed my PATH to put this script ahead of the real pgf90,
did make distclean, removed the old build subdirectories,
configured again, ran make again ... and ...

It works!
Ugly, but functional!  :)
While a final fix for the configure/libtool issue is in the works,
this is fine.
Many thanks.



Glad we got it working, but I agree that it is significantly ugly.  :-)


Question:

The output of ompi_info --config shows that the absolute path to the
"fake pgf90" script was recorded by OpenMPI.

Will the OpenMPI mpif90 wrapper hardwire this absolute path, or will it
search the user's $PATH for the underlying pgf90 first?




Eww -- good point.  I believe that those absolute pathnames are *only*  
recorded for ompi_info output so that you can know which compiler was  
used to build Open MPI after the fact.  The relative names are stored  
for use in the wrapper compilers.  However, the wrapper compiler  
arguments are fully customizable -- you might want to remove the  
"fake" compilers after the fact.  See this FAQ entry for details:


http://www.open-mpi.org/faq/?category=mpi-apps#override-wrappers-after-v1.0
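
For completeness, one way to handle this after the fact -- assuming
the OMPI_FC environment-variable override honored by the wrapper
compilers -- would be something like:

  # point the mpif90 wrapper at the real compiler instead of the "fake" script
  export OMPI_FC=/real/path/to/bin/pgf90
  mpif90 my_mpi_application.f90 -o my_app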

--
Jeff Squyres
Cisco Systems



[OMPI users] Open MPI 2009 released

2009-04-01 Thread George Bosilca

The Open MPI Team, representing a consortium of bailed-out banks, car
manufacturers, and insurance companies, is pleased to announce the
release of the "unbreakable" / bug-free version Open MPI 2009,
(expected to be available by mid-2011).  This release is essentially a
complete rewrite of Open MPI based on new technologies such as C#,
Java, and object-oriented Cobol (so say we all!).  Buffer overflows
and memory leaks are now things of the past.  We strongly recommend
that all users upgrade to Windows 7 to fully take advantage of the new
powers embedded in Open MPI.

This version can be downloaded from The Onion's web site or from
many BitTorrent networks (seeding now; the Open MPI ISO is
approximately 3.97GB -- please wait for the full upload).

Here is an abbreviated list of changes in Open MPI 2009 as compared to
the previous version:

- Dropped support for MPI 2 in favor of the newly enhanced MPI 11.7
 standard.  MPI_COOK_DINNER support is only available with additional
 equipment (some assembly may be required).  An experimental PVM-like
 API has been introduced to deal with the current limitations of the
 MPI 11.7 API.
- Added a Twitter network transport capable of achieving peta-scale
 per second bandwidth (but only on useless data).
- Dropped support for the barely-used x86 and x86_64 architectures in
 favor of the most recent ARM6 architecture.  As a direct result,
 several Top500 sites are planning to convert from their now obsolete
 peta-scale machines to high-reliability iPhone clusters using the
 low-latency AT&T 3G network.
- The iPhone iMPI app (powered by iOpen MPI) is now downloadable from
 the iTunes Store.  Blackberry support will be included in a future
 release.
- Fix all compiler errors related to the PGI 8.0 compiler by
 completely dropping support.
- Add some "green" features for energy savings.  The new "--bike"
 mpirun option will run your parallel jobs only during the
 operation hours of the official Open MPI biking team.  The
 "--preload-result" option will directly embed the final result in
 the parallel execution, leading to more scalable and reliable runs
 and decreasing the execution time of any parallel application under
 the real-time limit of 1 second.  Open MPI is therefore EnergyStar
 compliant when used with these options.
- In addition to moving Open MPI's lowest point-to-point transports to
 be an external project, limited support will be offered for
 industry-standard platforms.  Our focus will now be to develop
 highly scalable transports based on widely distributed technologies
 such as SMTP, High Performance Gopher (v3.8 and later), OLE COMM,
 RSS/Atom, DNS, and Bonjour.
- Opportunistic integration with Conficker in order to utilize free
 resources distributed world-wide.
- Support for all Fortran versions prior to Fortran 2020 has been
 dropped.

Make today an Open MPI day!




Re: [OMPI users] Open MPI 2009 released

2009-04-01 Thread Damien Hocking

Outstanding.  I'll have two.

Damien




Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?

2009-04-01 Thread Guanyinzhu

thank you very much!



The option -mca orte_heartbeat_rate N is very useful for detecting failures
such as a dead host or network, or a killed orted daemon, in a running MPI job.
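
For anyone finding this thread later, the two ways to enable it (as
described in Ralph's message quoted below) look roughly like this; the
5-second interval and file locations are just examples:

  # one-off, on the mpirun command line
  mpirun -mca orte_heartbeat_rate 5 -hostfile myhosts -np 16 ./my_mpi_app

  # or permanently, in the default MCA parameter file, e.g.
  # $HOME/.openmpi/mca-params.conf or <prefix>/etc/openmpi-mca-params.conf
  orte_heartbeat_rate = 5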



I have another question:

I use ssh for the Open MPI remote launch, but sometimes a host stops answering
ssh login requests while still answering ping, perhaps because of an OS problem.
If such a "bad" host is in the hostfile, the "mpirun -hostfile ..." command
hangs even when I set -mca orte_heartbeat_rate 5.  Are there any other options
to avoid this?





thanks a lot!



From: r...@lanl.gov
To: us...@open-mpi.org
Date: Wed, 1 Apr 2009 07:34:46 -0600
Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job 
hang if host or network failed or orted deamon killed?

There is indeed a heartbeat mechanism you can use - it is "off" by default. You 
can set it to check every N seconds with:


-mca orte_heartbeat_rate N


on your command line. Or if you want it to always run, add "orte_heartbeat_rate 
= N" to your default MCA param file. OMPI will declare the orted "dead" if two 
consecutive heartbeats are not seen.


Let me know how it works for you - it hasn't been extensively tested, but has 
worked so far.
Ralph



On Apr 1, 2009, at 6:07 AM, Guanyinzhu wrote:

I mean I killed the orted daemon process while the MPI job was running, but the
MPI job hung and couldn't notice that one of its ranks had failed.




> Date: Wed, 1 Apr 2009 19:09:34 +0800
> From: ml.jgmben...@mailsnare.net
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job 
> hang if host or network failed or orted deamon killed?
> 
> Is there a firewall somewhere ?
> 
> Jerome
> 
> Guanyinzhu wrote:
> > Hi!
> > I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on
> > Redhat Linux x86_64.
> >
> > I run a test like this: I just killed the orted process, and the job hung
> > for a long time (it hung for 2~3 hours before I killed the job).
> >
> > I have the following questions:
> >
> > When the network fails, a host fails, or the orted daemon is killed by
> > accident, how long does it take the running MPI job to notice and exit?
> >
> > Does OpenMPI support a heartbeat mechanism, or how could I quickly detect
> > the failure to avoid the MPI job hanging?
> > 
> > 
> > thanks a lot!
> > 


