Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread sadfub
Hi, Jeff

many thanks for your reply..

> 1. You might want to update your version of Open MPI if possible; the  
> v1.1.1 version is quite old.  We have added many new bug fixes and  
> features since v1.1.1 (including tight SGE integration).  There is  
> nothing special about the Open MPI that is included in the OFED  
> distribution; you can download a new version from the Open MPI web  
> site (the current stable version is v1.2.3), configure, compile, and  
> install it with your current OFED installation.  You should be able  
> to configure Open MPI with:

Hmm, I've heard about conflicts between OMPI 1.2.x and OFED 1.1 (sorry, no
reference here), and I've had no luck producing a working OMPI
installation ("mpirun --help" runs, and ./IMB-MPI compiles and runs too,
but "mpirun -np 2 node03,node14 IMB-MPI1" doesn't (segmentation
fault))... (Besides that, I know that OFED 1.1 is quite old too.) So I
tested it with OMPI 1.1.5 => same error.


> 2. I know little/nothing about SGE, but I'm assuming that you need to  
> have SGE pass the proper memory lock limits to new processes.  In an  
> interactive login, you showed that the max limit is "8162952" -- you  
> might just want to make it unlimited, unless you have a reason for  
> limiting it.  See http://www.open-mpi.org/faq/? 

Yes, I already read the FAQ, and even setting them to unlimited has not
worked. In SGE one can specify limits for SGE jobs, e.g. with the qmon
tool (Configure Queues > select queue > Modify > Limits), but everything
there is set to infinity. (Besides that, the job does run with a static
machinefile -- is that a "noninteractive" job?) How can I test the
ulimits of interactive and noninteractive jobs?

Thank you for your great help.


Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Markus Daene
Hi.

I think it is not necessary to specify the hosts via a hostfile when using SGE
and Open MPI; even $NSLOTS is not necessary. Just run
mpirun executable
and this works very well.

To your memory problem:
I had similar problems when I specified the h_vmem option in SGE.
Without SGE everything works, but starting with SGE gives such memory errors.
You can easily check this with 'qconf -sc'. If you have used this option, try
without it. The problem in my case was that Open MPI sometimes allocates a lot
of memory and the job gets killed immediately by SGE, and one gets such error
messages; see my posting from a few days ago. I am not sure if this helps in
your case, but it could be an explanation.
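
A quick way to do that check from the command line (just a sketch; the grep
pattern assumes the standard complex names shown later in this thread):

  qconf -sc | head -2                 # print the column headers
  qconf -sc | grep -E 'vmem|slots'    # show the memory-related complexes and slots

Look at the "requestable", "consumable" and "default" columns for h_vmem/s_vmem.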

Markus



On Thursday, 21 June 2007 15:26, sad...@gmx.net wrote:
> Hi,
>
> I'm having some really strange error causing me some serious headaches.
> I want to integrate OpenMPI version 1.1.1 from the OFED package version
> 1.1 with SGE version 6.0. For mvapich all works, but for OpenMPI not ;(.
> Here is my jobfile and error message:
> #!/bin/csh -f
> #$ -N MPI_Job
> #$ -pe mpi 4
> export PATH=$PATH:/usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin
> export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/ofed/mpi/gcc/openmpi-1.1.1.-1/lib64
> /usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpirun -np $NSLOTS -hostfile
> $TMPDIR/machines /usr/ofed/mpi/gcc/openmpi-1.1.1-1/tests/IMB-2.3/IMB-MPI1
>
> ERRORMESSAGE:
> [node04:25768] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25768] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25768] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25768] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25769] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25769] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25769] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25769] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25770] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25770] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25770] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25770] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25771] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25771] mca_mpool_openib_register: ibv_reg_mr(0x584000,102400)
> failed with error: Cannot allocate memory
> [node04:25771] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [node04:25771] mca_mpool_openib_register: ibv_reg_mr(0x584000,528384)
> failed with error: Cannot allocate memory
> [0,1,1][btl_openib.c:808:mca_btl_openib_create_cq_srq] error creating
> low priority cq for mthca0 errno says Cannot allocate memory
>
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   PML add procs failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> MPI_Job.e111975 (END)
>
>
> If I run the OMPI job just with out SGE => everything works e.g. the
> following command:
> /usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpirun -v -np 4 -H
> node04,node04,node04,node04
> /usr/ofed/mpi/gcc/openmpi-1.1.1-1/tests/IMB-2.3/IMB-MPI1
>
> If I do this with static machinefiles, it works too:
> $ cat /tmp/machines
> node04
> node04
> node04
> node04
>
> /usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpirun -v -np 4 -hostfile
> /tmp/machines /usr/ofed/mpi/gcc/openmpi-1.1.1-1/tests/IMB-2.3/IMB-MPI1
>
> And if I run this in a jobscript it works even with a static machinefile
> (not shown below):
> #!/bin/csh -f
> #$ -N MPI_Job
> #$ -pe mpi 4
> export PATH=$PATH:/usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin
> export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/ofed/mpi/gcc/openmpi-1.1.1.-1/lib64
> /usr/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpirun -v -np 4 -H
> node04,no

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread sadfub
Markus Daene wrote:
> Hi.
> 
> I think it is not necessary to specify the hosts via the hostfile using SGE 
> and OpenMPI, even the $NSLOTS is not necessary , just run 
> mpirun executable this works very well.

This produces the same error, but thanks for your suggestion. (Just out of
interest: how does OMPI then determine how many slots it may use?)
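
(For reference, a sketch of how one could inspect what SGE itself hands the
job -- $NSLOTS and $PE_HOSTFILE are the standard SGE variables a
parallel-environment job gets, which is presumably what the integration reads;
the job name and PE below are placeholders:)

  #!/bin/sh
  #$ -N pe_env_test
  #$ -pe mpi 4
  echo "NSLOTS=$NSLOTS"
  cat $PE_HOSTFILE     # one line per host: hostname, slots, queue, processor range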


> to your memory problem:
> I had similar problems when I specified the h_vmem option to use in SGE. 
> Without SGE everything works, but starting with SGE gives such memory errors.
> You can easily check this with 'qconf -sc'. If you have used this option, try 
> without it. The problem in my case was that OpenMPI allocates sometimes a lot 
> of memory and the job gets immediately killed by SGE, and one gets such error 
> messages, see my posting some days ago. I am not sure if this helps in your 
> case but it could be an explanation.

Hmm, it seems that I'm not using such an option (for my queue the h_vmem
and s_vmem values are set to infinity). Here is the output of the qconf
-sc command. (Sorry for posting SGE-related stuff on this mailing list):
[~]# qconf -sc
#name               shortcut   type      relop requestable consumable default urgency
#---------------------------------------------------------------------------------------
arch                a          RESTRING  ==    YES         NO         NONE    0
calendar            c          RESTRING  ==    YES         NO         NONE    0
cpu                 cpu        DOUBLE    >=    YES         NO         0       0
h_core              h_core     MEMORY    <=    YES         NO         0       0
h_cpu               h_cpu      TIME      <=    YES         NO         0:0:0   0
h_data              h_data     MEMORY    <=    YES         NO         0       0
h_fsize             h_fsize    MEMORY    <=    YES         NO         0       0
h_rss               h_rss      MEMORY    <=    YES         NO         0       0
h_rt                h_rt       TIME      <=    YES         NO         0:0:0   0
h_stack             h_stack    MEMORY    <=    YES         NO         0       0
h_vmem              h_vmem     MEMORY    <=    YES         NO         0       0
hostname            h          HOST      ==    YES         NO         NONE    0
load_avg            la         DOUBLE    >=    NO          NO         0       0
load_long           ll         DOUBLE    >=    NO          NO         0       0
load_medium         lm         DOUBLE    >=    NO          NO         0       0
load_short          ls         DOUBLE    >=    NO          NO         0       0
mem_free            mf         MEMORY    <=    YES         NO         0       0
mem_total           mt         MEMORY    <=    YES         NO         0       0
mem_used            mu         MEMORY    >=    YES         NO         0       0
min_cpu_interval    mci        TIME      <=    NO          NO         0:0:0   0
np_load_avg         nla        DOUBLE    >=    NO          NO         0       0
np_load_long        nll        DOUBLE    >=    NO          NO         0       0
np_load_medium      nlm        DOUBLE    >=    NO          NO         0       0
np_load_short       nls        DOUBLE    >=    NO          NO         0       0
num_proc            p          INT       ==    YES         NO         0       0
qname               q          RESTRING  ==    YES         NO         NONE    0
rerun               re         BOOL      ==    NO          NO         0       0
s_core              s_core     MEMORY    <=    YES         NO         0       0
s_cpu               s_cpu      TIME      <=    YES         NO         0:0:0   0
s_data              s_data     MEMORY    <=    YES         NO         0       0
s_fsize             s_fsize    MEMORY    <=    YES         NO         0       0
s_rss               s_rss      MEMORY    <=    YES         NO         0       0
s_rt                s_rt       TIME      <=    YES         NO         0:0:0   0
s_stack             s_stack    MEMORY    <=    YES         NO         0       0
s_vmem              s_vmem     MEMORY    <=    YES         NO         0       0
seq_no              seq        INT       ==    NO          NO         0       0
slots               s          INT       <=    YES         YES        1       1000
swap_free           sf         MEMORY    <=    YES         NO         0       0
swap_rate           sr         MEMORY    >=    YES         NO         0       0
swap_rsvd           srsv       MEMORY    >=    YES         NO         0       0
swap_total          st         MEMORY    <=    YES         NO         0       0
swap_used           su         MEMORY    >=    YES         NO         0       0
tmpdir              tmp        RESTRING  ==    NO          NO         NONE    0
virtual_free        vf         MEMORY    <=    YES         NO         0       0
virtual_total       vt         MEMORY    <=    YES         NO         0       0
virtual_used        vu         MEMORY    >=    YES         NO         0       0
# >#< starts a comment but comments are not saved across edits

thanks for your help.



Re: [OMPI devel] create new btl

2007-06-22 Thread Pablo Cascón Katchadourian
   It couldn't be easier. Thanks a lot!

Pablo


On Friday 22 June 2007 00:32:13 George Bosilca wrote:
> Rerun the autogen.sh script and the new BTL will get auto-magically
> included in the build. You don't have to modify anything, just run
> the script.
>
> Once you get it compiled, you can specify --mca btl <your btl name>,self
> on your mpirun command line to get access at runtime to your BTL.
>
>george.
>
> On Jun 21, 2007, at 3:36 PM, pcas...@atc.ugr.es wrote:
> > Hello all,
> > I just arrived at Open MPI. I'm trying to create a new BTL. The goal
> > is to use Open MPI with a library that sends/receives packets with a
> > network-processor (IXP) based board. Since it's an Ethernet board, I
> > thought the best way to start is to reproduce the TCP BTL. So I made
> > a copy of the directory ompi/mca/btl/tcp/ just to have something to
> > start from. But then I don't know how to include this "new" BTL in the
> > build system (./configure; make all install). My knowledge of the GNU
> > autotools is not good enough, I guess. I believe the first step is to
> > modify the configure script through 'autoconf', but I'm not sure how to
> > do this. I've been searching for information about that on this mailing
> > list with no luck. What would be the steps to create a basic BTL?
> > What's the best way to integrate the code into the whole of Open MPI?
> > Thanks a lot for reading :)
> >
> > Regards
> >Pablo
> >
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
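
A minimal sketch of the steps George describes, assuming the copied component
was renamed to ompi/mca/btl/mybtl ("mybtl" is only a placeholder name):

  cd openmpi-src
  ./autogen.sh                      # regenerate configure so the new BTL directory is picked up
  ./configure --prefix=$HOME/ompi
  make all install
  # at run time, restrict Open MPI to the new BTL plus the self (loopback) BTL:
  mpirun -np 2 --mca btl mybtl,self ./your_mpi_program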




Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Jeff Squyres

On Jun 22, 2007, at 3:52 AM, sad...@gmx.net wrote:


1. You might want to update your version of Open MPI if possible; the
v1.1.1 version is quite old.  We have added many new bug fixes and
features since v1.1.1 (including tight SGE integration).  There is
nothing special about the Open MPI that is included in the OFED
distribution; you can download a new version from the Open MPI web
site (the current stable version is v1.2.3), configure, compile, and
install it with your current OFED installation.  You should be able
to configure Open MPI with:


Hmm, I've heard about conflicts between OMPI 1.2.x and OFED 1.1 (sorry, no
reference here),


I'm unaware of any problems with OMPI 1.2.x and OFED 1.1.  I run OFED  
1.1 on my cluster at Cisco and have many different versions of OMPI  
installed (1.2, trunk, etc.).



and I've had no luck producing a working OMPI
installation ("mpirun --help" runs, and ./IMB-MPI compiles and runs too,
but "mpirun -np 2 node03,node14 IMB-MPI1" doesn't (segmentation
fault))...


Can you send more information on this?  See
http://www.open-mpi.org/community/help/



(Besides that, I know that OFED 1.1 is quite old too.) So I
tested it with OMPI 1.1.5 => same error.


*IF* all goes well, OFED 1.2 should be released today (famous last  
words).



2. I know little/nothing about SGE, but I'm assuming that you need to
have SGE pass the proper memory lock limits to new processes.  In an
interactive login, you showed that the max limit is "8162952" -- you
might just want to make it unlimited, unless you have a reason for
limiting it.  See http://www.open-mpi.org/faq/?


Yes, I already read the FAQ, and even setting them to unlimited has not
worked. In SGE one can specify limits for SGE jobs, e.g. with the qmon
tool (Configure Queues > select queue > Modify > Limits), but everything
there is set to infinity. (Besides that, the job does run with a static
machinefile -- is that a "noninteractive" job?) How can I test the
ulimits of interactive and noninteractive jobs?


Launch an SGE job that calls the shell command "limit" (if you run
C-shell variants) or "ulimit -l" (if you run Bourne shell variants).
Ensure that the output is "unlimited".
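
A minimal test job along those lines might look like this (Bourne-shell
flavour; the job name and PE are placeholders):

  #!/bin/sh
  #$ -N limit_test
  #$ -pe mpi 2
  #$ -cwd
  ulimit -l      # max locked memory; should print "unlimited"
  ulimit -a      # full set of limits, for comparison with an interactive login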


What are the limits of the user that launches the SGE daemons?  I.e.,  
did the SGE daemons get started with proper "unlimited" limits?  If  
not, that could hamper SGE's ability to set the limits that you told  
it to via qmon (remember my disclaimer: I know nothing about SGE, so  
this is speculation).


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Markus Daene

> Markus Daene wrote:
> > Hi.
> >
> > I think it is not necessary to specify the hosts via the hostfile using
> > SGE and OpenMPI, even the $NSLOTS is not necessary , just run
> > mpirun executable this works very well.
>
> This produces the same error, but thanks for your suggestion. (For the
> sake of interest: how controls then ompi how many slots it may use?)

It just knows it; I think the developers could answer this question.

> > to your memory problem:
> > I had similar problems when I specified the h_vmem option to use in SGE.
> > Without SGE everything works, but starting with SGE gives such memory
> > errors. You can easily check this with 'qconf -sc'. If you have used this
> > option, try without it. The problem in my case was that OpenMPI allocates
> > sometimes a lot of memory and the job gets immediately killed by SGE, and
> > one gets such error messages, see my posting some days ago. I am not sure
> > if this helps in your case but it could be an explanation.

I am sorry to discuss SGE stuff here as well, but the question came up and one
should make clear that this is not just related to OMPI.

I think your output shows exactly the problem: you have set h_vmem as
requestable and the default value to 0, so the job has no memory at all. OMPI
somehow knows that it has only the memory granted by SGE, so it cannot
allocate any memory in this case. Of course you get the errors.
You should either set h_vmem to not requestable, or set a proper default
value, e.g. 2.0G, or specify the memory consumption in your job script like
#$ -l h_vmem=2000M
It is not important that your queue has h_vmem set to infinity; that just
gives you the maximum which you can request.
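
A short sketch of the two options Markus mentions (the 2000M value is only an
example):

  # option 1: request enough virtual memory per slot in the job script
  #$ -l h_vmem=2000M

  # option 2: give the h_vmem complex a sensible default instead of 0;
  # 'qconf -mc' opens the complex configuration (the qconf -sc table) in $EDITOR
  qconf -mc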

Markus


> Hmm it seems that I'm not using such an option (for my queue the h_vmem
> and s_vmem values are set to infinity). Here the output for the qconf
> -sc command. (Sorry for posting SGE related stuff on this mailing list):
> [~]# qconf -sc
> [full qconf -sc output quoted in the previous message; truncated here in
> the archive]

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread sadfub
Markus Daene wrote:

>>> to your memory problem:
>>> I had similar problems when I specified the h_vmem option to use in SGE.
>>> Without SGE everything works, but starting with SGE gives such memory
>>> errors. You can easily check this with 'qconf -sc'. If you have used this
>>> option, try without it. The problem in my case was that OpenMPI allocates
>>> sometimes a lot of memory and the job gets immediately killed by SGE, and
>>> one gets such error messages, see my posting some days ago. I am not sure
>>> if this helps in your case but it could be an explanation.
> 
> I am sorry to discuss SGE stuff here as well, but there was this question and 
> one should make clear that this is not just related to OMPI.
> 
> I think your output shows exactly the problem: you have set h_vmem as
> requestable and the default value to 0, so the job has no memory at all. OMPI

(I thought that zero means infinity)

> somehow knows that it has just this memory granted by SGE, so it cannot
> allocate any memory in this case. Of course you get the errors.
> You should either set h_vmem to not requestable, or set a proper default 
> value. e.g. 2.0G, or specify the memory consumption in your job script like
> #$ -l h_vmem=2000M
> it is not important that your queue has set h_vmem to infinity, this gives 
> you 
> just the maximum which you can request. 

If I use the h_vmem option I get a slightly different error, but if I mark
h_vmem as not requestable => same error. Below is the slightly different
error message:

[node17:02861] mca: base: component_find: unable to open: libsysfs.so.1:
failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed
to map segment from shared o
bject: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_basic.so:
failed to map segment from share
d object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_hierarch.so:
failed to map segment from sh
ared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_self.so: failed
to map segment from shared
 object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_sm.so: failed
to map segment from shared o
bject: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_tuned.so:
failed to map segment from share
d object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_osc_pt2pt.so: failed
to map segment from shared
 object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: libsysfs.so.1:
failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed
to map segment from shared o
bject: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_basic.so:
failed to map segment from share
d object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_hierarch.so:
failed to map segment from sh
ared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_self.so: failed
to map segment from shared
 object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_sm.so: failed
to map segment from shared o
bject: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_tuned.so:
failed to map segment from share
d object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_osc_pt2pt.so: failed
to map segment from shared
 object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: libsysfs.so.1:
failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open:
/usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed
to map segment from shared o
bject: Cannot allocate memory (ignored)
[node17:02863] mca: base: componen

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Pak Lui

Jeff Squyres wrote:

2. I know little/nothing about SGE, but I'm assuming that you need to
have SGE pass the proper memory lock limits to new processes.  In an
interactive login, you showed that the max limit is "8162952" -- you
might just want to make it unlimited, unless you have a reason for
limiting it.  See http://www.open-mpi.org/faq/?

yes I allready read the faq, and even setting them to unlimited has
shown not be working. In the SGE one could specify the limits to
SGE-jobs by e.g. the qmon tool, (configuring queues > select queue >
modify > limits) But there is everything set to infinity. (Beside  
that,

the job is running with a static machinefile (is this an
"noninteractive" job?)) How could I test ulimits of interactive and
noninteractive jobs?


Launch an SGE job that calls the shell command "limit" (if you run C- 
shell variants) or "ulimit -l" (if you run Bourne shell variants).   
Ensure that the output is "unlimited".


What are the limits of the user that launches the SGE daemons?  I.e.,  
did the SGE daemons get started with proper "unlimited" limits?  If  
not, that could hamper SGE's ability to set the limits that you told  
it to via qmon (remember my disclaimer: I know nothing about SGE, so  
this is speculation).




I am assuming you have tried launching your job without SGE (e.g. via ssh
or otherwise) and that works correctly? If yes, then you should compare the
outputs of limit as Jeff suggested to see if there is any difference
between the two (with and without SGE).


I know of a similar problem with SGE's limitation that it cannot set the
file descriptor limit for user processes (and I believe the SGE folks are
aware of the problem). The workaround was to put the setting into
~/.tcshrc. So if SGE is not setting some other resource limit correctly or
doesn't provide the option, you may have to work around it in ~/.tcshrc or
a similar settings file for your shell. Otherwise it'll probably fall back
to the system default.


--

- Pak Lui
pak@sun.com


Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread sadfub
Hi Pak,

> Jeff Squyres wrote:
 2. I know little/nothing about SGE, but I'm assuming that you need to
 have SGE pass the proper memory lock limits to new processes.  In an
 interactive login, you showed that the max limit is "8162952" -- you
 might just want to make it unlimited, unless you have a reason for
 limiting it.  See http://www.open-mpi.org/faq/?
>>> yes I allready read the faq, and even setting them to unlimited has
>>> shown not be working. In the SGE one could specify the limits to
>>> SGE-jobs by e.g. the qmon tool, (configuring queues > select queue >
>>> modify > limits) But there is everything set to infinity. (Beside  
>>> that,
>>> the job is running with a static machinefile (is this an
>>> "noninteractive" job?)) How could I test ulimits of interactive and
>>> noninteractive jobs?
>> Launch an SGE job that calls the shell command "limit" (if you run C- 
>> shell variants) or "ulimit -l" (if you run Bourne shell variants).   
>> Ensure that the output is "unlimited".
>>
>> What are the limits of the user that launches the SGE daemons?  I.e.,  
>> did the SGE daemons get started with proper "unlimited" limits?  If  
>> not, that could hamper SGE's ability to set the limits that you told  
>> it to via qmon (remember my disclaimer: I know nothing about SGE, so  
>> this is speculation).
>>
> 
> I am assuming you have tried without using SGE (like via ssh or others) 
> to launch your job and that works correctly? If yes then you should 
> compare the outputs of limit as Jeff suggested to see if they are any 
> difference between the two (with and without using SGE).

Yes, without SGE everything works; with SGE it also works if I use a static
machinefile (see initial post), or with -H h1,...,hn! Only with SGE's
generated $TMPDIR/machines file (which in turn is valid! I checked this)
does the job not run. And the ulimits are unlimited in all three of the
following cases:

pos1: pdsh -R ssh -w node[XX-YY] ulimit -a => unlimited

(loosely coupled)
pos2: qsub jobscript, where the jobscript just calls the same command as in
pos1

(tightly coupled?)
pos3: qsub jobscript, where the jobscript calls another script (containing
the same command as in pos1) and additionally passes $TMPDIR/machines as an
argument to it.

Thanks for your help.



Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread sadfub
Jeff Squyres wrote:

>> Hmm, I've heard about conflicts with OMPI 1.2.x and OFED 1.1 (sorry no
>> refference here),
> 
> I'm unaware of any problems with OMPI 1.2.x and OFED 1.1.  I run OFED  
> 1.1 on my cluster at Cisco and have many different versions of OMPI  
> installed (1.2, trunk, etc.).

Yes, you are right, I misread it (in the OMPI 1.2 changelog (README) it is
OFED 1.0 that isn't considered to work with OMPI 1.2. Sorry..).

>> and I've got no luck producing a working OMPI
>> installation ("mpirun --help" runs, and ./IMB-MPI compiles and runs  
>> too,
>> but "mpirun -np 2 node03,node14 IMB-MPI1" doesnt (segmentation
>> fault))...
> 
> Can you send more information on this?  See http://www.open-mpi.org/ 
> community/help/

-sh-3.00$ ompi/bin/mpirun -d -np 2 -H node03,node06 hostname
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] [0,0,0] setting up session dir with
[headnode:23178]universe default-universe-23178
[headnode:23178]user me
[headnode:23178]host headnode
[headnode:23178]jobid 0
[headnode:23178]procid 0
[headnode:23178] procdir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0/0
[headnode:23178] jobdir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0
[headnode:23178] unidir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178
[headnode:23178] top: openmpi-sessions-me@headnode_0
[headnode:23178] tmp: /tmp
[headnode:23178] [0,0,0] contact_file
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/universe-setup.txt
[headnode:23178] [0,0,0] wrote setup file
[headnode:23178] *** Process received signal ***
[headnode:23178] Signal: Segmentation fault (11)
[headnode:23178] Signal code: Address not mapped (1)
[headnode:23178] Failing at address: 0x1
[headnode:23178] [ 0] /lib64/tls/libpthread.so.0 [0x39ed80c430]
[headnode:23178] [ 1] /lib64/tls/libc.so.6(strcmp+0) [0x39ecf6ff00]
[headnode:23178] [ 2]
/home/me/ompi/lib/openmpi/mca_pls_rsh.so(orte_pls_rsh_launch+0x24f)
[0x2a9723cc7f]
[headnode:23178] [ 3] /home/me/ompi/lib/openmpi/mca_rmgr_urm.so
[0x2a9764fa90]
[headnode:23178] [ 4] /home/me/ompi/bin/mpirun(orterun+0x35b) [0x402ca3]
[headnode:23178] [ 5] /home/me/ompi/bin/mpirun(main+0x1b) [0x402943]
[headnode:23178] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
[0x39ecf1c3fb]
[headnode:23178] [ 7] /home/me/ompi/bin/mpirun [0x40289a]
[headnode:23178] *** End of error message ***
Segmentation fault


>> yes I allready read the faq, and even setting them to unlimited has
>> shown not be working. In the SGE one could specify the limits to
>> SGE-jobs by e.g. the qmon tool, (configuring queues > select queue >
>> modify > limits) But there is everything set to infinity. (Beside  
>> that,
>> the job is running with a static machinefile (is this an
>> "noninteractive" job?)) How could I test ulimits of interactive and
>> noninteractive jobs?
> 
> Launch an SGE job that calls the shell command "limit" (if you run C- 
> shell variants) or "ulimit -l" (if you run Bourne shell variants).   
> Ensure that the output is "unlimited".

I've already done that, but how do I distinguish between tightly coupled
job ulimits and loosely coupled job ulimits? I tried passing
$TMPDIR/machines to a shell script which in turn runs "ulimit -a",
*assuming* this is considered a tightly coupled job, but each node
returned unlimited.. and without $TMPDIR/machines too. Even the
headnode is set to unlimited.

> What are the limits of the user that launches the SGE daemons?  I.e.,  
> did the SGE daemons get started with proper "unlimited" limits?  If  
> not, that could hamper SGE's ability to set the limits that you told  

The limits in /etc/security/limits.conf apply to all users (using a
'*'), hence the SGE processes and daemons shouldn't have any limits.

> it to via qmon (remember my disclaimer: I know nothing about SGE, so  
> this is speculation).

But thanks anyway => I will post this issue to an SGE mailing list soon.
The config.log and the `ompi_info --all` output are attached. Thanks again
to all of you.




logs.tbz
Description: application/bzip-compressed-tar


Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Jeff Squyres

On Jun 22, 2007, at 10:44 AM, sad...@gmx.net wrote:


Can you send more information on this?  See
http://www.open-mpi.org/community/help/


-sh-3.00$ ompi/bin/mpirun -d -np 2 -H node03,node06 hostname
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] [0,0,0] setting up session dir with
[headnode:23178]universe default-universe-23178
[headnode:23178]user me
[headnode:23178]host headnode
[headnode:23178]jobid 0
[headnode:23178]procid 0
[headnode:23178] procdir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0/0
[headnode:23178] jobdir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0
[headnode:23178] unidir:
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178
[headnode:23178] top: openmpi-sessions-me@headnode_0
[headnode:23178] tmp: /tmp
[headnode:23178] [0,0,0] contact_file
/tmp/openmpi-sessions-me@headnode_0/default-universe-23178/universe- 
setup.txt

[headnode:23178] [0,0,0] wrote setup file
[headnode:23178] *** Process received signal ***
[headnode:23178] Signal: Segmentation fault (11)
[headnode:23178] Signal code: Address not mapped (1)
[headnode:23178] Failing at address: 0x1
[headnode:23178] [ 0] /lib64/tls/libpthread.so.0 [0x39ed80c430]
[headnode:23178] [ 1] /lib64/tls/libc.so.6(strcmp+0) [0x39ecf6ff00]
[headnode:23178] [ 2]
/home/me/ompi/lib/openmpi/mca_pls_rsh.so(orte_pls_rsh_launch+0x24f)
[0x2a9723cc7f]
[headnode:23178] [ 3] /home/me/ompi/lib/openmpi/mca_rmgr_urm.so
[0x2a9764fa90]
[headnode:23178] [ 4] /home/me/ompi/bin/mpirun(orterun+0x35b)  
[0x402ca3]

[headnode:23178] [ 5] /home/me/ompi/bin/mpirun(main+0x1b) [0x402943]
[headnode:23178] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
[0x39ecf1c3fb]
[headnode:23178] [ 7] /home/me/ompi/bin/mpirun [0x40289a]
[headnode:23178] *** End of error message ***
Segmentation fault


This should not happen -- this is [obviously] even before any MPI  
processing starts.  Are you inside an SGE job here?


Pak/Ralph: any ideas?


Launch an SGE job that calls the shell command "limit" (if you run C-
shell variants) or "ulimit -l" (if you run Bourne shell variants).
Ensure that the output is "unlimited".


I've done that allready, but how to distinguish between tight coupled
job ulimits and loose coupled job ulimits? I tested to pass
$TMPDIR/machines to a shell script which in turn delivers a "ulimit  
-a",

*assuming* this is considered as a tight coupled job, but each node
returned unlimited.. and without this $TMPDIR/machines too. Even the
headnode is set to unlimited.


I don't really know what this means.  People have explained "loose"  
vs. "tight" integration to me before, but since I'm not an SGE user,  
the definitions always fall away.


Based on your prior e-mail, it looks like you are always invoking  
"ulimit" via "pdsh", even under SGE jobs.  This is incorrect.  Can't  
you just submit an SGE job script that runs "ulimit"?



What are the limits of the user that launches the SGE daemons?  I.e.,
did the SGE daemons get started with proper "unlimited" limits?  If
not, that could hamper SGE's ability to set the limits that you told


The limits in /etc/security/limits.conf apply to all users (using a
'*'), hence the SGE processes and deamons shouldn't have any limits.


Not really.  limits.conf is not universally applied; it's a PAM  
entity.  So for daemons that start via /etc/init.d scripts (or  
whatever the equivalent is on your system), PAM limits are not  
necessarily applied.  For example, I had to manually insert a "ulimit  
-Hl unlimited" in the startup script for my SLURM daemons.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PML/BTL MCA params review

2007-06-22 Thread Jeff Squyres

On Jun 20, 2007, at 8:29 AM, Jeff Squyres wrote:

1. btl_*_min_send_size is used to decide when to stop striping a
message across multiple BTLs.  Is there a reason that we don't
just use eager_limit for this value?  It seems weird to say "this
message is short enough to go across 1 BTL, even though it'll take
multiple sends if min_send_size > eager_limit".  If no one has any
objections, we suggest eliminating this MCA parameter (!!) and its
corresponding value and just using the BTL's eager limit instead
(this value is set by every BTL, but only used in exactly 1
place in OB1).


Len: please put this on the agenda for next Tuesday (just so that  
there's a deadline to ensure progress).


No one has commented on this, so I assume we'll discuss on Tuesday.  :-)

2. rdma_pipeline_offset is a bad name; it is not an accurate
description of what this value represents.  See the attached figure
for what this value is: it is the length that is sent/received
after the eager match before the RDMA (it happens to be at the end
of the message, but that's irrelevant).  Specifically: it is a
length, not an offset.  We should change this name.  Here are some
suggestions we came up with:


rdma_pipeline_send_length (this is our favorite)


Gleb made this change in the code.  I've attached a new slide showing  
the new name.
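
To see which name a given build actually exposes (the grep pattern is just
illustrative):

  ompi_info --all | grep rdma_pipeline

and then set it on the command line with "mpirun --mca <reported parameter
name> <bytes> ..." as usual.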


--
Jeff Squyres
Cisco Systems



pml-btl-values.pdf
Description: Adobe PDF document