Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-10 Thread Gus Correa

Hi Craig, George, list

Here is a quick and dirty solution I used before for a similar problem.
Link the Intel libraries statically,  using the "-static-intel" flag.
Other shared libraries continue to be dynamically linked.

For instance:

mpif90 -static-intel my_mpi_program.f90

What is not clear to me is why you would use orted directly instead of
mpirun/mpiexec/orterun, which has a mechanism to pass environment
variables to the remote hosts with "-x LD_LIBRARY_PATH=/my/intel/lib".

I hope this helps.
Gus Correa

--
-
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
-



George Bosilca wrote:


Craig,

This is a problem with the Intel libraries and not the Open MPI ones.  
You have to somehow make these libraries available on the compute nodes.


What I usually do (but it's not the best way to solve this problem) 
is  to copy these libraries somewhere on my home area and to add the  
directory to my LD_LIBRARY_PATH.


  george.

On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:

I am having problems launching openmpi jobs on my system.  I support  
multiple versions
of MPI and compilers using GNU Modules.  For the default compiler,  
everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure  options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort \
    --prefix=/opt/openmpi/1.2.7-intel-10.1 --without-gridengine \
    --enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs \
    --with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right compiler/ 
MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am  
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come along with it, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared  
libraries: libintlc.so.5: cannot open shared object file: No such  
file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)





Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-10 Thread George Bosilca

Craig,

This is a problem with the Intel libraries and not the Open MPI ones.  
You have to somehow make these libraries available on the compute nodes.


What I usually do (but it's not the best way to solve this problem) is  
to copy these libraries somewhere on my home area and to add the  
directory to my LD_LIBRARY_PATH.
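
Roughly like this, as a sketch only (the source directory and library
names below are illustrative, not exact):

# copy the Intel runtime libraries into a directory visible on all nodes
mkdir -p ~/intel-libs
cp /opt/intel/fce/10.1/lib/lib*.so* ~/intel-libs/
# then point the dynamic linker at it, e.g. in the shell startup file
export LD_LIBRARY_PATH=$HOME/intel-libs:$LD_LIBRARY_PATH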


  george.

On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:

I am having problems launching openmpi jobs on my system.  I support  
multiple versions
of MPI and compilers using GNU Modules.  For the default compiler,  
everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure  
options:


# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort \
    --prefix=/opt/openmpi/1.2.7-intel-10.1 --without-gridengine \
    --enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs \
    --with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right compiler/ 
MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am  
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come along with it, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared  
libraries: libintlc.so.5: cannot open shared object file: No such  
file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)




[OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-10 Thread Craig Tierney

I am having problems launching openmpi jobs on my system.  I support multiple 
versions
of MPI and compilers using GNU Modules.  For the default compiler, everything 
is fine.
For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort \
    --prefix=/opt/openmpi/1.2.7-intel-10.1 --without-gridengine \
    --enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs \
    --with-openib=/opt/hjet/ofed/1.3.1

When I launch a job, I run the module command for the right compiler/MPI 
version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am launching, but 
not orted.

When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come along with it, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared libraries: libintlc.so.5: cannot open shared object file: No 
such file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread George Bosilca


On Oct 10, 2008, at 12:42 PM, V. Ram wrote:

Can anyone else suggest why the code might be crashing when running over
ethernet and not over shared memory?  Any suggestions on how to debug
this or interpret the error message issued from btl_tcp_frag.c?


Unfortunately, this is a standard error message which does not enlighten
us as to what the real error was.  It simply states that one node failed
to read data from a socket, which usually happens when the remote peer
died unexpectedly (e.g., from a segfault).


  george.



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brian Dobbins
Hi guys,

On Fri, Oct 10, 2008 at 12:57 PM, Brock Palen  wrote:

> Actually I had much different results,
>
> gromacs-3.3.1  one node dual core dual socket opt2218  openmpi-1.2.7
>  pgi/7.2
> mpich2 gcc
>

   For some reason, the difference in minutes didn't come through, it seems,
but I would guess that if it's a medium-large difference, then it has its
roots in PGI7.2 vs. GCC rather than MPICH2 vs. OpenMPI.  Though, to be fair,
I find GCC vs. PGI (for C code) is often a toss-up - one may beat the other
handily on one code, and then lose just as badly on another.

I think my install of mpich2 may be bad, I have never installed it before,
>  only mpich1, OpenMPI and LAM. So take my mpich2 numbers with salt, Lots of
> salt.


  I think the biggest difference in performance between various MPICH2
installs comes from differences in the 'channel' used.  I tend to make sure
that I use the 'nemesis' channel, which may or may not be the default these
days.  If not, though, most people would probably want it.  I think it has
issues with threading (or did ages ago?), but I seem to recall it being
considerably faster than even the 'ssm' channel.
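
  For reference, the channel is selected when MPICH2 itself is configured;
a sketch of that (the prefix is just an example, and the exact flag is
version-dependent) would be:

./configure --with-device=ch3:nemesis --prefix=/opt/mpich2-nemesis
make && make install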

  Sangamesh:  My advice to you would be to recompile Gromacs and specify,
in the *Gromacs* compile/configure step, the same CFLAGS you used with
MPICH2, e.g. "-O2 -m64", whatever they were.  If you do that, I bet the
times between MPICH2 and OpenMPI will be pretty comparable for your
benchmark case - especially when run on a single processor.
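
  As a sketch only (the flags and prefix below are placeholders; use
whatever you actually gave the MPICH2 build), the Gromacs 3.3.x autoconf
build would look something like:

export CC=gcc CFLAGS="-O2 -m64"
./configure --enable-mpi --prefix=$HOME/gromacs-3.3.1-ompi
make && make install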

  Cheers,
  - Brian


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brock Palen

Whoops, I didn't include the mpich2 numbers,

20M mpich2  same node,

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 12:57 PM, Brock Palen wrote:


Actually I had much different results,

gromacs-3.3.1  one node dual core dual socket opt2218   
openmpi-1.2.7  pgi/7.2

mpich2 gcc

19M OpenMPI
M  Mpich2

So for me OpenMPI+pgi was faster; I don't know how you got such a
low mpich2 number.


On the other hand, if you do this preprocessing step before you run:

grompp -sort -shuffle -np 4
mdrun -v

With -sort and -shuffle  the OpenMPI run time went down,

12M OpenMPI + sort shuffle

I think my install of mpich2 may be bad, I have never installed it  
before,  only mpich1, OpenMPI and LAM. So take my mpich2 numbers  
with salt, Lots of salt.


On that point, though, -sort and -shuffle may be useful for you; be sure
to understand what they do before you use them.

Read:
http://cac.engin.umich.edu/resources/software/gromacs.html

Last, make sure that you're using the single-precision version of
gromacs for both runs.  The double-precision version is about half the
speed of the single.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 1:15 AM, Sangamesh B wrote:




On Thu, Oct 9, 2008 at 7:30 PM, Brock Palen  wrote:
Which benchmark did you use?

Out of the 4 benchmarks, I used the d.dppc benchmark.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:



On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres   
wrote:

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default.  
It really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
 Internal debug support: no
Memory debugging support: no
shell$
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
 Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with
the compilation of GROMACS.


You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems










Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brock Palen

Actually I had much different results,

gromacs-3.3.1  one node dual core dual socket opt2218  openmpi-1.2.7   
pgi/7.2

mpich2 gcc

19M OpenMPI
M  Mpich2

So for me OpenMPI+pgi was faster; I don't know how you got such a low
mpich2 number.


On the other hand, if you do this preprocessing step before you run:

grompp -sort -shuffle -np 4
mdrun -v

With -sort and -shuffle  the OpenMPI run time went down,

12M OpenMPI + sort shuffle

I think my install of mpich2 may be bad, I have never installed it  
before,  only mpich1, OpenMPI and LAM. So take my mpich2 numbers with  
salt, Lots of salt.


On that point, though, -sort and -shuffle may be useful for you; be sure
to understand what they do before you use them.

Read:
http://cac.engin.umich.edu/resources/software/gromacs.html

Last, make sure that you're using the single-precision version of
gromacs for both runs.  The double-precision version is about half the
speed of the single.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 1:15 AM, Sangamesh B wrote:




On Thu, Oct 9, 2008 at 7:30 PM, Brock Palen  wrote:
Which benchmark did you use?

Out of the 4 benchmarks, I used the d.dppc benchmark.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:



On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres   
wrote:

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default.  
It really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
 Internal debug support: no
Memory debugging support: no
shell$
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
 Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with
the compilation of GROMACS.


You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems








Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread V. Ram
Leonardo,

These nodes are all using Intel e1000 chips.  As the nodes are AMD
K7-based, these are the older chips, not the newer ones with all the
EEPROM issues under recent kernels.

The kernel in use is from the 2.6.22 family, and the e1000 driver is the
one shipped with the kernel.  I am running it compiled into the kernel,
not as a module.

When testing with the Intel MPI Benchmarks, I found that increasing the
receive ring buffer size to the maximum (4096) helped performance, so I
run ethtool -G at startup.

Checking ethtool -k, I see that TCP segmentation offload is on.  I can
try turning that off to see what happens.
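
For reference, the commands in question look roughly like this (the
interface name eth0 is just an example):

ethtool -g eth0          # show current ring buffer sizes
ethtool -G eth0 rx 4096  # raise the receive ring to its maximum
ethtool -k eth0          # show offload settings
ethtool -K eth0 tso off  # disable TCP segmentation offload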

Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or
have these same issues, and I haven't had to turn off TSO.

Can anyone else suggest why the code might be crashing when running over
ethernet and not over shared memory?  Any suggestions on how to debug
this or interpret the error message issued from btl_tcp_frag.c ?

Thanks.


On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho"
 said:
> Ram,
> 
> What is the name and version of the kernel module for your NIC? I have
> experienced something similar with my tg3 module. The error which appeared
> for me was different:
> 
> [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv 
> failed: No route to host (113)
> 
> I solved it by changing the following parameter in the Linux kernel:
> 
> /sbin/ethtool -K eth0 tso off
> 
> Leonardo
> 
> 
> Aurélien Bouteiller escribió:
> > If you have several network cards in your system, it can sometimes get
> > the endpoints confused, especially if you don't have the same number
> > of cards or don't use the same subnet for all of "eth0, eth1". You should
> > try to restrict Open MPI to using only one of the available networks by
> > passing the --mca btl_tcp_if_include ethx parameter to mpirun, where x
> > is the network interface that is always connected to the same logical
> > and physical network on your machine.
> >
> > Aurelien
> >
> > On 1 Oct 2008, at 11:47, V. Ram wrote:
> >
> >> I wrote earlier about one of my users running a third-party Fortran code
> >> on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash
> >> behavior.
> >>
> >> Our cluster's nodes all have 2 single-core processors.  If this code is
> >> run on 2 processors on 1 node, it runs seemingly fine.  However, if the
> >> job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
> >> it crashes and gives messages like:
> >>
> >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> mca_btl_tcp_frag_recv: readv failed with errno=110
> >> mca_btl_tcp_frag_recv: readv failed with errno=104
> >>
> >> Essentially, if any network communication is involved, the job crashes
> >> in this form.
> >>
> >> I do have another user that runs his own MPI code on 10+ of these
> >> processors for days at a time without issue, so I don't think it's
> >> hardware.
> >>
> >> The original code also runs fine across many networked nodes if the
> >> architecture is x86-64 (also running OMPI 1.2.7).
> >>
> >> We have also tried different Fortran compilers (both PathScale and
> >> gfortran) and keep getting these crashes.
> >>
> >> Are there any suggestions on how to figure out if it's a problem with
> >> the code or the OMPI installation/software on the system? We have tried
> >> "--debug-daemons" with no new/interesting information being revealed.
> >> Is there a way to trap segfault messages or more detailed MPI
> >> transaction information or anything else that could help diagnose this?
> >>
> >> Thanks.
> >> -- 
> >>  V. Ram
> >>  v_r_...@fastmail.fm
> >>
> >> -- 
> >> http://www.fastmail.fm - Same, same, but different...
> >>
> 
> 
> -- 
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478
> 
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - Faster than the air-speed velocity of an
  unladen european swallow




Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread V. Ram
Sorry for replying to this so late, but I have been away.  Reply
below...

On Wed, 1 Oct 2008 11:58:30 -0400, "Aurélien Bouteiller"
 said:
> If you have several network cards in your system, it can sometimes get
> the endpoints confused, especially if you don't have the same number
> of cards or don't use the same subnet for all of "eth0, eth1". You should
> try to restrict Open MPI to using only one of the available networks by
> passing the --mca btl_tcp_if_include ethx parameter to mpirun, where x
> is the network interface that is always connected to the same logical
> and physical network on your machine.

I was pretty sure this wasn't the problem since basically all the nodes
only have one interface configured, but I had the user try the --mca
btl_tcp_if_include parameter.  The same result / crash occurred.
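
For completeness, the kind of invocation that was tried (the interface
name, process count, and executable are placeholders):

mpirun --mca btl_tcp_if_include eth0 -np 2 --bynode ./my_fortran_app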

> 
> Aurelien
> 
> On 1 Oct 2008, at 11:47, V. Ram wrote:
> 
> > I wrote earlier about one of my users running a third-party Fortran  
> > code
> > on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd  
> > crash
> > behavior.
> >
> > Our cluster's nodes all have 2 single-core processors.  If this code  
> > is
> > run on 2 processors on 1 node, it runs seemingly fine.  However, if  
> > the
> > job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode),  
> > then
> > it crashes and gives messages like:
> >
> > [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > mca_btl_tcp_frag_recv: readv failed with errno=110
> > mca_btl_tcp_frag_recv: readv failed with errno=104
> >
> > Essentially, if any network communication is involved, the job crashes
> > in this form.
> >
> > I do have another user that runs his own MPI code on 10+ of these
> > processors for days at a time without issue, so I don't think it's
> > hardware.
> >
> > The original code also runs fine across many networked nodes if the
> > architecture is x86-64 (also running OMPI 1.2.7).
> >
> > We have also tried different Fortran compilers (both PathScale and
> > gfortran) and keep getting these crashes.
> >
> > Are there any suggestions on how to figure out if it's a problem with
> > the code or the OMPI installation/software on the system? We have  
> > tried
> > "--debug-daemons" with no new/interesting information being revealed.
> > Is there a way to trap segfault messages or more detailed MPI
> > transaction information or anything else that could help diagnose  
> > this?
> >
> > Thanks.
> > -- 
> >  V. Ram
> >  v_r_...@fastmail.fm
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - A no graphics, no pop-ups email service




[OMPI users] where is opal_install_dirs?

2008-10-10 Thread SLIM H.A.
I tried building Global Arrays with OpenMPI 1.2.3 and the Portland
compilers 7.0.2. It gives an error message about an undefined symbol,
"opal_install_dirs":

mpif90 -O -i8  -c -o dgetf2.o dgetf2.f
mpif90: symbol lookup error: mpif90: undefined symbol: opal_install_dirs
make[1]: *** [dgetf2.o] Error 127

Does anyone have any idea what the problem could be? If I use pgf90
instead of the MPI wrapper, the error does not occur, so something is
missing there.
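
In case it helps, an undefined "opal_install_dirs" in the wrapper may
indicate that mpif90 is resolving against a libopen-pal from a different
Open MPI build at run time.  A couple of quick, generic checks (nothing
here is specific to Global Arrays):

which mpif90                        # is this the wrapper you expect?
ldd `which mpif90` | grep open-pal  # which libopen-pal does it pick up?
echo $LD_LIBRARY_PATH               # any stale Open MPI paths listed first?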

Thanks

Henk



[OMPI users] build failed using intel compilers on mac os

2008-10-10 Thread Warner Yuen
If you are using the Intel v10.1.x compilers to build a 64-bit version,
a default Intel installation invokes the 64-bit compiler automatically.
But yes, you can use the "-m64" flag as well.
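
If the flags ever do need to be passed explicitly, a configure sketch
(the prefix below is only an example) would be along these lines:

./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 \
    --prefix=/opt/openmpi-1.2.7-intel64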



Warner Yuen
Scientific Computing
Consulting Engineer
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859



On Oct 9, 2008, at 10:15 PM, users-requ...@open-mpi.org wrote:


Message: 2
Date: Thu, 9 Oct 2008 17:28:38 -0400
From: Jeff Squyres 
Subject: Re: [OMPI users] build failed using intel compilers on mac os
x
To: Open MPI Users 
Message-ID: <897c21db-cb73-430c-b306-8e492b247...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

The CXX compiler should be icpc, not icc.


On Oct 7, 2008, at 11:08 AM, Massimo Cafaro wrote:




Dear all,

I tried to build the latest v1.2.7 open-mpi version on Mac OS X
10.5.5 using the Intel C, C++ and Fortran compilers v10.1.017 (the
latest ones released by Intel). Before starting the build I have
properly configured the CC, CXX, F77 and FC environment variables
(to icc and ifort). The build failed due to undefined symbols.

I am attaching a log of the failed build process.
Any clue? Am I doing something wrong?

Also, to build a 64-bit version, is it enough to supply the -m64 option
in the corresponding environment variables?
Thank you in advance and best regards,

Massimo






[OMPI users] OPENMPI 1.2.7 & PGI compilers: configure option --disable-ptmalloc2-opt-sbrk

2008-10-10 Thread Francesco Iannone
Dear openmpi users

I have compiled openmpi-1.2.7 with the PGI 7.1-4 compilers using the
configure option "--disable-ptmalloc2-opt-sbrk", to fix a segmentation
fault in the sysMALLOC function of "opal/mca/memory/ptmalloc2/malloc.c".

Does anybody know what it means to compile with this option?

thanks 


Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iann...@frascati.enea.it
http://www.afs.enea.it/iannone



Re: [OMPI users] Problem launching onto Bourne shell

2008-10-10 Thread Hahn Kim

Great, I look forward to 1.2.8!

Hahn

On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:


FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN
branches.  So I'll probably take down the hg tree (we use those as
temporary branches).

On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:


Hi,

Thanks for providing a fix, and sorry for the delay in responding.  Once I
found out about -x, I've been busy working on the rest of our code,
so I haven't had the time to try out the fix.  I'll take a look at
it as soon as I can and will let you know how it works out.

Hahn

On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:


On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:

you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
possibly others, such as that LICENSE key, etc.) regardless of
whether it's an interactive or non-interactive login.


Right, that's exactly what I want to do.  I was hoping that mpirun
would run .profile as the FAQ page stated, but the -x fix works for
now.


If you're using Bash, it should be running .bashrc.  But it looks
like
you did identify a bug that we're *not* running .profile.  I have a
Mercurial branch up with a fix if you want to give it a spin:

  http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/


I just realized that I'm using .bash_profile on the x86 and need to
move its contents into .bashrc and call .bashrc from .bash_profile,
since eventually I will also be launching MPI jobs onto other x86
processors.

Thanks to everyone for their help.

Hahn

On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:


On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:


Regarding 1., we're actually using 1.2.5.  We started using Open
MPI
last winter and just stuck with it.  For now, using the -x flag
with
mpirun works.  If this really is a bug in 1.2.7, then I think
we'll
stick with 1.2.5 for now, then upgrade later when it's fixed.


It looks like this behavior has been the same throughout the  
entire

1.2 series.

Regarding 2., are you saying I should run the commands you  
suggest

from the x86 node running bash, so that ssh logs into the Cell
node
running Bourne?


I'm saying that if "ssh othernode env" gives different answers  
than
"ssh othernode"/"env", then your .bashrc or .profile or whatever  
is

dumping out early depending on whether you have an interactive
login
or not.  This is the real cause of the error -- you probably want
to
set the LD_LIBRARY_PATH (and PATH, likely, and possibly others,
such
as that LICENSE key, etc.) regardless of whether it's an
interactive
or non-interactive login.



When I run "ssh othernode env" from the x86 node, I get the
following vanilla environment:

USER=ha17646
HOME=/home/ha17646
LOGNAME=ha17646
SHELL=/bin/sh
PWD=/home/ha17646

When I run "ssh othernode" from the x86 node, then run "env" on
the
Cell, I get the following:

USER=ha17646
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
HOME=/home/ha17646
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
LOGNAME=ha17646
TERM=xterm-color
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
SHELL=/bin/sh
PWD=/home/ha17646
TZ=EST5EDT
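
For what it's worth, a sketch of that suggestion using the paths above
(the non-interactive guard below is only illustrative; adapt it to
whichever startup file the remote shell actually reads):

# keep environment exports *before* any interactive-only section
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
export PATH LD_LIBRARY_PATH MCS_LICENSE_PATH

case $- in
    *i*) ;;          # interactive shell: interactive-only settings go below
    *)   return ;;   # non-interactive shell: stop here
esac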

Hahn

On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:


Ralph and I just talked about this a bit:

1. In all released versions of OMPI, we *do* source the .profile
file
on the target node if it exists (because vanilla Bourne shells  
do

not
source anything on remote nodes -- Bash does, though, per the
FAQ).
However, looking in 1.2.7, it looks like it might not be
executing
that code -- there *may* be a bug in this area.  We're checking
into it.

2. You might want to check your configuration to see if
your .bashrc
is dumping out early because it's a non-interactive shell.   
Check

the
output of:

ssh othernode env
vs.
ssh othernode
env

(i.e., a non-interactive running of "env" vs. an interactive
login
and
running "env")



On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:


I am unaware of anything in the code that would
"source .profile"
for you. I believe the FAQ page is in error here.

Ralph

On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:

Great, that worked, thanks!  However, it still concerns me  
that

the
FAQ page says that mpirun will execute .profile which doesn't
seem
to work for me.  Are there any configuration issues that could
possibly be preventing mpirun from doing this?  It would
certainly
be more convenient if I could maintain my environment in a
single .profile file instead of adding what could potentially
be a
lot of -x arguments to my mpirun command.

Hahn

On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:


You can forward your local env with mpirun -x
LD_LIBRARY_PATH. As
an
alternative you can set specific values with mpirun -x
LD_LIBRARY_PATH=/some/where:/some/where/else . More
information
with
mpirun --help (or man mpirun).

Aurelien



On 6 Oct 2008, at 16:06, Hahn Kim wrote:


Hi,

I'm having difficulty launching an Open MPI job onto a
machine
that
is running the Bourne shell.

Here's my 

Re: [OMPI users] build failed using intel compilers on mac os x

2008-10-10 Thread Massimo Cafaro

Thank you very much.

I am going to build again, using the new settings, as suggested.
Best regards,

Massimo

On Oct 9, 2008, at 11:28 PM, Jeff Squyres wrote:


The CXX compiler should be icpc, not icc.


On Oct 7, 2008, at 11:08 AM, Massimo Cafaro wrote:




Dear all,

I tried to build the latest v1.2.7 open-mpi version on Mac OS X
10.5.5 using the Intel C, C++ and Fortran compilers v10.1.017 (the
latest ones released by Intel). Before starting the build I have
properly configured the CC, CXX, F77 and FC environment variables
(to icc and ifort). The build failed due to undefined symbols.


I am attaching a log of the failed build process.
Any clue? Am I doing something wrong?

Also, to build a 64-bit version, is it enough to supply the -m64 option
in the corresponding environment variables?

Thank you in advance and best regards,

Massimo


--

***

Massimo Cafaro, Ph.D.  Additional  
affiliations:
Assistant Professor National  
Nanotechnology Laboratory (NNL/CNR-INFM)
Dept. of Engineering for Innovation Euro-Mediterranean  
Centre for Climate Change

University of Salento, Lecce, ItalySPACI Consortium
Via per Monteroni
73100 Lecce, Italy
Voice  +39 0832 297371
Fax +39 0832 298173
Web http://sara.unile.it/~cafaro
E-mail massimo.caf...@unile.it
caf...@cacr.caltech.edu

***







--
Jeff Squyres
Cisco Systems





--

***

 Massimo Cafaro, Ph.D.  Additional  
affiliations:
 Assistant Professor National  
Nanotechnology Laboratory (NNL/CNR-INFM)
 Dept. of Engineering for Innovation Euro-Mediterranean  
Centre for Climate Change

 University of Salento, Lecce, ItalySPACI Consortium
 Via per Monteroni
 73100 Lecce, Italy
 Voice  +39 0832 297371
 Fax +39 0832 298173
 Web http://sara.unile.it/~cafaro
 E-mail massimo.caf...@unile.it
  caf...@cacr.caltech.edu

***