Re: [OMPI users] Passing LD_LIBRARY_PATH to orted
Hi Craig, George, list,

Here is a quick and dirty solution I have used before for a similar problem: link the Intel libraries statically, using the "-static-intel" flag. Other shared libraries continue to be dynamically linked. For instance:

mpif90 -static-intel my_mpi_program.f90

What is not clear to me is why orted is being invoked directly instead of mpirun/mpiexec/orterun, which has a mechanism to pass environment variables to the hosts with "-x LD_LIBRARY_PATH=/my/intel/lib".

I hope this helps.

Gus Correa

--
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA

George Bosilca wrote:
> This is a problem with the Intel libraries and not the Open MPI ones. You have to somehow make these libraries available on the compute nodes.
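For reference, a minimal sketch of the two mechanisms Gus mentions (program name, host file, and Intel library path are placeholders, not taken from the thread). Note that, per Craig's report below, the -x option affects the application processes that mpirun launches; it does not by itself fix an orted that cannot even start:

shell$ mpif90 -static-intel -o my_mpi_program my_mpi_program.f90

shell$ mpirun -np 8 --hostfile my_hosts \
       -x LD_LIBRARY_PATH=/opt/intel/fce/10.1/lib ./my_mpi_program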
Re: [OMPI users] Passing LD_LIBRARY_PATH to orted
Craig,

This is a problem with the Intel libraries and not the Open MPI ones. You have to somehow make these libraries available on the compute nodes. What I usually do (but it's not the best way to solve this problem) is to copy these libraries somewhere in my home area and to add that directory to my LD_LIBRARY_PATH.

george.

On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
> I am having problems launching openmpi jobs on my system. [...] When orted is launched on the remote system, the LD_LIBRARY_PATH doesn't come with it, and the Intel 10.1 libraries can't be found.
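A rough sketch of the workaround George describes, with made-up paths and library names (the exact set of Intel runtime libraries depends on the compiler version):

shell$ mkdir -p $HOME/intel-libs
shell$ cp /opt/intel/fce/10.1/lib/libintlc.so.5 \
          /opt/intel/fce/10.1/lib/libifcore.so.5 \
          /opt/intel/fce/10.1/lib/libimf.so $HOME/intel-libs/

Then make sure the directory is on LD_LIBRARY_PATH in a shell startup file that non-interactive ssh logins read on the compute nodes, for example in ~/.bashrc:

export LD_LIBRARY_PATH=$HOME/intel-libs:$LD_LIBRARY_PATH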
[OMPI users] Passing LD_LIBRARY_PATH to orted
I am having problems launching Open MPI jobs on my system. I support multiple versions of MPI and compilers using GNU Modules. For the default compiler, everything is fine. For non-default compilers, I am having problems.

I built Open MPI 1.2.6 (and 1.2.7) with the following configure options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort \
    --prefix=/opt/openmpi/1.2.7-intel-10.1 --without-gridengine \
    --enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs \
    --with-openib=/opt/hjet/ofed/1.3.1

When I launch a job, I run the module command for the right compiler/MPI version to set the paths correctly. Mpirun passes LD_LIBRARY_PATH to the executable I am launching, but not to orted. When orted is launched on the remote system, the LD_LIBRARY_PATH doesn't come with it, and the Intel 10.1 libraries can't be found:

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared libraries: libintlc.so.5: cannot open shared object file: No such file or directory

How do others solve this problem?

Thanks,
Craig

--
Craig Tierney (craig.tier...@noaa.gov)
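A quick way to confirm that this is purely a dynamic-linker problem on the remote side (the host name is a placeholder; the orted path is the one from the post) is to run ldd on orted through a non-interactive ssh session, which sees the same environment orted itself would see:

shell$ ssh node01 ldd /opt/openmpi/1.2.7-intel-10.1/bin/orted | grep "not found"

If libintlc.so.5 and friends show up as "not found" here, then the fix is whatever makes the Intel runtime visible to non-interactive logins on the compute nodes: LD_LIBRARY_PATH in the right startup file, an ld.so.conf entry, or static linking as suggested elsewhere in this thread.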
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
On Oct 10, 2008, at 12:42 PM, V. Ram wrote:
> Can anyone else suggest why the code might be crashing when running over ethernet and not over shared memory? Any suggestions on how to debug this or interpret the error message issued from btl_tcp_frag.c?

Unfortunately this is a generic error message which does not tell us what the real error was. It simply states that one node failed to read data from a socket, which usually happens when the remote peer died unexpectedly (such as with a seg-fault).

george.
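Since the message usually just means that a peer process died, one way to chase the real failure is to let the crashing rank leave a core file and inspect it. A sketch, assuming bash on the compute nodes and that core dumps are permitted there (the ulimit setting has to be in effect in the environment of the remote processes, e.g. via a shell startup file):

shell$ ulimit -c unlimited
shell$ mpirun -np 4 --hostfile hosts ./fortran_app
# after the crash, on the node that produced a core file:
shell$ gdb ./fortran_app core
(gdb) bt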
Re: [OMPI users] Performance: MPICH2 vs OpenMPI
Hi guys,

On Fri, Oct 10, 2008 at 12:57 PM, Brock Palen wrote:
> Actually I had much different results:
>
> gromacs-3.3.1, one node, dual-core dual-socket opt2218; openmpi-1.2.7 + pgi/7.2 vs. mpich2 + gcc

For some reason the difference in minutes didn't come through, it seems, but I would guess that if it's a medium-to-large difference, then it has its roots in PGI 7.2 vs. GCC rather than MPICH2 vs. OpenMPI. Though, to be fair, I find GCC vs. PGI (for C code) is often a toss-up - one may beat the other handily on one code, and then lose just as badly on another.

> I think my install of mpich2 may be bad, I have never installed it before, only mpich1, OpenMPI and LAM. So take my mpich2 numbers with salt, lots of salt.

I think the biggest difference in performance between various MPICH2 installs comes from differences in the 'channel' used. I tend to make sure that I use the 'nemesis' channel, which may or may not be the default these days. If not, though, most people would probably want it. I think it has issues with threading (or did ages ago?), but I seem to recall it being considerably faster than even the 'ssm' channel.

Sangamesh: My advice to you would be to recompile Gromacs and specify, in the *Gromacs* compile/configure, the same CFLAGS you used with MPICH2. E.g., "-O2 -m64", whatever. If you do that, I bet the times between MPICH2 and OpenMPI will be pretty comparable for your benchmark case - especially when run on a single processor.

Cheers,
- Brian
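A sketch of what Brian suggests, with example prefixes and flags (not taken from the thread; check the MPICH2 and Gromacs documentation for the exact options of your versions):

# Build MPICH2 with the nemesis channel selected explicitly:
shell$ ./configure --prefix=/opt/mpich2-nemesis --with-device=ch3:nemesis
shell$ make && make install

# Build Gromacs with identical optimization flags for both MPI stacks,
# so only the MPI library differs between the two runs:
shell$ ./configure --enable-mpi CC=mpicc F77=mpif77 \
       CFLAGS="-O2 -m64" FFLAGS="-O2 -m64"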
Re: [OMPI users] Performance: MPICH2 vs OpenMPI
Whoops, didn't include the mpich2 number:

20M   mpich2, same node

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On Oct 10, 2008, at 12:57 PM, Brock Palen wrote:
> Actually I had much different results: gromacs-3.3.1, one node, dual-core dual-socket opt2218; openmpi-1.2.7 + pgi/7.2 vs. mpich2 + gcc. 19M OpenMPI [...] So for me OpenMPI+pgi was faster, I don't know how you got such a low mpich2 number.
Re: [OMPI users] Performance: MPICH2 vs OpenMPI
Actually I had much different results:

gromacs-3.3.1, one node, dual-core dual-socket opt2218; openmpi-1.2.7 + pgi/7.2 vs. mpich2 + gcc

19M   OpenMPI
  M   Mpich2

So for me OpenMPI+pgi was faster; I don't know how you got such a low mpich2 number. On the other hand, if you do this preprocess before you run:

grompp -sort -shuffle -np 4
mdrun -v

With -sort and -shuffle the OpenMPI run time went down:

12M   OpenMPI + sort/shuffle

I think my install of mpich2 may be bad; I have never installed it before, only mpich1, OpenMPI and LAM. So take my mpich2 numbers with salt, lots of salt. On that point, though, -sort and -shuffle may be useful for you; be sure to understand what they do before you use them. Read: http://cac.engin.umich.edu/resources/software/gromacs.html

Last, make sure that you're using the single-precision version of gromacs for both runs; the double-precision build is about half the speed of the single.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On Oct 10, 2008, at 1:15 AM, Sangamesh B wrote:

> On Thu, Oct 9, 2008 at 7:30 PM, Brock Palen wrote:
> > Which benchmark did you use?
>
> Out of 4 benchmarks I used the d.dppc benchmark.

On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:

> On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres wrote:
> > On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:
> > > Make sure you don't use a "debug" build of Open MPI. If you use trunk, the build system detects it and turns on debug by default. It really kills performance. --disable-debug will remove all those nasty printfs from the critical path.
> >
> > You can easily tell if you have a debug build of OMPI with the ompi_info command:
> >
> > shell$ ompi_info | grep debug
> >   Internal debug support: no
> >   Memory debugging support: no
> > shell$
> >
> > You want to see "no" for both of those.
> >
> > --
> > Jeff Squyres
> > Cisco Systems
>
> Yes. It is "no":
>
> $ /opt/ompi127/bin/ompi_info -all | grep debug
>   Internal debug support: no
>   Memory debugging support: no
>
> I've tested GROMACS for a single process (mpirun -np 1). Here are the results:
>
> OpenMPI : 120m 6s
> MPICH2  : 67m 44s
>
> I'm trying to build the codes with PGI, but facing a problem with the compilation of GROMACS.
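For context, roughly how the preprocessing step above fits together for an MPI run (the input/output file arguments are omitted; grompp 3.3 will pick up its default file names, and the MPI-enabled binary may be called mdrun or mdrun_mpi depending on how it was built):

shell$ grompp -sort -shuffle -np 4    # renumber/reorder atoms for 4 ranks
shell$ mpirun -np 4 mdrun -v          # then run the MPI-enabled mdrun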
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Leonardo,

These nodes are all using Intel e1000 chips. As the nodes are AMD K7-based, these are the older chips, not the new ones with all the eeprom issues with the newer kernels. The kernel in use is from the 2.6.22 family, and the e1000 driver is the one shipped with the kernel. I am running it compiled into the kernel, not as a module.

When testing with the Intel MPI Benchmarks, I found that increasing the receive ring buffer size to the max (4096) helped performance, so I use ethtool -G on startup. Checking ethtool -k, I see that TCP segmentation offload is on. I can try turning that off to see what happens. Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or have these same issues, and I'm not having to turn off TSO.

Can anyone else suggest why the code might be crashing when running over ethernet and not over shared memory? Any suggestions on how to debug this or interpret the error message issued from btl_tcp_frag.c?

Thanks.

On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho" said:
> Ram,
>
> What is the name and version of the kernel module for your NIC? I have experienced something similar with my tg3 module. The error which appeared for me was different:
>
> [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: No route to host (113)
>
> I solved it by changing the following parameter in the Linux kernel:
>
> /sbin/ethtool -K eth0 tso off
>
> Leonardo
>
> --
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478

--
V. Ram
v_r_...@fastmail.fm

--
http://www.fastmail.fm - Faster than the air-speed velocity of an unladen european swallow
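For reference, the two ethtool tweaks mentioned above (eth0 and the ring size are examples; run the lowercase query forms first to see what the NIC actually supports):

shell$ ethtool -g eth0            # show current/maximum ring sizes
shell$ ethtool -G eth0 rx 4096    # raise the receive ring to its maximum
shell$ ethtool -k eth0            # show offload settings
shell$ ethtool -K eth0 tso off    # disable TCP segmentation offload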
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Sorry for replying to this so late, but I have been away. Reply below...

On Wed, 1 Oct 2008 11:58:30 -0400, "Aurélien Bouteiller" said:
> If you have several network cards in your system, it can sometimes get the endpoints confused, especially if you don't have the same number of cards or don't use the same subnet for all "eth0, eth1". You should try to restrict Open MPI to use only one of the available networks by using the --mca btl_tcp_if_include ethx parameter to mpirun, where x is the network interface that is always connected to the same logical and physical network on your machine.

I was pretty sure this wasn't the problem, since basically all the nodes have only one interface configured, but I had the user try the --mca btl_tcp_if_include parameter. The same result / crash occurred.

> On 1 Oct 2008, at 11:47, V. Ram wrote:
> > I wrote earlier about one of my users running a third-party Fortran code on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash behavior.
> >
> > Our cluster's nodes all have 2 single-core processors. If this code is run on 2 processors on 1 node, it runs seemingly fine. However, if the job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then it crashes and gives messages like:
> >
> > [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > mca_btl_tcp_frag_recv: readv failed with errno=110
> > mca_btl_tcp_frag_recv: readv failed with errno=104
> >
> > Essentially, if any network communication is involved, the job crashes in this form.
> >
> > I do have another user that runs his own MPI code on 10+ of these processors for days at a time without issue, so I don't think it's hardware.
> >
> > The original code also runs fine across many networked nodes if the architecture is x86-64 (also running OMPI 1.2.7).
> >
> > We have also tried different Fortran compilers (both PathScale and gfortran) and keep getting these crashes.
> >
> > Are there any suggestions on how to figure out if it's a problem with the code or the OMPI installation/software on the system? We have tried "--debug-daemons" with no new/interesting information being revealed. Is there a way to trap segfault messages or more detailed MPI transaction information or anything else that could help diagnose this?
> >
> > Thanks.
> > --
> > V. Ram
> > v_r_...@fastmail.fm

--
V. Ram
v_r_...@fastmail.fm

--
http://www.fastmail.fm - A no graphics, no pop-ups email service
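For completeness, the mpirun form of the suggestion quoted above (interface name, process count, and host file are placeholders):

shell$ mpirun -np 4 --bynode --hostfile hosts \
       --mca btl_tcp_if_include eth0 ./third_party_code

# To take TCP out of the picture entirely on a single node, the run can
# also be forced onto the shared-memory and self transports:
shell$ mpirun -np 2 --mca btl sm,self ./third_party_code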
[OMPI users] where is opal_install_dirs?
I tried building Global Arrays with Open MPI 1.2.3 and the Portland compilers 7.0.2. It gives an error message about an undefined symbol "opal_install_dirs":

mpif90 -O -i8 -c -o dgetf2.o dgetf2.f
mpif90: symbol lookup error: mpif90: undefined symbol: opal_install_dirs
make[1]: *** [dgetf2.o] Error 127

Does anyone have any idea what the problem could be? If I use pgf90 instead of the MPI wrapper, the error does not occur, so something is missing there.

Thanks,
Henk
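One common cause of this kind of symbol lookup error is the mpif90 wrapper binary resolving against the libraries of a different (often older) Open MPI installation that appears first in LD_LIBRARY_PATH. A few checks worth trying (a sketch; the paths will differ on your system):

shell$ which mpif90                          # is it the 1.2.3 install you expect?
shell$ mpif90 --showme                       # what the wrapper would actually run
shell$ ldd $(which mpif90) | grep open-pal   # which libopen-pal is being picked up?

If the wrapper and libopen-pal come from different prefixes, adjusting PATH/LD_LIBRARY_PATH so they match (or rebuilding against one consistent install) should make opal_install_dirs resolvable.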
[OMPI users] build failed using intel compilers on mac os x
If you are using the Intel v10.1.x compilers to build a 64-bit version, a default installation of Intel invokes the 64-bit compiler by default. But yes, you can use the "-m64" flag as well.

Warner Yuen
Scientific Computing Consulting Engineer
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859

On Oct 9, 2008, at 10:15 PM, Jeff Squyres wrote (via the users digest):
> The CXX compiler should be icpc, not icc.
>
> On Oct 7, 2008, at 11:08 AM, Massimo Cafaro wrote:
> > Dear all, I tried to build the latest v1.2.7 open-mpi version on Mac OS X 10.5.5 using the Intel C, C++ and Fortran compilers v10.1.017 (the latest ones released by Intel). [...] Also, to build a 64 bit version, is it enough to supply the -m64 option in the corresponding environment variables?
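A sketch of a 64-bit Intel build along these lines (the prefix is a placeholder; whether the -m64 flags are needed at all depends on which Intel 10.1 variant is installed, as noted above):

shell$ ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
       CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 \
       --prefix=/opt/openmpi/1.2.7-intel
shell$ make all install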
[OMPI users] OPENMPI 1.2.7 & PGI compilers: configure option --disable-ptmalloc2-opt-sbrk
Dear Open MPI users,

I have compiled openmpi-1.2.7 with the PGI 7.1-4 compilers and the configure option "--disable-ptmalloc2-opt-sbrk", to fix a segmentation fault in the sysMALLOC function of "opal/mca/memory/ptmalloc2/malloc.c".

Does anybody know what it means to compile with this option?

Thanks

Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iann...@frascati.enea.it
http://www.afs.enea.it/iannone
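For reference, the option is just another configure-time switch; a sketch of how it fits into a PGI build (compiler names and prefix are examples, not taken from the original post):

shell$ ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
       --prefix=/opt/openmpi/1.2.7-pgi-7.1 \
       --disable-ptmalloc2-opt-sbrk
shell$ make all install

As far as I understand, ptmalloc2 is the memory allocator Open MPI builds in (mainly to support memory registration caching on RDMA interconnects such as InfiniBand), and this flag disables an sbrk-related optimization inside that allocator, so it changes allocation behavior rather than any MPI functionality.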
Re: [OMPI users] Problem launching onto Bourne shell
Great, I look forward to 1.2.8!

Hahn

> FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN branches. So I'll probably take down the hg tree (we use those as temporary branches).

On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:

Hi,

Thanks for providing a fix, sorry for the delay in response. Once I found out about -x, I've been busy working on the rest of our code, so I haven't had the time to try out the fix. I'll take a look at it as soon as I can and will let you know how it works out.

Hahn

On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:

On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:

> > you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such as that LICENSE key, etc.) regardless of whether it's an interactive or non-interactive login.
>
> Right, that's exactly what I want to do. I was hoping that mpirun would run .profile as the FAQ page stated, but the -x fix works for now.

If you're using Bash, it should be running .bashrc. But it looks like you did identify a bug that we're *not* running .profile. I have a Mercurial branch up with a fix if you want to give it a spin:

http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/

> I just realized that I'm using .bash_profile on the x86 and need to move its contents into .bashrc and call .bashrc from .bash_profile, since eventually I will also be launching MPI jobs onto other x86 processors. Thanks to everyone for their help.
>
> Hahn

On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:

On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:

> Regarding 1., we're actually using 1.2.5. We started using Open MPI last winter and just stuck with it. For now, using the -x flag with mpirun works. If this really is a bug in 1.2.7, then I think we'll stick with 1.2.5 for now, then upgrade later when it's fixed.

It looks like this behavior has been the same throughout the entire 1.2 series.

> Regarding 2., are you saying I should run the commands you suggest from the x86 node running bash, so that ssh logs into the Cell node running Bourne?

I'm saying that if "ssh othernode env" gives different answers than "ssh othernode" / "env", then your .bashrc or .profile or whatever is dumping out early depending on whether you have an interactive login or not. This is the real cause of the error -- you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such as that LICENSE key, etc.) regardless of whether it's an interactive or non-interactive login.

When I run "ssh othernode env" from the x86 node, I get the following vanilla environment:

USER=ha17646
HOME=/home/ha17646
LOGNAME=ha17646
SHELL=/bin/sh
PWD=/home/ha17646

When I run "ssh othernode" from the x86 node, then run "env" on the Cell, I get the following:

USER=ha17646
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
HOME=/home/ha17646
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
LOGNAME=ha17646
TERM=xterm-color
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
SHELL=/bin/sh
PWD=/home/ha17646
TZ=EST5EDT

Hahn

On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:

Ralph and I just talked about this a bit:

1. In all released versions of OMPI, we *do* source the .profile file on the target node if it exists (because vanilla Bourne shells do not source anything on remote nodes -- Bash does, though, per the FAQ). However, looking in 1.2.7, it looks like it might not be executing that code -- there *may* be a bug in this area. We're checking into it.

2. You might want to check your configuration to see if your .bashrc is dumping out early because it's a non-interactive shell. Check the output of:

ssh othernode env

vs.

ssh othernode
env

(i.e., a non-interactive running of "env" vs. an interactive login and running "env")

On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:

I am unaware of anything in the code that would "source .profile" for you. I believe the FAQ page is in error here.

Ralph

On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:

Great, that worked, thanks! However, it still concerns me that the FAQ page says that mpirun will execute .profile, which doesn't seem to work for me. Are there any configuration issues that could possibly be preventing mpirun from doing this? It would certainly be more convenient if I could maintain my environment in a single .profile file instead of adding what could potentially be a lot of -x arguments to my mpirun command.

Hahn

On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:

You can forward your local env with mpirun -x LD_LIBRARY_PATH. As an alternative you can set specific values with mpirun -x LD_LIBRARY_PATH=/some/where:/some/where/else . More information with mpirun --help (or man mpirun).

Aurelien

On 6 Oct 2008, at 16:06, Hahn Kim wrote:

Hi, I'm having difficulty launching an Open MPI job onto a machine that is running the Bourne shell. Here's my
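For readers hitting the same problem, the two pieces of this thread reduce to a pair of commands (host names, host file, and application name are placeholders; the Cell library path is the one from Hahn's env output above). First, forward or set the environment explicitly at launch time:

shell$ mpirun -x LD_LIBRARY_PATH -np 4 --hostfile hosts ./my_app
shell$ mpirun -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 \
       -np 4 --hostfile hosts ./my_app

Second, the quick test for whether the remote startup files are read on non-interactive logins:

shell$ ssh othernode env | grep LD_LIBRARY_PATH    # non-interactive
shell$ ssh othernode                               # interactive login...
shell$ env | grep LD_LIBRARY_PATH                  # ...then check by hand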
Re: [OMPI users] build failed using intel compilers on mac os x
Thank you very much. I am going to build again, using the new settings, as suggested.

Best regards,
Massimo

On Oct 9, 2008, at 11:28 PM, Jeff Squyres wrote:
> The CXX compiler should be icpc, not icc.
>
> On Oct 7, 2008, at 11:08 AM, Massimo Cafaro wrote:
> > Dear all,
> >
> > I tried to build the latest v1.2.7 open-mpi version on Mac OS X 10.5.5 using the Intel C, C++ and Fortran compilers v10.1.017 (the latest ones released by Intel). Before starting the build I properly configured the CC, CXX, F77 and FC environment variables (to icc and ifort). The build failed due to undefined symbols. I am attaching a log of the failed build process. Any clue? Am I doing something wrong?
> >
> > Also, to build a 64-bit version, is it enough to supply the -m64 option in the corresponding environment variables?
> >
> > Thank you in advance and best regards,
> > Massimo

--
Massimo Cafaro, Ph.D.
Assistant Professor
Dept. of Engineering for Innovation
University of Salento, Lecce, Italy
Additional affiliations: National Nanotechnology Laboratory (NNL/CNR-INFM), Euro-Mediterranean Centre for Climate Change, SPACI Consortium
Via per Monteroni, 73100 Lecce, Italy
Voice +39 0832 297371   Fax +39 0832 298173
Web http://sara.unile.it/~cafaro
E-mail massimo.caf...@unile.it   caf...@cacr.caltech.edu