Re: [OMPI users] MPI Java Bindings on Mac OSX
Ralph, The source is available at http://modules.sourceforge.net/ Doug On Jan 3, 2013, at 10:49 AM, Ralph Castain wrote: > Hi Doug > > What modules software do you use on the Mac? Would be nice to know :-) > > > On Jan 3, 2013, at 8:34 AM, Doug Reeder <d...@centurylink.net> wrote: > >> Chuck, >> >> In step 4 you might want to consider the following >> >> --prefix=/usr/local/openmpi-1.7rc5 >> >> and use the modules software to select which version of openmpi to use. I >> have to have multiple versions of openmpi available on my macs and this >> approach has worked well for me. >> >> Doug Reeder >> On Jan 3, 2013, at 9:22 AM, Chuck Mosher wrote: >> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
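For readers unfamiliar with the modules approach Doug mentions, here is a minimal shell sketch of what loading an openmpi modulefile effectively does: it prepends the chosen installation's directories to the search paths. The prefix below is a hypothetical example.

```shell
# Minimal sketch of what an "openmpi" modulefile does on load:
# prepend the selected install's bin/ and lib/ dirs to the search paths.
# The prefix is a hypothetical example.
OMPI_PREFIX=/usr/local/openmpi-1.7rc5
PATH="$OMPI_PREFIX/bin:$PATH"
DYLD_LIBRARY_PATH="$OMPI_PREFIX/lib:${DYLD_LIBRARY_PATH:-}"
export PATH DYLD_LIBRARY_PATH
# The selected version now shadows any system copy in /usr/bin:
echo "$PATH" | cut -d: -f1    # prints /usr/local/openmpi-1.7rc5/bin
```

Unloading the module reverses the prepend, which is what makes switching between installed versions clean.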
Re: [OMPI users] MPI Java Bindings on Mac OSX
Chuck, In step 4 you might want to consider the following --prefix=/usr/local/openmpi-1.7rc5 and use the modules software to select which version of openmpi to use. I have to have multiple versions of openmpi available on my macs and this approach has worked well for me. Doug Reeder On Jan 3, 2013, at 9:22 AM, Chuck Mosher wrote: > Hi, > > I've been trying to get a working version of the MPI java bindings on Mac OSX > (10.6.8 with Java 1.6.0_37). > > I ran into a number of issues along the way that I thought I would record > here for others who might be foolish enough to try the same ;-) > > The issues I had to spend time with were: > > 1. Installing a C compiler that can run from the command line > 2. Finding and installing an appropriate Java JDK for my OS version > 3. Building and installing OpenMPI for the first time on a Mac > 4. Conflicts with the existing OpenMPI version 1.2.8 that was installed > already on my Mac > 5. Figuring out syntax for using the mpirun command line to run java > 6. Odd behavior when trying to use "localhost" or the output from `hostname` > on the command line or in a hostfile > > Resolution for each of these in order: > > 1. Installing a C compiler for the command line > Found a good resource here: > http://www.macobserver.com/tmo/article/install_the_command_line_c_compilers_in_os_x_lion > The solution is to install XCode, then enable command line compilers from the > XCode console. > > 2. Finding and installing an appropriate Java JDK for my OS version > Used this resource to eventually figure out what to do: > http://www.wikihow.com/Install-the-JDK-(Java-Development-Kit)-on-Mac-OS-X > It didn't exactly match my setup, but had enough clues. > The solution is to first find your java version (java -version, 1.6.0_37 in > my case) and then match that version number to the Apple Java update version > (11 in my case). 
> The key document is: > http://developer.apple.com/library/mac/#technotes/tn2002/tn2110.html > Which is a table relating java version numbers to the appropriate "Java for > Mac OS X xx.x Update xx". > Once you know the update number, you can download the JDK installer from > https://developer.apple.com/downloads/index.action > where you of course have to have an Apple developer ID to access. > Enter "java" in the search bar on the left and find the matching java update, > and you're good to go. > > 3. Building and installing OpenMPI for the first time on a Mac > After the usual false starts with a new installation on a new OS, I managed > to get a working build of openmpi-1.7rc5 with Java bindings. > I could only find the java bindings in the 1.7 pre-release. > I used the defaults as much as possible. > > After downloading from: > http://www.open-mpi.org/software/ompi/v1.7/ > and unarchiving to Downloads, open a Terminal window. > > cd Downloads/openmpi-1.7rc5 > ./configure --enable-java --prefix=/usr/local > make all > sudo make install > > Verify that you can run the commands and examples: > > chuck-> /usr/local/bin/mpirun -version > mpirun (Open MPI) 1.7rc5 > > chuck-> cd examples > chuck-> make > chuck-> /usr/local/bin/mpirun -np 2 hello_c > Hello, world, I am 0 of 2, (Open MPI v1.7rc5, package: Open MPI > chuck@chucks-iMac.local Distribution, ident: 1.7rc5, Oct 30, 2012, 111) > Hello, world, I am 1 of 2, (Open MPI v1.7rc5, package: Open MPI > chuck@chucks-iMac.local Distribution, ident: 1.7rc5, Oct 30, 2012, 111) > > 4. Conflicts with the existing OpenMPI version 1.2.8 that was installed > already on my Mac > OpenMPI Version 1.2.8 was already installed for my OS in /usr/bin > So, if you accidentally type: > > chuck-> mpirun -np 2 hello_c > -- > A requested component was not found, or was unable to be opened > ... > > you picked up the wrong "mpirun" and you will get a bunch of error output > complaining about sockets or mis-matched shared library versions. 
> > I dealt with this by moving the existing OpenMPI related commands to a subdirectory, and then created symbolic links from /usr/local/bin to /usr/bin for the commands I needed. > > 5. Figuring out syntax for using the mpirun command line to run java > First be sure you can run Java > > chuck-> /usr/bin/java -version > java version "1.6.0_37" > Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-10M3909) > Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode) > > Then be sure you can run your java class from the command line as well. To > figure this out I created a couple of simple java files in a temp directory
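The message is cut off at this point in the archive. A hedged sketch of the step it was describing, compiling and running a Java class under mpirun with the 1.7 bindings; the directory and class name are illustrative, and mpijavac is the compiler wrapper the 1.7 Java bindings are expected to install alongside mpirun:

```shell
cd /tmp/mpitest                        # the "temp directory" mentioned above
/usr/local/bin/mpijavac Hello.java     # wraps javac, adding the MPI classpath
/usr/local/bin/mpirun -np 2 java Hello
```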
Re: [OMPI users] regarding the problem occurred while running an mpi programs
That is well documented as a BAD idea. On Apr 25, 2012, at 8:23 AM, seshendra seshu wrote: > Hi > Yes i run in root. > > On Wed, Apr 25, 2012 at 4:20 PM, tyler.bal...@huskers.unl.edu >wrote: > Seshendra, > > Do you always run in root? If not your root bash file may not have the > correct path settings, but that is a shot in the dark.. > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of > seshendra seshu [seshu...@gmail.com] > Sent: Wednesday, April 25, 2012 9:16 AM > To: Open MPI Users > Subject: [OMPI users] regarding the problem occurred while running an mpi > programs > > > Hi, > I have got the following error while running mpi programs > > Here for running an mpi program i used hostfile which specifies all the nodes > in my cluster and out is my output file generated after "mpicc -o out > basic.c". then i have got the following error. > > [root@ip-10-80-106-70 openmpi-1.4.5]# mpirun --hostfile hostfile out > out: error while loading shared libraries: libmpi_cxx.so.0: cannot open > shared object file: No such file or directory > -- > mpirun was unable to launch the specified application as it could not find an > executable: > > Executable: out > Node: ip-10-85-134-176.example.com > > while attempting to start process rank 1. > > > so kindly provide me solution iam lagging of time. > > -- > WITH REGARDS > M.L.N.Seshendra > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > -- > WITH REGARDS > M.L.N.Seshendra > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
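Two hedged fixes for the errors quoted above. The library path is a guess based on the openmpi-1.4.5 directory in the transcript, and the stand-in executable below is fabricated purely to demonstrate the PATH point.

```shell
# Fix 1 (hedged): "error while loading shared libraries: libmpi_cxx.so.0"
# usually means the Open MPI lib dir is not on the loader path on every
# node. The prefix here is a hypothetical example.
export LD_LIBRARY_PATH=/usr/local/openmpi-1.4.5/lib:${LD_LIBRARY_PATH:-}

# Fix 2: "could not find an executable: out" happens because a bare name
# is resolved via PATH, which rarely includes ".". Give mpirun an
# explicit path instead:   mpirun --hostfile hostfile ./out
# Demonstration with a stand-in executable:
cd /tmp
printf '#!/bin/sh\necho ok\n' > out
chmod +x out
./out        # explicit relative path works regardless of PATH; prints "ok"
```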
Re: [OMPI users] heterogenous cluster
Jody, With the gnu compilers the -m32 flag works. With other compilers the same or a similar flag should work. Doug Reeder On Feb 1, 2011, at 11:46 PM, jody wrote: > Thanks for your reply. > > If i try your suggestion, every process fails with the following message: > > *** The MPI_Init() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. > [aim-triops:15460] Abort before MPI_INIT completed successfully; not > able to guarantee that all other processes were killed! > > I think this is caused by the fact that on the 64Bit machine Open MPI > is also built as a 64 bit application. > How can i force OpenMPI to be built as a 32Bit application on a 64Bit machine? > > Thank You > Jody > > On Tue, Feb 1, 2011 at 9:00 PM, David Mathog <mat...@caltech.edu> wrote: >> >>> I have so far used a homogeneous 32-bit cluster. >>> Now i have added a new machine which is 64 bit >>> >>> This means i have to reconfigure open MPI with >> `--enable-heterogeneous`, right? >> >> Not necessarily. If you don't need the 64bit capabilities you could run >> 32 bit binaries along with a 32 bit version of OpenMPI. At least that >> approach has worked so far for me. >> >> Regards, >> >> David Mathog >> mat...@caltech.edu >> Manager, Sequence Analysis Facility, Biology Division, Caltech >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
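A hedged sketch of forcing a 32-bit Open MPI build with the GNU compilers, per Doug's note; the install prefix is a hypothetical example, and other compiler suites need their own equivalent of -m32:

```shell
# Build Open MPI itself as 32-bit by passing -m32 to every compiler
# (sketch; the prefix is hypothetical).
./configure CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 \
    --prefix=/opt/openmpi-1.4-m32
make all
sudo make install
```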
Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?
Jeff, We have similar circumstances and have been able to install and use versions of openmpi newer than supplied with the OS. It is necessary to have some means of path management to ensure that applications build against the desired version of openmpi and run with the version of openmpi they were built with. We use the module system for this path management. We create modules for each version of openmpi and each version of the applications. We then include the appropriate openmpi module in the module for the application. Then when a user loads a module for their application they automatically get the correct version of openmpi. Doug Reeder On Feb 1, 2011, at 2:02 PM, Jeffrey A Cummings wrote: > I use OpenMPI on a variety of platforms: stand-alone servers running Solaris > on sparc boxes and Linux (mostly CentOS) on AMD/Intel boxes, also Linux > (again CentOS) on large clusters of AMD/Intel boxes. These platforms all > have some version of the 1.3 OpenMPI stream. I recently requested an upgrade > on all systems to 1.4.3 (for production work) and 1.5.1 (for > experimentation). I'm getting a lot of push back from the SysAdmin folks > claiming that OpenMPI is closely intertwined with the specific version of the > operating system and/or other system software (i.e., Rocks on the clusters). > I need to know if they are telling me the truth or if they're just making > excuses to avoid the work. To state my question another way: Apparently > each release of Linux and/or Rocks comes with some version of OpenMPI bundled > in. Is it dangerous in some way to upgrade to a newer version of OpenMPI? > Thanks in advance for any insight anyone can provide. > > - Jeff___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
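The workflow Doug describes looks roughly like this from a user's shell, assuming the environment-modules package is installed and using hypothetical module names:

```shell
module avail openmpi                         # list versions with modulefiles
module load openmpi/1.4.3                    # put 1.4.3 first on the paths
which mpicc                                  # should resolve inside 1.4.3's prefix
module switch openmpi/1.4.3 openmpi/1.5.1    # swap versions cleanly
```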
Re: [OMPI users] Mac Ifort and gfortran together
Hello, You may be bumping into conflicts between the Apple-supplied ompi and your mpi. I use modules to force my mpi to the front of the PATH and DYLD_LIBRARY_PATH variables. Doug Reeder On Dec 15, 2010, at 5:22 PM, Jeff Squyres wrote: > Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, > and the MPI Forum meeting last week... > > > On Nov 29, 2010, at 2:12 PM, David Robertson wrote: > >> I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use >> both Intel Ifort 11.1 and gfortran 4.3 on the same machine and switch >> between them to test and debug code. >> >> I had runtime problems when I compiled openmpi in my usual way of no shared >> libraries so I switched to shared and it runs now. > > What problems did you have? OMPI should work fine when compiled statically. > >> However, in order for it to work with ifort I ended up needing to add the >> location of my intel compiled Open MPI libraries >> (/opt/intelsoft/openmpi/lib) to my DYLD_LIBRARY_PATH environment variable >> to get codes to compile and/or run with ifort. > > Is this what Intel recommends for anything compiled with ifort on OS X, or is > this unique to OMPI-compiled MPI applications? > >> The problem is that adding /opt/intelsoft/openmpi/lib to DYLD_LIBRARY_PATH >> broke my Open MPI for gfortran. Now when I try to compile with mpif90 for >> gfortran it thinks it's actually trying to compile with ifort still. As soon >> as I take the above path out of DYLD_LIBRARY_PATH everything works fine. >> >> Also, when I run ompi_info everything looks right except prefix. It says >> /opt/intelsoft/openmpi rather than /opt/gfortransoft/openmpi like it should. >> It should be noted that having /opt/intelsoft/openmpi in LD_LIBRARY_PATH >> does not produce the same effect. > > I'm not quite clear on your setup, but it *sounds* like you're somehow mixing > up 2 different installations of OMPI -- one in /opt/intelsoft and the other > in /opt/gfortransoft. 
> > Can you verify that you're using the "right" mpif77 (and friends) when you > intend to, and so on? > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
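A few hedged sanity checks for sorting out which of two coexisting installations the wrappers and libraries come from (the environment variable is the OS X one discussed above):

```shell
which mpif90                 # which wrapper is first on PATH?
mpif90 --showme:command      # which back-end compiler will it invoke?
ompi_info | grep -i prefix   # which installation does ompi_info report?
echo "$DYLD_LIBRARY_PATH"    # does the intel lib dir come before gfortran's?
```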
Re: [OMPI users] Bad performance when scattering big size of data?
In my experience hyperthreading can't really deliver two cores' worth of processing simultaneously for processes expecting sole use of a core. Since you really have 512 cores I'm not surprised that you see a performance hit when requesting > 512 compute units. We should really get input from a hyperthreading expert, preferably from Intel. Doug Reeder On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote: > We have 64 compute nodes which are dual quad-core and hyperthreaded CPUs. So > we have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to > scatter an array from the master node to the compute nodes using mpiCC and > mpirun using C++. > > Here is my test: > > The array size is 18KB * Number of compute nodes and is scattered to the > compute nodes 5000 times repeatedly. > > The average running time (seconds): > > 100 nodes: 170, > 400 nodes: 690, > 500 nodes: 855, > 600 nodes: 2550, > 700 nodes: 2720, > 800 nodes: 2900, > > There is a big jump of running time from 500 nodes to 600 nodes. Don't know > what's the problem. > Tried both in OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster for > all the tests in 1.4.2 but the jump still exists. > Tried using either Bcast function or simply Send/Recv which give very close > results. > Tried both in running it directly or using SGE and got the same results. > > The code and ompi_info are attached to this email. 
The direct running command > is : > /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile > ../machines -np 600 scatttest > > The ifconfig of head node for eth0 is: > eth0 Link encap:Ethernet HWaddr 00:26:B9:56:8B:44 > inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0 > inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0 > TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:832328807459 (775.1 GiB) TX bytes:250824621959 (233.5 GiB) > Interrupt:106 Memory:d600-d6012800 > > A typical ifconfig of a compute node is: > eth0 Link encap:Ethernet HWaddr 00:21:9B:9A:15:AC > inet addr:192.168.1.253 Bcast:192.168.1.255 Mask:255.255.255.0 > inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0 > TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:139699954685 (130.1 GiB) TX bytes:338207741480 (314.9 GiB) > Interrupt:82 Memory:d600-d6012800 > > > Does anyone help me out of this? It bothers me a lot. > > Thank you very much. > > Linbao > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
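One hedged way to test the hyperthreading theory is to cap ranks at the physical core count (64 nodes x 8 physical cores = 512) via the machinefile, so no rank lands on a second hardware thread; the hostnames below are illustrative:

```shell
# machinefile entries like this limit each node to its physical cores:
#   compute-0-0 slots=8
#   compute-0-1 slots=8
/opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 \
    --machinefile ../machines -np 512 scatttest
```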
Re: [OMPI users] Building OpenMPI 10.4 with PGI fortran 10.8 and gcc
Axel, Should the argument be -ipthread? Doug Reeder On Sep 14, 2010, at 12:17 PM, Axel Schweiger wrote: > Trying to build a hybrid OpenMPI with PGI fortran and gcc to support WRF model > The problem appears to be due to a -pthread switch passed to pgfortran. > > > > libtool: link: pgfortran -shared -fpic -Mnomain .libs/mpi.o .libs/mpi_sizeof.o .libs/mpi_comm_spawn_multiple_f90.o .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/ompi/.libs -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs -Wl,-rpath -Wl,/opt/openmpi-pgi-gcc-1.42/lib -L/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs -L/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs ../../../ompi/.libs/libmpi.so /home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs/libopen-rte.so /home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-soname -Wl,libmpi_f90.so.0 -o .libs/libmpi_f90.so.0.0.0 > pgfortran-Error-Unknown switch: -pthread > make[4]: *** [libmpi_f90.la] Error 1 > > There has been discussion on this issue and the below solution suggested. > This doesn't appear to work for the 10.8 > release. > > http://www.open-mpi.org/community/lists/users/2009/04/8911.php > > There was a previous thread: > http://www.open-mpi.org/community/lists/users/2009/03/8687.php > > suggesting other solutions. > > Wondering if there is a better solution right now? Building 1.4.2 > > Thanks > Axel > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
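One workaround suggested in the threads cited above (hedged, untested here) is PGI's -noswitcherror flag, which makes pgfortran warn about rather than reject switches it does not recognize, such as the -pthread that libtool passes through; the prefix is a hypothetical example:

```shell
./configure CC=gcc CXX=g++ F77=pgfortran FC=pgfortran \
    FFLAGS=-noswitcherror FCFLAGS=-noswitcherror \
    --prefix=/opt/openmpi-pgi-gcc-1.4.2
```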
Re: [OMPI users] Configuring with torque: error and patch
John, I haven't done a build with torque lately, but I think you need to have a -ltorque argument in the load step. Doug Reeder On May 30, 2010, at 9:13 AM, John Cary wrote: Upon configuring and building openmpi on a system with torque, I repeatedly got build errors of the sort, /bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -finline-functions -pthread -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lm libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so -L/usr/local/torque-2.4.0b1/lib /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/usr/local/contrib/openmpi-1.4.2-nodl/lib /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_spawn' /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_poll' /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_finalize' /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_init' collect2: ld returned 1 exit status which I fixed by adding one or the other of $(ORTE_WRAPPER_EXTRA_LDFLAGS) $(ORTE_WRAPPER_EXTRA_LIBS) $(OMPI_WRAPPER_EXTRA_LDFLAGS) $(OMPI_WRAPPER_EXTRA_LIBS) to various LDADD variables. I doubt that this is consistent with how your build system is designed, but it works for me. I am sending you the diff in case it helps you in any way. BTW, I also fixed some blanks after backslashes in contrib/Makefile.am. This is also in the attached patch. Best, John Cary ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
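Before patching the Makefiles, it may be worth pointing configure at the Torque tree explicitly so that libtorque (which provides the tm_init, tm_spawn, tm_poll, and tm_finalize symbols in the link errors) is pulled in; a hedged sketch using the path that appears in the log above:

```shell
./configure --with-tm=/usr/local/torque-2.4.0b1 \
    LDFLAGS=-L/usr/local/torque-2.4.0b1/lib LIBS=-ltorque
```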
Re: [OMPI users] Building 1.4.x on mac snow leopard with intel compilers
Mike, Are you sure that you are getting the openmpi that you built and not the one supplied w/ OS X? I use modules to make sure that I am getting the openmpi version I build instead of the OS X supplied version. Doug Reeder On May 23, 2010, at 10:45 AM, Glass, Micheal W wrote: I'm having problems building a working version of openmpi 1.4.1/2 on a new Apple Mac Pro (dual quad-core nehalem processors) running snow leopard (10.6.3) with the Intel 11.1 compilers. I've tried the Intel 11.1.084 and 11.1.088 versions of the compilers. Everything appears to build just fine and some mpi test programs run but whenever I run a program with an MPI_Reduce() or MPI_Allreduce() I get a segfault (even with np=1). I'm building openmpi with: configure --without-xgrid --prefix= CC=icc CXX=icpc F77=ifort FC=ifort When I build openmpi 1.4.1/2 with the GNU 4.3 compilers (installed via macports) using: configure --without-xgrid --prefix= CC=gcc-mp-4.3 CXX=g++-mp-4.3 F77=gfortran-mp-4.3 FC=gfortran-mp-4.3 all my mpi tests (6000+) run fine. Any help would be appreciated. Thanks, Mike ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hello, I have a mac with two quad core nehalem chips (8 cores). The sysctl command shows 16 cpus (apparently w/ hyperthreading). I have a finite element code that runs in parallel using openmpi. Running on the single machine using openmpi -np 8 runs in about 2/3 time that running with -np 16 does. The program is very well optimized for parallel processing so I strongly suspect that hyperthreading is not helping. The program fairly aggressively uses 100% of each cpu it is on so I don't think hyperthreading gets much of a chance to split the cpu activity. I would certainly welcome input/insight from an intel hardware engineer. I make sure that I don't ask for more processors than there are physical cores and that seems to work. Doug Reeder On May 4, 2010, at 7:06 PM, Gus Correa wrote: Hi Ralph Thank you so much for your help. You are right, paffinity is turned off (default): ** /opt/sw/openmpi/1.4.2/gnu-4.4.3-4/bin/ompi_info --param opal all | grep paffinity MCA opal: parameter "opal_paffinity_alone" (current value: "0", data source: default value, synonyms: mpi_paffinity_alone, mpi_paffinity_alone) ** I will try your suggestion to turn off HT tomorrow, and report back here. Douglas Guptill kindly sent a recipe to turn HT off via BIOS settings. Cheers, Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA - Ralph Castain wrote: On May 4, 2010, at 4:51 PM, Gus Correa wrote: Hi Ralph Ralph Castain wrote: One possibility is that the sm btl might not like that you have hyperthreading enabled. I remember that hyperthreading was discussed months ago, in the previous incarnation of this problem/thread/discussion on "Nehalem vs. Open MPI". (It sounds like one of those supreme court cases ... ) I don't really administer that machine, or any machine with hyperthreading, so I am not much familiar to the HT nitty-gritty. How do I turn off hyperthreading? Is it a BIOS or a Linux thing? I may try that. 
I believe it can be turned off via an admin-level cmd, but I'm not certain about it Another thing to check: do you have any paffinity settings turned on (e.g., mpi_paffinity_alone)? I didn't turn on or off any paffinity setting explicitly, either in the command line or in the mca config file. All that I did on the tests was to turn off "sm", or just use the default settings. I wonder if paffinity is on by default, is it? Should I turn it off? It is off by default - I mention it because sometimes people have it set in the default MCA param file and don't realize it is on. Sounds okay here, though. Our paffinity system doesn't handle hyperthreading at this time. OK, so *if* paffinity is on by default (Is it?), and hyperthreading is also on, as it is now, I must turn off one of them, maybe both, right? I may go combinatorial about this tomorrow. Can't do it today. Darn locked office door! I would say don't worry about the paffinity right now - sounds like it is off. You can always check, though, by running "ompi_info --param opal all" and checking for the setting of the opal_paffinity_alone variable I'm just suspicious of the HT since you have a quad-core machine, and the limit where things work seems to be 4... It may be. If you tell me how to turn off HT (I'll google around for it meanwhile), I will do it tomorrow, if I get a chance to hard reboot that pesky machine now locked behind a door. Yeah, I'm beginning to believe it is the HT that is causing the problem... Thanks again for your help. Gus On May 4, 2010, at 3:44 PM, Gus Correa wrote: Hi Jeff Sure, I will certainly try v1.4.2. I am downloading it right now. As of this morning, when I first downloaded, the web site still had 1.4.1. Maybe I should have refreshed the web page on my browser. I will tell you how it goes. Gus Jeff Squyres wrote: Gus -- Can you try v1.4.2 which was just released today? On May 4, 2010, at 4:18 PM, Gus Correa wrote: Hi Ralph Thank you very much. 
The "-mca btl ^sm" workaround seems to have solved the problem, at least for the little hello_c.c test. I just ran it fine up to 128 processes. I confess I am puzzled by this workaround. * Why should we turn off "sm" in a standalone machine, where everything is supposed to operate via shared memory? * Do I incur in a performance penalty by not using "sm"? * What other mechanism is actually used by OpenMPI for process communication in this case? It seems to be using tcp, because when I try -np 256 I get this error: [spinoza:02715] [[11518,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process ca
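The diagnostics discussed in this thread, collected as a hedged sketch; the sysctl keys are the OS X ones (on Linux, /proc/cpuinfo serves the same purpose):

```shell
# How many physical cores vs. hardware threads does the box have?
sysctl -n hw.physicalcpu    # e.g. 4 physical cores
sysctl -n hw.logicalcpu     # e.g. 8 with hyperthreading enabled
# Is processor affinity on? (off by default)
ompi_info --param opal all | grep paffinity
# Run without the shared-memory btl, as suggested above:
mpirun --mca btl ^sm -np 8 ./hello_c
```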
Re: [OMPI users] openmpi 1.4.1 and xgrid
Cristobal, It may be a 10.6 vs 10.5 difference. In the configure --help output it looks like --with-xgrid=no should turn off the default behavior of building with support for xgrid. Doug Reeder On Apr 30, 2010, at 3:28 PM, Cristobal Navarro wrote: this is strange, because some weeks ago i compiled openmpi 1.4.1 on a mac 10.5.6 and the parameter --without-xgrid worked good. can you turn off xgrid on the macs you are working with?? that might help Cristobal On Fri, Apr 30, 2010 at 6:19 PM, Doug Reeder <d...@cox.net> wrote: Alan, I haven't tried to build 1.4.x on os x 10.6.x yet, but it sounds like the configure script has become too clever by half. Is there a configure argument to force no xgrid (e.g., --with-xgrid=no or --enable-xgrid=no). Doug Reeder On Apr 30, 2010, at 3:12 PM, Alan wrote: Hi guys, thanks, Well, I can assure there I have the right things as explained here: ompi 1.2.8 (apple) /usr/bin/ompi_info | grep xgrid MCA ras: xgrid (MCA v1.0, API v1.3, Component v1.2.8) MCA pls: xgrid (MCA v1.0, API v1.3, Component v1.2.8) ompi 1.3.3 (Fink) /sw/bin/ompi_info | grep xgrid "nothing" ompi 1.4.1 (mine, for Amber11) /Users/alan/Programmes/amber11/exe/ompi_info | grep xgrid MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1) So, my problem is "simple", the formula I used to compile ompi without xgrid used to work, but it's simply not working anymore with ompi 1.4.1, even though I see in compilation: --- MCA component plm:xgrid (m4 configuration macro) checking for MCA component plm:xgrid compile mode... static checking if C and Objective C are link compatible... yes checking for XgridFoundation Framework... yes configure: WARNING: XGrid components must be built as DSOs. Disabling checking if MCA component plm:xgrid can compile... no Any help helps. Thanks, Alan On Fri, Apr 30, 2010 at 20:32, Cristobal Navarro <axisch...@gmail.com> wrote: try launching mpirun -v and see what version it is picking up. 
maybe its the included 1.2.x Cristobal On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder <d...@cox.net> wrote: Alan, Are you sure that the ompi_info and mpirun that you are using are the 1.4.1 versions and not the apple supplied versions. I use modules to help ensure that I am using the openmpi that I built and not the apple supplied versions. Doug Reeder On Apr 30, 2010, at 12:14 PM, Alan wrote: Hi there, No matter I do I cannot disable xgrid while compiling opempi. I tried: --without-xgrid --enable-shared --enable-static And still see with ompi_info: MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1) And because of xgrid on ompi, I have: openmpi-1.4.1/examples% mpirun -c 2 hello_c [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119 [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 15 Using mac SL 10.6.3 Compiling 1.3.3 and haven't any problem. Thanks in advance, Alan -- Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate Department of Biochemistry, University of Cambridge. 80 Tennis Court Road, Cambridge CB2 1GA, UK. >>http://www.bio.cam.ac.uk/~awd28<< ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate Department of Biochemistry, University of Cambridge. 80 Tennis Court Road, Cambridge CB2 1GA, UK. >>http://www.bio.cam.ac.uk/~awd28<< ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] openmpi 1.4.1 and xgrid
Alan, I haven't tried to build 1.4.x on os x 10.6.x yet, but it sounds like the configure script has become too clever by half. Is there a configure argument to force no xgrid (e.g., --with-xgrid=no or --enable-xgrid=no). Doug Reeder On Apr 30, 2010, at 3:12 PM, Alan wrote: Hi guys, thanks, Well, I can assure there I have the right things as explained here: ompi 1.2.8 (apple) /usr/bin/ompi_info | grep xgrid MCA ras: xgrid (MCA v1.0, API v1.3, Component v1.2.8) MCA pls: xgrid (MCA v1.0, API v1.3, Component v1.2.8) ompi 1.3.3 (Fink) /sw/bin/ompi_info | grep xgrid "nothing" ompi 1.4.1 (mine, for Amber11) /Users/alan/Programmes/amber11/exe/ompi_info | grep xgrid MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1) So, my problem is "simple", the formula I used to compile ompi without xgrid used to work, but it's simply not working anymore with ompi 1.4.1, even though I see in compilation: --- MCA component plm:xgrid (m4 configuration macro) checking for MCA component plm:xgrid compile mode... static checking if C and Objective C are link compatible... yes checking for XgridFoundation Framework... yes configure: WARNING: XGrid components must be built as DSOs. Disabling checking if MCA component plm:xgrid can compile... no Any help helps. Thanks, Alan On Fri, Apr 30, 2010 at 20:32, Cristobal Navarro <axisch...@gmail.com> wrote: try launching mpirun -v and see what version it is picking up. maybe it's the included 1.2.x Cristobal On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder <d...@cox.net> wrote: Alan, Are you sure that the ompi_info and mpirun that you are using are the 1.4.1 versions and not the apple supplied versions? I use modules to help ensure that I am using the openmpi that I built and not the apple supplied versions. Doug Reeder On Apr 30, 2010, at 12:14 PM, Alan wrote: Hi there, No matter what I do I cannot disable xgrid while compiling openmpi.
I tried: --without-xgrid --enable-shared --enable-static And still see with ompi_info: MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1) And because of xgrid on ompi, I have: openmpi-1.4.1/examples% mpirun -c 2 hello_c [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119 [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 15 Using mac SL 10.6.3 Compiling 1.3.3 and haven't any problem. Thanks in advance, Alan -- Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate Department of Biochemistry, University of Cambridge. 80 Tennis Court Road, Cambridge CB2 1GA, UK. >>http://www.bio.cam.ac.uk/~awd28<< ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate Department of Biochemistry, University of Cambridge. 80 Tennis Court Road, Cambridge CB2 1GA, UK. >>http://www.bio.cam.ac.uk/~awd28<< ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] openmpi 1.4.1 and xgrid
Alan, Are you sure that the ompi_info and mpirun that you are using are the 1.4.1 versions and not the Apple-supplied versions? I use modules to help ensure that I am using the openmpi that I built and not the Apple-supplied versions. Doug Reeder On Apr 30, 2010, at 12:14 PM, Alan wrote: Hi there, No matter what I do I cannot disable xgrid while compiling openmpi. I tried: --without-xgrid --enable-shared --enable-static And still see with ompi_info: MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1) And because of xgrid in ompi, I have: openmpi-1.4.1/examples% mpirun -c 2 hello_c [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119 [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 15 Using Mac SL 10.6.3. Compiling 1.3.3, I didn't have any problem. Thanks in advance, Alan -- Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate Department of Biochemistry, University of Cambridge. 80 Tennis Court Road, Cambridge CB2 1GA, UK. >>http://www.bio.cam.ac.uk/~awd28<< ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] configure script fails
Christoph, It looks like you need to add -L/usr/local/lib to the Fortran 77 and Fortran 90 flags, either on the configure input or in the environment variables, so that the loader can find libgfortran. Doug On Jan 13, 2010, at 4:09 PM, von Tycowicz, Christoph wrote: Hi, when running the configure script it breaks with: configure: error: Could not run a simple Fortran 77 program. Aborting. (logs with details attached) I don't know how to interpret this error since I already successfully compiled Fortran code using these compilers (gcc/gfortran 4.5). I would be really grateful for any clues on this. best regards Christoph ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
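One way to hand that library path to configure, assuming gfortran's support libraries live in /usr/local/lib (the path is from Doug's suggestion; whether it matches a given machine must be checked):

```shell
# Point the linker at the directory containing libgfortran for the
# Fortran 77 and Fortran 90 probes; LDFLAGS covers the link step.
./configure CC=gcc F77=gfortran FC=gfortran \
    LDFLAGS=-L/usr/local/lib \
    FFLAGS=-L/usr/local/lib \
    FCFLAGS=-L/usr/local/lib
```

If configure still aborts, config.log records the exact link command and error for the failed Fortran test program.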
Re: [OMPI users] OpenMPI on OS X - file is not of required architecture
Andreas, Have you checked that ifort is creating 64 bit objects. If I remember correctly with 10.1 the default was to create 32 bit objects. Doug Reeder On Sep 11, 2009, at 3:25 PM, Andreas Haselbacher wrote: On Fri, Sep 11, 2009 at 5:10 PM, Jeff Squyres <jsquy...@cisco.com> wrote: On Sep 11, 2009, at 10:05 AM, Andreas Haselbacher wrote: I've built openmpi version 1.3.3 on a MacPro with OS X 10.5.8 and the Intel 10.1.006 Fortran compiler and gcc 4.0. As far as I can tell, the configure and make commands completed fine. There are some warnings, but it's not clear to me that they are critical - or the explanation for what's not working. After installing, I try to compile a simple F77 hello world code. The output is: % mpif77 helloworld_mpi.f -o helloworld_mpi ld: warning in /opt/openmpi/lib/libmpi_f77.a, file is not of required architecture This means that it skipped that library because it didn't match what you were trying to compile against. Can you send the output of mpif77 --showme? ifort -I/opt/openmpi/include -L/opt/openmpi/lib -lmpi_f77 -lmpi - lopen-rte -lopen-pal -lutil Undefined symbols: "_mpi_init_", referenced from: _MAIN__ in ifortIsUNoZ.o None of these symbols were found because libmpi_f77.a was skipped. Right. Here's my configure command: ./configure --prefix=/opt/openmpi --enable-static --disable-shared CC=gcc CFLAGS=-m64 CXX=g++ CXXFLAGS=-m64 F77=ifort FC=ifort FFLAGS=- assume nounderscore FCFLAGS=-assume nounderscore I do not have the intel compilers for Mac; do they default to producing 64 bit objects? I ask because it looks like you forced the C and C++ compilers to produce 64 bit objects -- do you need to do the same with ifort? (via the FCFLAGS and FFLAGS env variables) If I remember correctly, I had to add those flags, otherwise configure claimed that the compilers were not compatible. I can rerun configure if you suspect that this is an issue. 
I did not add these flags to the Fortran variables because configure did not complain further, but I can see that this might be an issue. Also, did you quote the "-assume nounderscore" arguments to FFLAGS/ FCFLAGS? I.e., something like this: "FFLAGS=-assume nounderscore" Yes, I did. Andreas -- Jeff Squyres jsquy...@cisco.com ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
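A quick way to check what each compiler is actually emitting before re-running configure. The file names are throwaway examples, and the assumption that this vintage of ifort accepts -m64 on the Mac should be verified against the compiler documentation:

```shell
# Compile a trivial object with each compiler and inspect it.
echo 'int main(void){return 0;}' > conftest.c
gcc -m64 -c conftest.c
file conftest.o     # should report a 64-bit (x86_64) object

printf '      program t\n      end\n' > conftest.f
ifort -c conftest.f
file conftest.o     # a 32-bit object here would explain the warning

# If ifort defaults to 32-bit, force 64-bit output to match gcc:
#   FFLAGS="-m64 -assume nounderscore"
#   FCFLAGS="-m64 -assume nounderscore"
```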
Re: [OMPI users] ompi_info segmentation fault with Snow Leopard
Marcus, What version of openmpi ships with 10.6? Are you making sure that you are getting the includes and libraries for 1.3.3 and not the native Apple version of openmpi? Doug Reeder On Sep 1, 2009, at 4:31 PM, Marcus Herrmann wrote: Hi, I am trying to install openmpi 1.3.3 under OSX 10.6 (Snow Leopard) using the 11.1.058 intel compilers. Configure and build seem to work fine. However, trying to run ompi_info after install immediately causes a segmentation fault without any additional information being printed. Did anyone have success in using 1.3.3 under Snow Leopard? Thanks Marcus ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
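A few commands along the lines Doug suggests, to confirm which installation the shell and the build are actually picking up (the 1.3.3 install prefix below is an example):

```shell
# Which binaries are first on the PATH?
which ompi_info mpicc mpirun

# Snow Leopard ships its own Open MPI 1.2.x in /usr; compare versions:
/usr/bin/ompi_info | grep "Open MPI:"
/opt/openmpi-1.3.3/bin/ompi_info | grep "Open MPI:"

# Put the 1.3.3 tree first for both compiling and running:
export PATH=/opt/openmpi-1.3.3/bin:$PATH
```

Mixing headers from one tree with libraries from another is a classic cause of an immediate segfault in ompi_info.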
Re: [OMPI users] Configuration problem or network problem?
Lin, Try -np 16 and not running on the head node. Doug Reeder On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote: Hi all, The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as the head node, connected by a high-speed switch. There are point-to-point communication functions (MPI_Send and MPI_Recv), the data size is about 40KB, and a lot of computation which consumes a long time (about 1 sec) in a loop. The co-processor in the PS3 can take care of the computation and the main processor takes care of the point-to-point communication, so the computing and communication can overlap. The communication functions should return much faster than the computing function. My question is that after some cycles, the time consumed by the communication functions on a PS3 increases heavily, and the whole cluster's sync state corrupts. When I decrease the computing time, this situation just disappears. I am very confused about this. I think there is a mechanism in OpenMPI that causes this; has anyone seen this situation before? I use "mpirun --mca btl tcp, self -np 17 --hostfile ...", is there something I should add? Lin ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
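One detail worth double-checking in the quoted command: the btl list must be a single comma-separated argument with no space after the comma, otherwise "self" is parsed as the program to run. A sketch of Doug's suggestion (hostnames and slot counts are illustrative):

```shell
# Hostfile listing only the 16 PS3 compute nodes, not the head node
cat > hosts <<'EOF'
ps3-01 slots=1
ps3-02 slots=1
# ... through ps3-16
EOF

# Note: "tcp,self" as one argument, no space after the comma
mpirun --mca btl tcp,self -np 16 --hostfile hosts ./my_app
```

Keeping the PowerPC head node out of the computation avoids one slow rank dragging the synchronization state of the other 16.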
Re: [OMPI users] Compiling Open MPI with PGI compilers in 32-bit mode
Ethan, It looks like some of the object files that you are trying to link with malloc.o and malloc-stats.o were compiled as 64-bit objects. Are you using the 32-bit compiler flag for the compile step as well as the link step? Doug Reeder On Mar 20, 2009, at 10:49 AM, Ethan Mallove wrote: Hi, Has anyone successfully compiled Open MPI with the PGI compilers in 32-bit mode (e.g., using the -tp=k8-32 flag)? I am getting the following error with 32-bit: $ cd opal/mca/memory/ptmalloc2 $ make /bin/sh ../../../../libtool --tag=CC --mode=link pgcc -O -DNDEBUG -tp=k8-32 -export-dynamic -o libopenmpi-malloc.la -rpath /opt/SUNWhpc/HPC8.2/pgi/lib malloc.lo malloc-stats.lo -lnsl -lutil -lpthread libtool: link: pgcc -shared -fpic -DPIC .libs/malloc.o .libs/malloc-stats.o -lnsl -lutil -lpthread -lc -Wl,-soname -Wl,libopenmpi-malloc.so.0 -o .libs/libopenmpi-malloc.so.0.0.0 /usr/bin/ld: warning: i386 architecture of input file `.libs/malloc.o' is incompatible with i386:x86-64 output /usr/bin/ld: warning: i386 architecture of input file `.libs/malloc-stats.o' is incompatible with i386:x86-64 output .libs/malloc.o(.text+0xcb3): In function `realloc_check': : undefined reference to `opal_memcpy_base_module' .libs/malloc.o(.text+0x14e3): In function `munmap_chunk': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x1560): In function `mremap_chunk': : undefined reference to `opal_mem_hooks_release_hook' .libs/malloc.o(.text+0x2be2): In function `_int_free': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x30ae): In function `_int_realloc': : undefined reference to `opal_mem_hooks_release_hook' .libs/malloc.o(.text+0x3c2a): In function `opal_mem_free_ptmalloc2_sbrk': : undefined reference to `opal_mem_hooks_release_hook' .libs/malloc.o(.text+0x3fab): In function `ptmalloc_init': : undefined reference to `opal_mem_hooks_set_support' .libs/malloc.o(.text+0x40ad): In function `new_heap': : undefined reference to 
`opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x40d5): In function `new_heap': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x414f): In function `new_heap': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x4198): In function `new_heap': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x4282): In function `heap_trim': : undefined reference to `opal_mem_free_ptmalloc2_munmap' .libs/malloc.o(.text+0x44aa): In function `arena_get2': : undefined reference to `opal_atomic_wmb' make: *** [libopenmpi-malloc.la] Error 2 -Ethan ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
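A configure sketch that applies the 32-bit target flag uniformly to the compile and link steps, which is the mismatch Doug suspects. The PGI compiler names follow the usual conventions (pgcc, pgCC, pgf77, pgf90); treat the exact variable set as a starting point rather than a verified recipe:

```shell
# Pass -tp=k8-32 to every compile as well as the link, so no stray
# 64-bit objects reach the linker.
./configure CC=pgcc CFLAGS="-tp=k8-32" \
    CXX=pgCC CXXFLAGS="-tp=k8-32" \
    F77=pgf77 FFLAGS="-tp=k8-32" \
    FC=pgf90 FCFLAGS="-tp=k8-32" \
    LDFLAGS="-tp=k8-32"

# A clean rebuild discards any 64-bit objects left from earlier runs
make clean && make all
```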
Re: [OMPI users] 1.3 and --preload-files and --preload-binary
Josh, It sounds like . is not in your path. That would prevent mpirun from seeing the binary in the current directory. Doug Reeder On Jan 22, 2009, at 10:48 AM, Josh Hursey wrote: As a followup. I can confirm that --preload-files is not working as it should. I was able to use --preload-binary with a full path to the binary without a problem though. The following commands worked fine (where / tmp is not mounted on all machines): shell$ mpirun -np 2 --preload-binary /tmp/hello shell$ mpirun -np 2 -s /tmp/hello However if I referred directly to the binary in the current directory I saw the same failure: shell$ cd /tmp shell$ mpirun -np 2 -s hello -- mpirun was unable to launch the specified application as it could not find an executable: Executable: hello Node: odin101 while attempting to start process rank 0. -- I'll keep digging into this bug, and let you know when I have a fix. I filed a ticket (below) that you can use to track the progress on this bug. https://svn.open-mpi.org/trac/ompi/ticket/1770 Thanks again for the bug report, I'll try to resolve this soon. Josh On Jan 22, 2009, at 10:49 AM, Josh Hursey wrote: The warning is to be expected if the file already exists on the remote side. Open MPI has a policy not to replace the file if it already exists. The segv is concerning. :/ I will take a look and see if I can diagnose what is going on here. Probably in the next day or two. Thanks for the bug report, Josh On Jan 22, 2009, at 10:11 AM, Geoffroy Pignot wrote: Hello, As you can notice , I am trying the work done on this new release. preload-files and preload-binary options are very interesting to me because I work on a cluster without any shared space between nodes. I tried those basically , but no success . You will find below the error messages. If I did things wrong, would it be possible to get simple examples showing how these options work. 
Thanks Geoffroy /tmp/openmpi-1.3/bin/mpirun --preload-files hello.c --hostfile / tmp/hostlist -np 2 hostname -- WARNING: Could not preload specified file: File already exists. Fileset: /tmp/hello.c Host: compil03 Will continue attempting to launch the process. -- [compil03:26657] filem:rsh: get(): Failed to preare the request structure (-1) -- WARNING: Could not preload the requested files and directories. Fileset: Fileset: hello.c Will continue attempting to launch the process. -- [compil03:26657] [[13938,0],0] ORTE_ERROR_LOG: Error in file base/ odls_base_state.c at line 127 [compil03:26657] [[13938,0],0] ORTE_ERROR_LOG: Error in file base/ odls_base_default_fns.c at line 831 [compil03:26657] *** Process received signal *** [compil03:26657] Signal: Segmentation fault (11) [compil03:26657] Signal code: Address not mapped (1) [compil03:26657] Failing at address: 0x395eb15000 [compil03:26657] [ 0] /lib64/tls/libpthread.so.0 [0x395f80c420] [compil03:26657] [ 1] /lib64/tls/libc.so.6(memcpy+0x3f) [0x395ed718df] [compil03:26657] [ 2] /tmp/openmpi-1.3/lib64/libopen-pal.so.0 [0x2a956b0a10] [compil03:26657] [ 3] /tmp/openmpi-1.3/lib64/libopen-rte.so. 0(orte_odls_base_default_launch_local+0x55c) [0x2a955809cc] [compil03:26657] [ 4] /tmp/openmpi-1.3/lib64/openmpi/ mca_odls_default.so [0x2a963655f2] [compil03:26657] [ 5] /tmp/openmpi-1.3/lib64/libopen-rte.so. 0(orte_daemon_cmd_processor+0x57d) [0x2a9557812d] [compil03:26657] [ 6] /tmp/openmpi-1.3/lib64/libopen-pal.so.0 [0x2a956b9828] [compil03:26657] [ 7] /tmp/openmpi-1.3/lib64/libopen-pal.so. 0(opal_progress+0xb0) [0x2a956ae820] [compil03:26657] [ 8] /tmp/openmpi-1.3/lib64/libopen-rte.so. 
0(orte_plm_base_launch_apps+0x1ed) [0x2a95584e7d] [compil03:26657] [ 9] /tmp/openmpi-1.3/lib64/openmpi/ mca_plm_rsh.so [0x2a95c3ed98] [compil03:26657] [10] /tmp/openmpi-1.3/bin/mpirun [0x403330] [compil03:26657] [11] /tmp/openmpi-1.3/bin/mpirun [0x402ad3] [compil03:26657] [12] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x395ed1c4bb] [compil03:26657] [13] /tmp/openmpi-1.3/bin/mpirun [0x402a2a] [compil03:26657] *** End of error message *** Segmentation fault And it's not better with --preload-binary . a.out_32 compil03% /tmp/openmpi-1.3/bin/mpirun -s --hostfile /tmp/hostlist - wdir /tmp -np 2 a.out_32 -- mpirun was unable to launch the specified application as it could not find an executable: Executable: a.out_32 Node: compil02 while attempting to start process rank 1. ___ users mailing
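Until the ticket above is fixed, the pattern that worked in Josh's testing was an absolute path to the binary, which avoids any dependence on "." being searched:

```shell
# Fails when mpirun cannot resolve a bare name in the current dir:
#   mpirun -np 2 -s hello

# Works: absolute path, no PATH lookup needed
mpirun -np 2 --preload-binary /tmp/hello

# Or expand the current directory explicitly
mpirun -np 2 --preload-binary "$(pwd)/hello"
```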
Re: [OMPI users] FW: Re: [MTT users] Is the stock MPI that comes with OSX leopard broken with xgrid?
I believe that the openmpi that comes with leopard doesn't support xgrid. If you type ompi_info|grep xgrid you get nothing. I'm not sure what apple was thinking. Doug Reeder On Dec 17, 2008, at 6:30 AM, Ethan Mallove wrote: Hi John, I'm forwarding your question to the Open MPI users list. Regards, Ethan On Wed, Dec/17/2008 08:35:00AM, John Fink wrote: Hello OpenMPI folks, I've got a large pool of Macs running Leopard that are all on an xgrid. However, I can't seem to use the mpirun that comes with Leopard with the xgrid. I've got my grid and password environment variables set up okay on my controller, all the xgrid command line commands work (displaying grid IDs, things like that) but mpirun only wants to run things on the local host. I'm extremely new to OpenMPI and only slightly less new to Macs so there's probably something very obvious that I'm missing, but I'm trying what's detailed on this page: http://www.macresearch.org/runing_mpi_job_through_xgrid (the / bin/hostname example). Here's my output: as-0003-l:~ locadmin$ mpirun -n 8 /bin/hostname as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca as-0003-l.lib.mcmaster.ca Issuing the same command with -nolocal yields the following: as-0003-l:~ locadmin$ mpirun --nolocal -n 8 /bin/hostname - - There are no available nodes allocated to this job. This could be because no nodes were found or all the available nodes were already used. Note that since the -nolocal option was given no processes can be launched on the local node. 
- - [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_support_fns.c at line 168 [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/round_robin/rmaps_rr.c at line 402 [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_map_job.c at line 210 [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/urm/rmgr_urm.c at line 372 [as-0003-l.lib.mcmaster.ca:82776] mpirun: spawn failed with errno=-3 Thanks very much for any help you can provide! jf -- http://libgrunt.blogspot.com -- library culture and technology. ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] OpenMPI runtime-specific environment variable?
Brian, I'm not sure I understand the problem. The ale3d program from LLNL operates exactly as you describe and it can be built with mpich, lam, or openmpi. Doug Reeder On Oct 21, 2008, at 3:08 PM, Adams, Brian M wrote: -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti Sent: Tuesday, October 21, 2008 11:36 AM To: Open MPI Users Subject: Re: [OMPI users] OpenMPI runtime-specific environment variable? Hi, Am 21.10.2008 um 18:52 schrieb Ralph Castain: On Oct 21, 2008, at 10:37 AM, Adams, Brian M wrote: Doug is right that we could use an additional command line flag to indicate MPI runs, but at this point, we're trying to hide that from the user, such that all they have to do is run the binary vs. orterun/mpirun the binary and we detect whether it's a serial or parallel run. And when you have this information you decide for your user, whether to use mpirun (and the correct version to use) or just the plain binary? I might have created some confusion here too. The goal is to build an MPI-enabled binary 'foo' which a user may invoke as (1) ./foo -OR- (2) mpirun -np 4 ./foo The binary foo then determines at run-time whether it is to run in (1) serial, where MPI_Init will never be called; or (2) parallel, calling MPI_Init and so on. This is a historical behavior which we need to preserve, at least for our present software release. You are making something like "strings the_binary" and grep for indications of the compilation type? For the standard Open MPI with shared libraries a "ldd the_binary" might reveal some information. Hadn't thought to do that actually, since it addresses a slightly different problem than I propose above. Thanks for the suggestion. This is another possibility if instead of doing this detection directly in our binary, we decide to change to a wrapper script approach. 
In any case, I appreciate all the discussion -- I believe I have a reasonable path forward using a combination of pre-processor defines that the OMPI wrappers and headers make with the runtime environment variables Ralph suggested (I'll just check for both the <1.3 and >= 1.3 environment cases). Brian ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
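A minimal sketch of the environment-variable test Brian settles on, covering both the pre-1.3 and 1.3-style names. The variable names are version-dependent (check with `mpirun -np 1 env | grep OMPI` on the installation in question), so treat them as assumptions to verify:

```c
#include <stdlib.h>

/* Heuristic: was this process launched by orterun/mpirun?
 * OMPI_COMM_WORLD_SIZE is set by Open MPI >= 1.3;
 * OMPI_MCA_ns_nds_vpid covers the 1.2 series (it appears in the
 * candidate list above).  Both names should be verified against
 * the local installation before relying on them. */
int launched_by_orterun(void)
{
    return getenv("OMPI_COMM_WORLD_SIZE") != NULL ||
           getenv("OMPI_MCA_ns_nds_vpid") != NULL;
}
```

The application can call this before deciding whether to call MPI_Init, which matches the single-binary serial-vs-parallel behavior described above.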
Re: [OMPI users] OpenMPI runtime-specific environment variable?
Brian, In your code branch for the parallel run you could set an environment or internal variable when you call mpi_init. Can you parse the command line (arg 0) and figure out if you are running parallel or serial. Doug Reeder On Oct 20, 2008, at 3:40 PM, Adams, Brian M wrote: I work on an application (DAKOTA) that has opted for single binaries with source code to detect serial vs. MPI execution at run- time. While I realize there are many other ways to handle this (wrapper scripts, command-line switches, different binaries for serial vs. MPI, etc.), I'm looking for a reliable way to detect (in source) whether a binary has been launched in serial or with orterun. We typically do this via detecting environment variables, so the easiest path for me would be to know an environment variable present when an application is invoked with orterun that is not typically present outside that MPI runtime environment. Some candidates that came up in my particular environment include the following, but I don't know if any is a safe bet: OMPI_MCA_gpr_replica_uri OMPI_MCA_mpi_paffinity_processor OMPI_MCA_mpi_yield_when_idle OMPI_MCA_ns_nds OMPI_MCA_ns_nds_cellid OMPI_MCA_ns_nds_jobid OMPI_MCA_ns_nds_num_procs OMPI_MCA_ns_nds_vpid OMPI_MCA_ns_nds_vpid_start OMPI_MCA_ns_replica_uri OMPI_MCA_orte_app_num OMPI_MCA_orte_base_nodename OMPI_MCA_orte_precondition_transports OMPI_MCA_pls OMPI_MCA_ras OMPI_MCA_rds OMPI_MCA_rmaps OMPI_MCA_rmgr OMPI_MCA_universe I'd also welcome suggestions for other in-source tests that might reliably detect run via orterun. Thanks! Brian -- Brian M. Adams, PhD (bria...@sandia.gov) Optimization and Uncertainty Estimation Sandia National Laboratories, Albuquerque, NM http://www.sandia.gov/~briadam ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Passing LD_LIBRARY_PATH to orted
In torque/pbs using the #PBS -V command pushes the environment variables out to the nodes. I don't know if that is what was happening with slurm. Doug Reeder On Oct 14, 2008, at 12:33 PM, Ralph Castain wrote: I -think- there is...at least here, it does seem to behave that way on our systems. Not sure if there is something done locally to make it work. Also, though, I have noted that LD_LIBRARY_PATH does seem to be getting forwarded on the 1.3 branch in some environments. OMPI isn't doing it directly to the best of my knowledge, but I think the base environment might be. Specifically, I noticed it on slurm earlier today. I'll check the others as far as I can. Craig: what environment are you using? ssh? Ralph On Oct 14, 2008, at 1:18 PM, George Bosilca wrote: I use modules too, but they only work locally. Or is there a feature in "module" to automatically load the list of currently loaded local modules remotely ? george. On Oct 14, 2008, at 3:03 PM, Ralph Castain wrote: You might consider using something like "module" - we use that system for exactly this reason. Works quite well and solves the multiple compiler issue. Ralph On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote: George Bosilca wrote: The option to expand the remote LD_LIBRARY_PATH, in such a way that Open MPI related applications have their dependencies satisfied, is in the trunk. The fact that the compiler requires some LD_LIBRARY_PATH is out of the scope of an MPI implementation, and I don't think we should take care of it. Passing the local LD_LIBRARY_PATH to the remote nodes doesn't make much sense. There are plenty of environment, where the head node have a different configuration than the compute nodes. Again, in this case my original solution seems not that bad. If you copy (or make a link if you prefer) in the Open MPI lib directory to the compiler shared libraries, this will work. george. This does work. It just increases maintenance for each new version of OpenMPI. 
How often does a head node have a different configuration than the compute node? It would see that this would even more support the passing of LD_LIBRARY_PATH for OpenMPI tools to support a heterogeneous configuration as you described. Thanks, Craig On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote: George Bosilca wrote: Craig, This is a problem with the Intel libraries and not the Open MPI ones. You have to somehow make these libraries available on the compute nodes. What I usually do (but it's not the best way to solve this problem) is to copy these libraries somewhere on my home area and to add the directory to my LD_LIBRARY_PATH. george. This is ok when you only ever use one compiler, but it isn't very flexible. I want to keep it as simple as possible for my users, while having a maintainable system. The libraries are on the compute nodes, the problem deals with supporting multiple versions of compilers. I can't just list all of the lib paths in ld.so.conf, because then the user will never get the correct one. I can't specify a static LD_LIBRARY_PATH for the same reason. I would prefer not to build my system libraries static. To the OpenMPI developers, what is your opinion on changing orterun/mpirun to pass LD_LIBRARY_PATH to the remote hosts when starting OpenMPI processes? By hand, all that would be done is: env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted This would ensure that orted is launched correctly. Or is it better to just build the OpenMPI tools statically? We also use other compilers (PGI, Lahey) so I need a solution that works for all of them. Thanks, Craig On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote: I am having problems launching openmpi jobs on my system. I support multiple versions of MPI and compilers using GNU Modules. For the default compiler, everything is fine. For non-default, I am having problems. 
I built Openmpi-1.2.6 (and 1.2.7) with the following configure options: # module load intel/10.1 # ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort -- prefix=/opt/openmpi/1.2.7-intel-10.1 --without- gridengine --enable-io-romio --with-io-romio-flags=--with- file-sys=nfs+ufs --with-openib=/opt/hjet/ofed/1.3.1 When I launch a job, I run the module command for the right compiler/MPI version to set the paths correctly. Mpirun passes LD_LIBRARY_PATH to the executable I am launching, but not orted. When orted is launched on the remote system, the LD_LIBRARY_PATH doesn't come with, and the Intel 10.1 libraries can't be found. /opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared libraries: libintlc.so.5: cannot open shared object file: No such file or directory How do others solve this problem? Thanks, Craig -- Craig Tierney (craig.tier...@noaa.gov) ___ users mailing list us...@open-mpi.org htt
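One more option for the application side of the problem: mpirun's -x flag forwards a named environment variable to the launched MPI processes. It does not necessarily help orted itself find its libraries, since orted starts before the forwarded environment applies, so treat this as a partial workaround to try; the module names below are examples matching the setup above:

```shell
module load intel/10.1 openmpi/1.2.7-intel-10.1

# Forward the caller's LD_LIBRARY_PATH to the remote MPI processes
mpirun -x LD_LIBRARY_PATH -np 16 --hostfile hosts ./a.out

# Under torque/pbs, "#PBS -V" exports the whole submission
# environment to the job instead, as noted at the top of the thread.
```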
Re: [OMPI users] OMPI link error with petsc 2.3.3
Yann, It looks like libmpi and libmpi_f90 somehow ended up with different sizes for the symbol mpi_fortran_status_ignore_. It sounds like a configure problem. You might check the MPI include files to see where the different sizes are coming from. Doug Reeder On Oct 7, 2008, at 7:55 AM, Yann JOBIC wrote: Hello, I'm using openmpi 1.3r19400 (ClusterTools 8.0), with Sun Studio 12, and Solaris 10u5. I've got this error when linking a PETSc code: ld: warning: symbol `mpi_fortran_status_ignore_' has differing sizes: (file /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so value=0x8; file /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi_f90.so value=0x14); /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so definition taken Isn't it very strange? Have you got any idea on the way to solve it? Many thanks, Yann ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] segfault issue - possible bug in openmpi
Shafagh, I missed the dependence on the number of processors. Apparently there is some thread support. Doug On Oct 4, 2008, at 5:29 PM, Shafagh Jafer wrote: Doug Reeder, Daniel is saying that the problem only occurs in openmpi when running more than 16 processes. So could that still be cause becasue openmpi does not support threads??!! --- On Fri, 10/3/08, Doug Reeder <d...@rain.org> wrote: From: Doug Reeder <d...@rain.org> Subject: Re: [OMPI users] segfault issue - possible bug in openmpi To: "Open MPI Users" <us...@open-mpi.org> Date: Friday, October 3, 2008, 2:40 PM Daniel, Are you using threads. I don't think the opempi-1.2.x work with threads. Doug Reeder On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote: Oh, by the way, here is the segfault: [m4b-1-8:11481] *** Process received signal *** [m4b-1-8:11481] Signal: Segmentation fault (11) [m4b-1-8:11481] Signal code: Address not mapped (1) [m4b-1-8:11481] Failing at address: 0x2b91c69eed [m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70] [m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2abea7c0] [m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2abea675] [m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 (mca_pml_ob1_send+0x2da) [0x2abeaf55] [m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 (MPI_Send+0x28e) [0x2ab52c5a] [m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8] [m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c] [m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5] [m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4] [m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham [0x4040b9] [m4b-1-8:11483] *** End of error message *** On Fri, Oct 3, 
2008 at 3:20 PM, Daniel Hansen <dhan...@byu.net> wrote: I have been testing some code against openmpi lately that always causes it to crash during certain mpi function calls. The code does not seem to be the problem, as it runs just fine against mpich. I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7 and they all exhibit the same problem. Also, the problem only occurs in openmpi when running more than 16 processes. I have posted this stack trace to the list before, but I am submitting it now as a potential bug report. I need some help debugging it and finding out exactly what is going on in openmpi when the segfault occurs. Are there any suggestions on how best to do this? Is there an easy way to attach gdb to one of the processes or something?? I have already compiled openmpi with debugging, memory profiling, etc. How can I best take advantage of these features? Thanks, Daniel Hansen Systems Administrator BYU Fulton Supercomputing Lab ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] segfault issue - possible bug in openmpi
Daniel, Are you using threads. I don't think the opempi-1.2.x work with threads. Doug Reeder On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote: Oh, by the way, here is the segfault: [m4b-1-8:11481] *** Process received signal *** [m4b-1-8:11481] Signal: Segmentation fault (11) [m4b-1-8:11481] Signal code: Address not mapped (1) [m4b-1-8:11481] Failing at address: 0x2b91c69eed [m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70] [m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2abea7c0] [m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2abea675] [m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 (mca_pml_ob1_send+0x2da) [0x2abeaf55] [m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 (MPI_Send+0x28e) [0x2ab52c5a] [m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8] [m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c] [m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5] [m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4] [m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/ replica_mpi_marylou2/Openmpi_md_twham [0x4040b9] [m4b-1-8:11483] *** End of error message *** On Fri, Oct 3, 2008 at 3:20 PM, Daniel Hansen <dhan...@byu.net> wrote: I have been testing some code against openmpi lately that always causes it to crash during certain mpi function calls. The code does not seem to be the problem, as it runs just fine against mpich. I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7 and they all exhibit the same problem. Also, the problem only occurs in openmpi when running more than 16 processes. I have posted this stack trace to the list before, but I am submitting it now as a potential bug report. 
I need some help debugging it and finding out exactly what is going on in openmpi when the segfault occurs. Are there any suggestions on how best to do this? Is there an easy way to attach gdb to one of the processes or something? I have already compiled openmpi with debugging, memory profiling, etc. How can I best take advantage of these features?

Thanks, Daniel Hansen
Systems Administrator
BYU Fulton Supercomputing Lab

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
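For the gdb question above, one widely used pattern is a wrapper that makes each rank report its host and PID and then spin until a debugger is attached. This is a sketch, not from the thread: the WAIT_FOR_GDB variable and flag file are made-up names, and the OMPI_COMM_WORLD_RANK variable may not be exported by the 1.2 series.

```shell
#!/bin/sh
# Illustrative "freeze for the debugger" wrapper: each rank prints where it
# is running so you can ssh there and run `gdb -p <pid>`, then spins on a
# flag file until you remove it.
msg="rank ${OMPI_COMM_WORLD_RANK:-0} on $(hostname) pid $$"
echo "$msg"
if [ -n "$WAIT_FOR_GDB" ]; then
  touch gdb_wait.flag            # after attaching gdb:  rm gdb_wait.flag
  while [ -f gdb_wait.flag ]; do sleep 1; done
fi
# the real application would be exec'd here, e.g.:
#   exec ./Openmpi_md_twham "$@"
```

Launched as something like mpirun -np 17 ./wrapper.sh with WAIT_FOR_GDB set in the environment, every rank pauses until released; since the crash here needs more than 16 processes, attaching to a single suspect rank keeps the session manageable.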
Re: [OMPI users] 1.2.2 to 1.2.7 differences.
Shafagh, You should be able to google modules. It should take you to http://modules.sourceforge.net. That is where the software is. Doug

On Oct 1, 2008, at 9:32 PM, Shafagh Jafer wrote:

Could you please be specific on what I should google? Please give me the keywords. I couldn't hit the target :<

--- On Wed, 10/1/08, Doug Reeder <d...@rain.org> wrote:
From: Doug Reeder <d...@rain.org>
Subject: Re: [OMPI users] 1.2.2 to 1.2.7 differences.
To: "Open MPI Users" <us...@open-mpi.org>
Date: Wednesday, October 1, 2008, 8:58 PM

Shafagh, You should be able to run whatever version of open-mpi you want. You just need to make sure that in the build and run steps you don't mix the two. I have had good results using modules (you can google it, download it, build and install it) to keep them separate. You probably want to upgrade to gcc 3.x.x or 4.x.x and use modules for that also. Doug Reeder

On Oct 1, 2008, at 8:11 PM, Shafagh Jafer wrote:

On our cluster we have RedHat Linux 7.3 Professional, and the cluster specification says the following: The cluster should be able to run the following software tools: gcc 2.96.x (or 2.95.x or 2.91.66), Bison 1.28, flex 2.5.4, mpich 1.2.5. So I am just wondering if my cluster is capable of running openmpi 1.2.7? I haven't contacted the cluster technicians yet but I just wanted to know your answer first. Many thanks in advance.
Re: [OMPI users] 1.2.2 to 1.2.7 differences.
Shafagh, You should be able to run whatever version of open-mpi you want. You just need to make sure that in the build and run steps you don't mix the two. I have had good results using modules (you can google it, download it, build and install it) to keep them separate. You probably want to upgrade to gcc 3.x.x or 4.x.x and use modules for that also. Doug Reeder

On Oct 1, 2008, at 8:11 PM, Shafagh Jafer wrote:

On our cluster we have RedHat Linux 7.3 Professional, and the cluster specification says the following: The cluster should be able to run the following software tools: gcc 2.96.x (or 2.95.x or 2.91.66), Bison 1.28, flex 2.5.4, mpich 1.2.5. So I am just wondering if my cluster is capable of running openmpi 1.2.7? I haven't contacted the cluster technicians yet but I just wanted to know your answer first. Many thanks in advance.
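The build/run separation Doug describes ultimately comes down to which install's bin directory is found first. A minimal sketch of what a module file automates, with a hypothetical install prefix:

```shell
#!/bin/sh
# Select one MPI install by prepending its prefix (hypothetical path) to the
# search paths; the modules package performs exactly this kind of edit for you.
MPI_PREFIX=/usr/local/openmpi-1.2.7
PATH="$MPI_PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$MPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
# Build and run must now both resolve to the same install:
#   which mpicc && which mpirun
echo "active MPI prefix: $MPI_PREFIX"
```

The key discipline is the one Doug names: compile with the wrappers from one prefix and launch with the mpirun from that same prefix, never mixing two installs in one job.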
Re: [OMPI users] qsub - mpirun problem
It sounds like you may not have set up passwordless ssh between all your nodes. Doug Reeder

On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:

At 10:45 PM 9/29/2008 +0200, you wrote:
Am 29.09.2008 um 22:33 schrieb Zhiliang Hu:
At 07:37 PM 9/29/2008 +0200, Reuti wrote:

"-l nodes=6:ppn=2" is all I have to specify the node requests:

this might help: http://www.open-mpi.org/faq/?category=tm

Essentially the examples given on this web page are no different from what I did. The only thing new is, I suppose, "qsub -I" is for interactive mode. When I did this: qsub -I -l nodes=7 mpiblastn.sh it hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to start".

UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program

where "mpi_program" is a file with one line: /path/to/mpirun -np 12 /path/to/my_program

Can you please try this jobscript instead:

#!/bin/sh
set | grep PBS
/path/to/mpirun /path/to/my_program

All should be handled by Open MPI automatically. With the "set" bash command you will get a list with all defined variables for further analysis, where you can check for the variables set by Torque. -- Reuti

The "set | grep PBS" part had nothing in output.

Strange - you checked the .o and .e files of the job? - Reuti

There is nothing in -o nor -e output. I had to kill the job. 
I checked the torque log, which shows (/var/spool/torque/server_logs):

09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of schedu...@xxx.xxx.xxx
09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of z...@xxx.xxx.xxx
09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)

where server_priv/nodes has:

node001 np=4
node002 np=4
node003 np=4
node004 np=4
node005 np=4
node006 np=4
node007 np=4

which was set up by the vendor. What is "address not trusted"?

Zhiliang
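If missing passwordless ssh is indeed the root cause, the usual fix looks like the sketch below. It is shown against a scratch directory so it is safe to run as-is; in real use you would let ssh-keygen default to ~/.ssh and copy the public key to every node. (The "address not trusted" line itself, as the log says, points at Torque's server_priv/nodes list rather than ssh.)

```shell
#!/bin/sh
# Sketch of passwordless-ssh setup. SSHDIR stands in for ~/.ssh here to keep
# the example side-effect free.
command -v ssh-keygen >/dev/null || { echo "ssh-keygen not found"; exit 0; }
SSHDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$SSHDIR/id_rsa"          # empty passphrase
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"   # do this on every node
chmod 700 "$SSHDIR" && chmod 600 "$SSHDIR/authorized_keys"
echo "authorized_keys entries: $(wc -l < "$SSHDIR/authorized_keys")"
```

After that, "ssh node001 hostname" from the head node should return without a password prompt for every node in the list.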
Re: [OMPI users] compile openmpi with a gcc that is not default gcc??
Shafagh, You could put the full paths to the 3.4.4 compiler in the configure arguments. See ./configure --help. Doug Reeder

On Sep 27, 2008, at 3:21 PM, Shafagh Jafer wrote:

I have a simple question: My default gcc is 2.95.3, so I installed a newer version in my own home directory, it's gcc-3.4.4. Now I want to install openmpi and compile it with this new version. I don't know how to force it not to pick the default one. I want it to use the 3.4.4 version. Please let me know what to do exactly. Thanks.
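Doug's suggestion, spelled out as a configure invocation. The install paths are hypothetical, and note that gcc 3.4.x ships g77 rather than gfortran:

```shell
# Point configure at the privately installed compilers explicitly instead of
# whatever `gcc` resolves to first on PATH:
./configure --prefix=$HOME/openmpi \
    CC=$HOME/gcc-3.4.4/bin/gcc \
    CXX=$HOME/gcc-3.4.4/bin/g++ \
    F77=$HOME/gcc-3.4.4/bin/g77
make all install
```

Because the absolute paths are recorded by configure, the build no longer depends on which gcc happens to be first on PATH.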
Re: [OMPI users] Configure and Build ok, but mpi module not recognized?
Jeff, I think that unless make all depends on make clean and make clean depends on Makefile, you have to manually run make clean and/or manually delete the module files. Doug Reeder

On Sep 22, 2008, at 3:16 PM, Jeff Squyres wrote:

On Sep 22, 2008, at 6:08 PM, Brian Harker wrote:

Here's the config.log file...now that I look through it more carefully, I see some errors that I didn't see when watching ./configure scroll by...still don't know what to do though. :(

Not to worry; there are many tests in configure that are designed to fail. So it's not a problem to see lots of failures in config.log. I see that it did use ifort for both the F77 and F90 compilers; that's what I wanted to check with the configure output and config.log. Per Doug's comment, if OMPI is not re-compiling the Fortran module when you reconfigure with a new Fortran compiler, that is likely a bug. Can you "make clean all install" and see if it works? If not, send all the output here (see http://www.open-mpi.org/community/help/ for instructions; please compress). -- Jeff Squyres Cisco Systems
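The rebuild sequence being discussed, as a plain transcript (a sketch; <prefix> stands for whatever was passed to --prefix and is left unfilled):

```shell
# After re-running configure with a different Fortran compiler, clear out
# stale objects and module files before rebuilding:
make clean
make all install 2>&1 | tee make.out
# If a stale mpi.mod still lingers in the install tree, delete it by hand:
#   rm <prefix>/lib/mpi.mod
```

This matters because nothing in the generated Makefiles forces old .mod files to be regenerated when only the compiler changes.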
Re: [OMPI users] Configure and Build ok, but mpi module not recognized?
Brian, Try doing a make clean before doing the build with your new make file (from the new configure process). It looks like you are getting the leftover module files from the old makefile/compilers. Doug reeder On Sep 22, 2008, at 2:52 PM, Brian Harker wrote: Ok, here's something funny/weird/stupid: Looking at the actual mpi.mod module file in the $OPENMPI_HOME/lib directory, the very first line is: GFORTRAN module created from mpi.f90 on Fri Sep 19 14:01:27 2008 WTF!? I specified that I wanted to use the ifort/icc/icpc compiler suite when I installed (see my first post)! Why would it create the module with gfortran? This would seem to be the source of my troubles... On Mon, Sep 22, 2008 at 11:27 AM, Gus Correa <g...@ldeo.columbia.edu> wrote: Hi Brian and list I read your original posting and Jeff's answers. Here on CentOS from Rocks Cluster I have a "native" OpenMPI, with a mpi.mod, compiled with gfortran. Note that I don't even have gfortran installed! This is besides the MPI versions (MPICH2 and OpenMPI) I installed from scratch using combinations of ifort and pgi with gcc. It may be that mpif90 is not picking the right mpi.mod, as Jeff suggested. Something like this may be part of your problem. A "locate mpi.mod" should show what your system has. Have you tried to force the directory where mpi.mod is searched for? Something like this: /full/path/to/openmpi/bin/mpif90 -module /full/path/to/openmpi_mpi.mod_directory/ hello_f90.f90 The ifort man pages has the "-module" syntax details. I hope this helps. Gus Correa -- - Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA - Brian Harker wrote: Hi Gus- Thanks for the input. I have been using full path names to both the wrapper compilers and mpiexec from the first day I had two MPI implementations on my machine, depending on if I want to use MPICH or openMPI, but still the problem remains. 
ARGG! On Mon, Sep 22, 2008 at 9:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote: Hello Brian and list My confusing experiences with multiple MPI implementations were fixed the day I decided to use full path names to the MPI compiler wrappers (mpicc, mpif77, etc) at compile time, and to the MPI job launcher (mpirun, mpiexec, and so on) at run time, and to do this in a consistent fashion (using the tools from the same install to compile and to run the programs). Most Linux distributions come with built in MPI implementations (often times more than one), and so do commercial compilers and other tools. You end up with a mess of different MPI versions on your "native" PATH, as well as variety of bin, lib, and include directories containing different MPI stuff. The easy way around is to use full path names, particularly if you install yet another MPI implementation from scratch. Another way is to fix your PATH on your initialization files (.cshrc, etc) to point to your preferred implementation (put the appropriate bin directory ahead of everything else). Yet another is to install the "environment modules" package on your system and use it consistently. My two cents. Gus Correa -- --- -- Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --- -- Brian Harker wrote: I built and installed both MPICH2 and openMPI from source, so no distribution packages or anything. MPICH2 has the modules located in /usr/local/include, which I assume would be found (since its in my path), were it not for specifying -I$OPENMPI_HOME/lib at compile time, right? I can't imagine that if you tell it where to look for the correct modules, it would search through your path first before going to where you tell it to go. Or am I too optimistic? Thanks again for the input! 
On Mon, Sep 22, 2008 at 8:58 AM, Jeff Squyres <jsquy...@cisco.com> wrote:

On Sep 22, 2008, at 10:10 AM, Brian Harker wrote:

Thanks for the reply...crap, $HOME/openmpi/lib does contain all the various libmpi* files as well as mpi.mod,

That should be correct.

but still get the same error at compile-time. Yes, I made sure to specifically build openMPI with ifort 10.1.012, and did run the --showme command right after installation to make sure the wrapper compiler was using ifort as well.

Ok, good.

Before posting to this mailing list, I did uninstall and re-install openMPI several times to make sure I had a clean install.
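Three quick checks that usually expose this kind of mixed-install mess (the install path below is illustrative; -showme is the Open MPI wrapper's introspection flag):

```shell
which mpif90                # which wrapper is first on PATH?
mpif90 -showme              # which compiler and -I/-L paths does it expand to?
head -1 /usr/local/openmpi/lib/mpi.mod   # module files record their compiler,
                                         # e.g. "GFORTRAN module created ..."
```

The last check is exactly how Brian caught the problem above: a gfortran-built mpi.mod sitting in a tree that was supposedly built with ifort.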
Re: [OMPI users] How to get started?
Yes, I run it on my dual core apple notebook. Doug Reeder

On Aug 15, 2008, at 9:58 AM, Anugraha Sankaranarayanan wrote:

>>Are you talking about single notebook or multiple? Doesn't make sense to just have it single machine - unless you're building codes that gonna go into a cluster.

I have a HP Compaq Notebook with a dual core processor. Can I use MPI in this? For learning purposes?
Re: [OMPI users] mpirun on 8-way node with rsh
Pete, I don't know why the behavior on an 8 processor machine differs with the machine file format/syntax. You don't need to specify a machine file on a single multiprocessor machine. On your torque-scheduled cluster you shouldn't need a machine file for openmpi. Openmpi should just use the number of processors you requested from torque. It will communicate with torque to find out which ones to use. Doug Reeder

On Aug 3, 2008, at 10:45 AM, Pete Schmitt wrote:

I use the following: mpirun -machinefile machine.file -np 8 ./mpi-program and the machine file has the following:

t01
t01
t01
t01
t01
t01
t01
t01

I get the following error:

rm_12992: (0.632812) net_send: could not write to fd=4, errno = 32
rm_13053: (0.421875) net_send: could not write to fd=4, errno = 32
rm_l_3_13050: (0.636719) net_send: could not write to fd=5, errno = 32
rm_13114: (0.210938) net_send: could not write to fd=4, errno = 32
rm_12870: (1.066406) net_send: could not write to fd=4, errno = 32
rm_12931: (0.855469) net_send: could not write to fd=4, errno = 32
rm_l_4_13111: (0.425781) net_send: could not write to fd=5, errno = 32
rm_l_1_12929: (1.070312) net_send: could not write to fd=5, errno = 32
rm_l_2_12989: (0.859375) net_send: could not write to fd=5, errno = 32
rm_l_5_13172: (0.214844) net_send: could not write to fd=5, errno = 32
p0_12866: (5.285156) net_send: could not write to fd=4, errno = 32

If I use np=6 or less, it works fine. It also works with 8 if the machine.file just contains t01:8. Since we want to submit this to a torque/moab cluster, it's not possible to get the latter format. The OS is 64-bit RH5.2.

-- Pete Schmitt
Technical Director: Discovery Cluster / Computational Genetics Lab
URL: http://discovery.dartmouth.edu
179M Berry Baker Library, HB 6224, Dartmouth College, Hanover, NH 03755
Dart: 603-646-8109 DHMC: 603-653-3598 Fax: 603-646-1042 Cell: 603-252-2452
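For reference, a hostfile sketch alongside the Torque-scheduled form. Open MPI's hostfile syntax uses "slots="; the "t01:8" colon form belongs to MPICH-style machinefiles, which may be why the two formats behaved differently here:

```shell
# Open MPI hostfile giving one node eight slots:
cat > machine.file <<'EOF'
t01 slots=8
EOF
# standalone:   mpirun -machinefile machine.file -np 8 ./mpi-program
# under Torque (qsub -l nodes=1:ppn=8) no machinefile is needed at all;
# Open MPI reads the allocation from the PBS environment:
#   mpirun -np 8 ./mpi-program
grep slots machine.file
```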
Re: [OMPI users] getting fortran90 to compile
Zachary, I believe you need to add F90=/usr/bin/gfortran-4.2 (or something similar) to the configure arguments; FC= just gets f77 support. Doug Reeder

On Jul 13, 2008, at 8:58 AM, zach wrote:

I installed openmpi like

./configure --prefix= FC=/usr/bin/gfortran-4.2
make all install

When I type mpif90 file1.f90 file2.f90 file3.f90 I get

Unfortunately, this installation of Open MPI was not compiled with Fortran 90 support. As such, the mpif90 compiler is non-functional.

What am I doing wrong? Zachary
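Doug's fix as a full transcript. A sketch: depending on the Open MPI release the Fortran 90 compiler is taken from FC or F90, so setting both alongside F77 is the safe move:

```shell
./configure --prefix=$HOME/openmpi \
    F77=/usr/bin/gfortran-4.2 \
    FC=/usr/bin/gfortran-4.2 \
    F90=/usr/bin/gfortran-4.2
make all install
# Confirm the bindings made it in before trying mpif90 again:
#   ompi_info | grep -i bindings
```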
Re: [OMPI users] gfortran bindings apparently not built on mac os leopard
Greg, In your run_output file you don't appear to be using the openmpi versions that you built. From your make-install.out file it looks like your versions are in /usr/local/openmpi/1.2.6-gcc4.0/bin. You need to use that absolute path or prepend that path to your PATH environment variable. Doug Reeder

On Jun 16, 2008, at 9:25 AM, Weirs, V Gregory wrote:

I am having trouble building mpif77/mpif90 with gfortran on Mac OS 10.5. Or maybe just running. The configure, make all, and make install seemed to go just fine, finding my gfortran and apparently using it, but the scripts mpif77 and mpif90 give the error that my openmpi was not built with fortran bindings. Mpicc and mpicxx don't give this error. Ompi_info says the f77 and f90 bindings were built. I know that OS X 10.5 comes with openmpi mpicc and mpicxx installed, but not fortran bindings, and I was careful to put the openmpi I built first in the path. Some run output (mpif77 --version, ompi_info), config.log, configure.log, make.out, make-install.out are in the attached tarball. Any clues? Thanks, Greg

-- V. Gregory Weirs
Sandia National Laboratories
vgwe...@sandia.gov
P.O. Box 5800, MS 0378, Albuquerque, NM 87185-0378
phone: 505 845 2032 fax: 505 284 0154
Re: [OMPI users] Different CC for orte and opmi?
Ashley, I had a similar situation linking to the intel libraries and used the following in the link step:

-L/opt/intel/compiler_10.1/x86_64/lib -Wl,-non_shared -limf -lsvml -lintlc -Wl,-call_shared

This created binaries statically linked to the intel compiler libraries so I didn't have to push the intel libraries out to the nodes or worry about the LD_LIBRARY_PATH. Doug Reeder

On Jun 10, 2008, at 4:28 AM, Ashley Pittman wrote:

Sorry, I'll try and fill in the background. I'm attempting to package openmpi for a number of customers we have; whenever possible on our clusters we use modules to provide users with a choice of MPI environment. I'm using the 1.2.6 stable release and have built the code twice, once to /opt/openmpi-1.2.6/gnu and once to /opt/openmpi-1.2.6/intel. I have created two modules environments called openmpi-gnu and openmpi-intel and am also using an existing one called intel-compiler. The build was successful in both cases. If I load the openmpi-gnu module I can compile and run code using mpicc/mpirun as expected; if I load openmpi-intel and intel-compiler I find I can compile code but I get an error about missing libimf.so when I try to run it (reproduced below). The application *will* run if I add the line "module load intel-compiler" to my bashrc as this allows orted to link. What I think I want to do is to compile the actual library with icc but to compile orted with gcc so that I don't need to load the intel environment by default. I'm assuming that the link problems only exist with orted and not with the actual application as the LD_LIBRARY_PATH is set correctly in the shell which is launching the program. Ashley Pittman. 
sccomp@demo4-sles-10-1-fe:~/benchmarks/IMB_3.0/src> mpirun -H comp00,comp01 ./IMB-MPI1 /opt/openmpi-1.2.6/intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory /opt/openmpi-1.2.6/intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory [demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp01 failed to start as expected. [demo4-sles-10-1-fe:29303] ERROR: There may be more information available from [demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above). [demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly with status 127. [demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275 [demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166 [demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90 [demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp00 failed to start as expected. [demo4-sles-10-1-fe:29303] ERROR: There may be more information available from [demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above). [demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly with status 127. [demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188 [demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198 -- mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS. 
-- $ ldd /opt/openmpi-1.2.6/intel/bin/orted linux-vdso.so.1 => (0x7fff877fe000) libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- rte.so.0 (0x7fe97f3ac000) libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- pal.so.0 (0x7fe97f239000) libdl.so.2 => /lib64/libdl.so.2 (0x7fe97f135000) libnsl.so.1 => /lib64/libnsl.so.1 (0x7fe97f01f000) libutil.so.1 => /lib64/libutil.so.1 (0x7fe97ef1c000) libm.so.6 => /lib64/libm.so.6 (0x7fe97edc7000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fe97ecba000) libpthread.so.0 => /lib64/libpthread.so.0 (0x7fe97eba3000) libc.so.6 => /lib64/libc.so.6 (0x7fe97e972000) libimf.so => /opt/intel/compiler_10.1/x86_64/lib/libimf.so (0x7fe97e61) libsvml.so => /opt/intel/compiler_10.1/x86_64/lib/ libsvml.so (0x7fe97e489000) libintlc.so.5 => /opt/intel/compiler_10.1/x86_64/lib/ libintlc.so.5 (0x7fe97e35) /lib64/ld-linux-x86-64.so.2 (0x7fe97f525000) $ ssh comp00 ldd /opt/openmpi-1.2.6/intel/bin/orted libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- rte.so.0 (0x2b1f0c0c5000) libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- pal.so.0 (0x2b1f0c23e000) libdl.so.2 => /lib64/libdl.so.2 (0x2b1f0c3bc000) libnsl.so.1 => /lib64/libnsl.so.1
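Doug's partial-static link from the top of the thread, restated with the GNU ld spelling (-Wl,-Bstatic/-Wl,-Bdynamic; his -non_shared/-call_shared pair is the same idea on other linkers). The library path is the one quoted in his post; prog.c is a placeholder:

```shell
# Statically fold just the Intel runtime libraries into the binary so the
# compute nodes need no LD_LIBRARY_PATH pointing at the Intel install:
mpicc prog.c -o prog \
    -L/opt/intel/compiler_10.1/x86_64/lib \
    -Wl,-Bstatic -limf -lsvml -lintlc -Wl,-Bdynamic
# Verify nothing still resolves into the Intel tree:
#   ldd ./prog | grep intel
```

Note this only helps the application link line; the orted failure above would still need the same treatment applied when Open MPI itself is built with icc.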
Re: [OMPI users] Different CC for orte and opmi?
Ashley, I am confused. In your first post you said orted fails, with link errors, when you try to launch a job. From this I inferred that the build and install steps for creating openmpi were successful. Was the build/install step successful? If so, what dynamic libraries does ldd say orted is using? Doug Reeder

On Jun 9, 2008, at 12:54 PM, Ashley Pittman wrote:

Putting to one side any religious views I might have about static linking, how would that help in this case? It appears to be orted itself that fails to link; I'm assuming that the application would actually run, either because the LD_LIBRARY_PATH is set correctly on the front end or because of the --prefix option to mpirun. Or do you mean static linking of the tools? I could go for that if there is a configure option for it. Ashley Pittman.

On Mon, 2008-06-09 at 08:27 -0700, Doug Reeder wrote:

Ashley, It could work but I think you would be better off to try and statically link the intel libraries. Doug Reeder

On Jun 9, 2008, at 4:34 AM, Ashley Pittman wrote:

Is there a way to use a different compiler for the orte component and the shared library component when using openmpi? We are finding that if we use icc to compile openmpi then orted fails with link errors when I try and launch a job as the intel environment isn't loaded by default. We use the module command heavily and have modules for openmpi-gnu and openmpi-intel as well as an intel_compiler module. To use openmpi-intel we have to load intel_compiler by default on the compute nodes, which isn't ideal. Is it possible to compile the orte component with gcc and the library component with icc? Yours, Ashley Pittman
Re: [OMPI users] Different CC for orte and opmi?
Ashley, It could work but I think you would be better off to try and statically link the intel libraries. Doug Reeder

On Jun 9, 2008, at 4:34 AM, Ashley Pittman wrote:

Is there a way to use a different compiler for the orte component and the shared library component when using openmpi? We are finding that if we use icc to compile openmpi then orted fails with link errors when I try and launch a job as the intel environment isn't loaded by default. We use the module command heavily and have modules for openmpi-gnu and openmpi-intel as well as a intel_compiler module. To use openmpi-intel we have to load intel_compiler by default on the compute nodes which isn't ideal, is it possible to compile the orte component with gcc and the library component with icc? Yours, Ashley Pittman,
Re: [OMPI users] Open MPI instructional videos
Jeff, I believe that with QuickTime Pro you can export the videos in several formats. Doug Reeder

On Jun 3, 2008, at 1:48 PM, Jeff Squyres wrote:

On May 30, 2008, at 9:55 AM, Andreas Schäfer wrote:

I've never really dug into Open MPI's guts, not because I wasn't interested, but mainly because the time required to get my bearings seemed just too much. Until now. I've watched a couple of the videos while coding and it was pretty awesome. Easy to understand, structured and well spoken.

Good! I'm glad you've found them useful.

- Do you like the format?
- Is the (slides+narration) format useful?

Yes, I like it a lot. I guess a pure podcast would be insufficient for complex issues where you simply need diagrams.

That was definitely my thought here -- pictures can be worth a million words, etc.

Maybe a small suggestion: maybe it's just me, but I'd actually prefer (even) leaner slides. Currently you're basically duplicating on screen what you're saying, which is good when you're a nervous, mumbling college student and might lose your audience somewhere. But when you're an experienced speaker (which you obviously are), the audience rarely needs this redundancy and might rather get confused when trying to digest both streams of information (visual and auditory) simultaneously. But this is of course a question of personal preference.

Thanks for the compliment snuggled in there. :-) Yes, this might be a style thing -- I have found that at least some people like to have slides that are more-or-less what the speaker actually said, for two reasons:

- so that the visuals and audio agree with each other -- it's not two different thought processes while you're trying to absorb the information. Sure, some people read ahead on the slide and get bored because the speaker eventually catches up, but at least in my experience, these people are a minority. 
- more importantly, however, the audience likes to take the slides away, and when they actually look at them 6 weeks after the lecture, they might remember the content better because they received the same information via two forms of sensory input (audio + visual).

- Would terminal screen-scrape sessions be useful?

I'd prefer how-to pages for this, as you can copy the commands directly into your own shell.

Good point.

- ...other [low-budget] suggestions?

Maybe a tad higher audio bitrate. And some people don't like the .mov format, but that isn't really important.

Ok, I can bump up the audio rate and see what happens to the file size (that was my prime concern, actually). Plus it *is* just the built-in microphone on my Mac, so it may not be the greatest sound quality to begin with. :-)

As for .mov, yes, this is definitely a compromise. I tried uploading the videos to YouTube and Google Video and a few others, but a) most have a time or file size restriction (e.g., 10 mins max) -- I was not willing to spend the extra work to split up the videos into multiple segments, and b) they down-res'ed the videos so much as to make the slides look crappy and/or unreadable. So I had to go with the video encoder that I could get for darn little money (Cisco's a big company, but my budget is still tiny :-) ). That turned out to be a fun little program called iShowU for OS X that does screen scraping + audio capture. It outputs QuickTime movies, so that was really my only choice. Is it a real hardship for people to install the QT player? Are there easy-to-install converters? I'm not opposed to hosting it in multiple formats if it's easy and free to convert them. -- Jeff Squyres Cisco Systems
Re: [OMPI users] openmpi 32-bit g++ compilation issue
Arif, It looks like your system is 64-bit by default and therefore doesn't pick up the 32-bit libraries automatically at the link step (note the -L/.../x86_64-suse-linux/lib entries prior to the corresponding entries pointing to the 32-bit library versions). I don't use suse linux so I don't know if this is something you can control in the configure step for open-mpi. Doug Reeder

On May 19, 2008, at 2:48 PM, Arif Ali wrote:

Hi,

OS: SLES10 SP1
OFED: 1.3
openmpi: 1.2 1.2.5 1.2.6
compilers: gcc g++ gfortran

I am creating a 32-bit build of openmpi on an Infiniband cluster, and the compilation gets stuck. If I use the /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/libstdc++.so library manually, it compiles that piece of code. I was wondering if anyone else has had this problem, or if there is any other way of getting this to work. I feel there may be something very silly here that I have missed, but I can't seem to spot it. I have also tried this on a fresh install of OFED 1.3 with openmpi 1.2.6.

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT file.lo -MD -MP -MF .deps/file.Tpo -c file.cc -fPIC -DPIC -o .libs/file.o

depbase=`echo win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF $depbase.Tpo -c -o win.lo win.cc &&\
mv -f $depbase.Tpo $depbase.Plo

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. 
-O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo - MD -MP -MF .deps/win.Tpo -c win.cc -fPIC -DPIC -o .libs/win.o /bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG - m32 -finline-functions -pthread -export-dynamic -m32 -o libmpi_cxx.la -rpath /opt/openmpi/1.2.6/gnu_4.1.2/32/lib mpicxx.lo intercepts.lo comm.lo datatype.lo file.lo win.lo -lnsl -lutil -lm libtool: link: g++ -shared -nostdlib /usr/lib64/gcc/x86_64-suse- linux/4.1.2/../../../../lib/crti.o /usr/lib64/gcc/x86_64-suse-linux/ 4.1.2/32/crtbeginS.o .libs/mpicxx.o .libs/intercepts.o .libs/ comm.o .libs/datatype.o .libs/file.o .libs/win.o -Wl,-rpath -Wl,/ usr/lib64/gcc/x86_64-suse-linux/4.1.2 -Wl,-rpath -Wl,/usr/lib64/gcc/ x86_64-suse-linux/4.1.2 -lnsl -lutil -L/usr/lib64/gcc/x86_64-suse- linux/4.1.2/32 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../ x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/ 4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib64/ gcc/x86_64-suse-linux/4.1.2 -L/usr/lib64/gcc/x86_64-suse-linux/ 4.1.2/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64- suse-linux/4.1.2/../../.. /usr/lib64/gcc/x86_64-suse-linux/4.1.2/ libstdc++.so -lm -lpthread -lc -lgcc_s /usr/lib64/gcc/x86_64-suse- linux/4.1.2/32/crtendS.o /usr/lib64/gcc/x86_64-suse-linux/ 4.1.2/../../../../lib/crtn.o -m32 -pthread -m32 -pthread -Wl,- soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so: could not read symbols: File in wrong format collect2: ld returned 1 exit status -- Arif Ali Software Engineer OCF plc Mobile: +44 (0)7970 148 122 DDI:+44 (0)114 257 2240 Office: +44 (0)114 257 2200 Fax:+44 (0)114 257 0022 Email: a...@ocf.co.uk Web:http://www.ocf.co.uk Support Phone: +44 (0)845 702 3829 Support E-mail: supp...@ocf.co.uk Skype: arif_ali80 MSN:a...@ocf.co.uk This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. 
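A configure sketch for forcing a uniformly 32-bit build on a 64-bit SUSE host; the prefix matches the one in Arif's log, and the flag set is an assumption rather than a tested recipe:

```shell
./configure --prefix=/opt/openmpi/1.2.6/gnu_4.1.2/32 \
    CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 LDFLAGS=-m32
make all install
# The failure above is the 64-bit libstdc++.so being found first at link
# time; forcing -m32 everywhere (and, if needed, adding -L for the 32-bit
# lib directory to LDFLAGS) keeps the linker inside the 32-bit directories.
```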
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Linwei, Did you build the liblapack.a file? It is of the wrong architecture. Doug Reeder

On May 7, 2008, at 2:58 PM, Linwei Wang wrote:

Hi, Doug, I've checked the makefiles and made sure that the flag -m64 is used for all the compiling, but the error still exists. Linwei

On May 7, 2008, at 5:33 PM, Doug Reeder wrote:

Linwei, It looks like you are getting a mix of 32 and 64 bit code (hence the 'file is not of required architecture' error). Are you using the command line flag -m64 for some parts of the build and not for others? You need to use either -m32 or -m64 for all the builds. Doug Reeder

On May 7, 2008, at 2:25 PM, Linwei Wang wrote:

Dear sir, Thanks very much for your detailed guideline. I'm now trying to follow it. I've installed gcc 4.3 & openmpi. When compiling CLAPACK, I'm trying to use the optimized BLAS library from ATLAS, so I set BLASLIB in make.inc as:

BLASLIB = ../../libcblaswr.a -lcblas -latlas

then built the libraries (before that, I built the f2clib following the guideline on netlib). It went well, but when I tried to build the BLAS testing code, it generated "undefined symbols" errors. It looks like those should be in the f2clib, but I already built it:

gcc sblat2.o \
../../F2CLIBS/libf2c.a -lm -o ../xblat2s
Undefined symbols:
"_f2c_ssbmv", referenced from: _schke_ in sblat2.o (repeated), _schk2_ in sblat2.o
"_f2c_sgbmv", referenced from: _schke_ in sblat2.o (repeated), _schk1_ in sblat2.o
...

On the other side, when compiling ATLAS, I did the configure as you said and "make build" went well. But when I tried "make check" for testing, it again gave "undefined symbols" errors:
"d: warning in /Users/maomaowlw/ATLAS/build/lib/liblapack.a, file is not of required architecture Undefined symbols: "_ATL_slauum", referenced from: _test_inv in sinvtst.o "_ATL_strtri", referenced from: _test_inv in sinvtst.o "_ATL_spotrf", referenced from: _test_inv in sinvtst.o "_ATL_sgetrf", referenced from: _test_inv in sinvtst.o "_ATL_sgetri", referenced from: _test_inv in sinvtst.o " I'm not sure where is the problem? Can you provide any help? Thanks again! Linwei On May 6, 2008, at 11:11 AM, Gregory John Orris wrote: Points to clarify if I may, having gone through this relatively recently: g77 and gfortran are NOT one and the same. gfortran from sourceforge works well, but it is based on gnu gcc 4.3 and not on the gnu gcc 4.0.1 that comes with Leopard. Your best bet is to download the ENTIRE gcc package from sourceforge and install it into /usr/local. This includes gcc, g++, and gfortran. Then you will need to do a number of things to actually get a reliable set of packages all compiled from the same version of gcc 4.3. Why? Because 4.3 seems to be notoriously faster. AND, I had a lot of problems integrating the 4.0.1 libs with the 4.3 libs without errors 1. download CLAPACK-3.1.1 from netlib And compile 2. Download ATLAS-1.8 from dourceforge (netlib is a little behind here) and configure it with the --with-netlib-lapack=your just compiled lapack from CLAPACK 3. Download OpenMPI 1.2.6 and install it also so that openMPI will have the fortran not installed with Leopard. 4. NOW you can compile BLACS and ScaLAPACK In all of this you will need to do a couple of additional things like set the env's setenv LDFLAGS "-L/usr/local/lib/x86_64" setenv DYLD_LIBRARY_PATH "your openmpi path" setenv LD_LIBRARY_PATH "your openmpi path" Do all this right and make sure you compile with the -m64 - mtune=core2 flags and you will be golden. So what will you have--- A new cblas, atlas, lapack, openmpi, fortran, c, c++, blacs, and scalapack. All on the same version of gnu c. 
Alternatively you can buy and use the intel compiler. It is significantly faster than gfortran, but it has a host of other problems associated with it. But if you follow the outline above, you will be left with the best that's available. I have lots more info on this, but time is short. FINALLY, and this is important, DO NOT FORGET ABOUT THE small STACK size on Macs when using gfortran. It's so small that it's useless for large parallel jobs.

On May 6, 2008, at 10:09 AM, Jeff Squyres wrote:

FWIW, I'm not a fortran expert, but if you built your Fortran libraries with g77 and then tried to link against them with gfortran, you might run into problems. My advice would be to use a single fortran compiler for building everything: Open MPI, your libraries, your apps.
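Gregory's build order can be sketched in Bourne-shell form (the original used csh setenv; the paths and the core2 tuning flags are taken from his post, while the install locations are assumptions to adjust):

```shell
# Same-compiler build environment, per the outline above (hypothetical paths).
export PATH=/usr/local/bin:$PATH                 # gcc/g++/gfortran 4.3 from sourceforge
export LDFLAGS="-L/usr/local/lib/x86_64"
export DYLD_LIBRARY_PATH=/usr/local/openmpi/lib  # "your openmpi path"
export LD_LIBRARY_PATH=/usr/local/openmpi/lib
export CFLAGS="-m64 -mtune=core2"                # same flags for EVERY package
export FFLAGS="-m64 -mtune=core2"
# Build order: 1) CLAPACK  2) ATLAS --with-netlib-lapack=...  3) Open MPI  4) BLACS + ScaLAPACK
```

The point of the single environment block is the one Gregory makes: every package, from CLAPACK through ScaLAPACK, sees the same compilers and the same architecture flags.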
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Linwei, It looks like you are getting a mix of 32 and 64 bit code (hence the 'file is not of required architecture' error). Are you using the command line flag -m64 for some parts of the build and not for others? You need to use either -m32 or -m64 for all the builds. Doug Reeder

On May 7, 2008, at 2:25 PM, Linwei Wang wrote:

Dear sir, Thanks very much for your detailed guideline. I'm now trying to follow it. I've installed gcc 4.3 & openmpi. When compiling CLAPACK, I'm trying to use the optimized BLAS library from ATLAS, so I set BLASLIB in make.inc as:

BLASLIB = ../../libcblaswr.a -lcblas -latlas

then built the libraries (before that, I built the f2clib following the guideline on netlib). It went well, but when I tried to build the BLAS testing code, it generated "undefined symbols" errors. It looks like those should be in the f2clib, but I already built it:

gcc sblat2.o \
../../F2CLIBS/libf2c.a -lm -o ../xblat2s
Undefined symbols:
"_f2c_ssbmv", referenced from: _schke_ in sblat2.o (repeated), _schk2_ in sblat2.o
"_f2c_sgbmv", referenced from: _schke_ in sblat2.o (repeated), _schk1_ in sblat2.o
...

On the other side, when compiling ATLAS, I did the configure as you said and "make build" went well. But when I tried "make check" for testing, it again gave "undefined symbols" errors:

ld: warning in /Users/maomaowlw/ATLAS/build/lib/liblapack.a, file is not of required architecture
Undefined symbols:
"_ATL_slauum", referenced from: _test_inv in sinvtst.o
"_ATL_strtri", referenced from: _test_inv in sinvtst.o
"_ATL_spotrf", referenced from: _test_inv in sinvtst.o
"_ATL_sgetrf", referenced from: _test_inv in sinvtst.o
"_ATL_sgetri", referenced from: _test_inv in sinvtst.o

I'm not sure where the problem is. Can you provide any help?
Thanks again! Linwei

On May 6, 2008, at 11:11 AM, Gregory John Orris wrote:

Points to clarify if I may, having gone through this relatively recently: g77 and gfortran are NOT one and the same. gfortran from sourceforge works well, but it is based on gnu gcc 4.3 and not on the gnu gcc 4.0.1 that comes with Leopard. Your best bet is to download the ENTIRE gcc package from sourceforge and install it into /usr/local. This includes gcc, g++, and gfortran. Then you will need to do a number of things to actually get a reliable set of packages all compiled from the same version of gcc 4.3. Why? Because 4.3 seems to be notoriously faster. AND, I had a lot of problems integrating the 4.0.1 libs with the 4.3 libs without errors.

1. Download CLAPACK-3.1.1 from netlib and compile.
2. Download ATLAS-1.8 from sourceforge (netlib is a little behind here) and configure it with --with-netlib-lapack= your just-compiled lapack from CLAPACK.
3. Download OpenMPI 1.2.6 and install it also, so that openMPI will have the fortran not installed with Leopard.
4. NOW you can compile BLACS and ScaLAPACK.

In all of this you will need to do a couple of additional things, like set the env's:

setenv LDFLAGS "-L/usr/local/lib/x86_64"
setenv DYLD_LIBRARY_PATH "your openmpi path"
setenv LD_LIBRARY_PATH "your openmpi path"

Do all this right and make sure you compile with the -m64 -mtune=core2 flags and you will be golden. So what will you have: a new cblas, atlas, lapack, openmpi, fortran, c, c++, blacs, and scalapack, all on the same version of gnu c. Alternatively you can buy and use the intel compiler. It is significantly faster than gfortran, but it has a host of other problems associated with it. But if you follow the outline above, you will be left with the best that's available. I have lots more info on this, but time is short. FINALLY, and this is important, DO NOT FORGET ABOUT THE small STACK size on Macs when using gfortran. It's so small that it's useless for large parallel jobs.
On May 6, 2008, at 10:09 AM, Jeff Squyres wrote:

FWIW, I'm not a fortran expert, but if you built your Fortran libraries with g77 and then tried to link against them with gfortran, you might run into problems. My advice would be to use a single fortran compiler for building everything: Open MPI, your libraries, your apps. I prefer gfortran because it's more modern, but I have not done any performance evaluations of gfortran vs. g77 -- I have heard [unverified] anecdotes that gfortran is "slower" than g77 -- google around and see what the recent buzz is.
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Linwei, Have you tried using -funderscoring with gfortran? I don't think the trouble you are having is caused by having g77 and gfortran both installed. Do you know where the unreferenced symbols (_s_wsle, _e_wsle, etc.) are supposed to be coming from? If they are in your fortran programs then using -funderscoring should help. Doug Reeder

On May 5, 2008, at 11:21 AM, Linwei Wang wrote:

Dear Reeder, I've tried adding the gfortran flag "-fno-underscoring", but the same errors persist... Is that possible because I have both g77 and gfortran on my computer? Best, Linwei

On May 5, 2008, at 1:17 PM, Doug Reeder wrote:

Linwei, Is there a problem with trailing underscores? Are you linking c/c++ files with fortran? Do the _s_wsle family members need to have a trailing underscore? Where are the unreferenced symbols supposed to be coming from? If they have a trailing underscore in their names you probably need to add a command line flag to your fortran command to append the underscore. Doug Reeder

On May 5, 2008, at 10:12 AM, Linwei Wang wrote:

Dear Dr. Simon, Do I need to remove g77 from my computer then? Since after installing gfortran (for Leopard), there is some link problem with gfortran..
When I try to build some routines in the BLACS, it gives errors like:

Undefined symbols:
"_s_wsle", referenced from: _MAIN__ in tc_fCsameF77.o (repeated)
"_e_wsle", referenced from: _MAIN__ in tc_fCsameF77.o (repeated)
"_do_lio", referenced from: _MAIN__ in tc_fCsameF77.o (repeated)
"_s_stop", referenced from: _MAIN__ in tc_fCsameF77.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

Some routines that do build successfully cannot be run either, giving errors like:

[iris-wl03:14541] *** Process received signal ***
[iris-wl03:14541] Signal: Bus error (10)
[iris-wl03:14541] Signal code: (2)
[iris-wl03:14541] Failing at address: 0xe3
[iris-wl03:14541] [ 0] 2 libSystem.B.dylib 0x955f45eb _sigtramp + 43
[iris-wl03:14541] [ 1] 3 ??? 0x 0x0 + 4294967295
[iris-wl03:14541] [ 2] 4 xcmpi_sane 0x1cc3 main + 51
[iris-wl03:14541] [ 3] 5 xcmpi_sane 0x1c56 start + 54
[iris-wl03:14541] *** End of error message ***
mpirun noticed that job rank 0 with PID 14541 on node iris-wl03.rit.edu exited on signal 10 (Bus error).

The second problem happens when I use g77 too, but there were no linking problems with g77... Thanks for any help! Best, Linwei

On May 2, 2008, at 7:04 AM, Christian Simon wrote:

Dear Linwei, On 1 mai 08, at 20:32, Linwei Wang wrote: other type at (1) [info -f g77 M GLOBALS] What compiler are you using? -- Dr.
Christian SIMON, Laboratoire LI2C-UMR7612, Universite Pierre et Marie Curie, Case 51, 4 Place Jussieu, 75252 Paris Cedex 05, France

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
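One hedged observation on the missing symbols above: _s_wsle, _e_wsle, _do_lio, and _s_stop are I/O support routines from the g77/f2c runtime (libg2c/libf2c); gfortran's runtime exports _gfortran_* names instead, so objects that reference them were almost certainly compiled by g77 or against f2c-style I/O, which points at the BLACS makefile settings rather than the compiler install. A sketch of how to confirm, using the object name from the post (paths hypothetical):

```shell
# List the undefined symbols the failing object actually references:
nm tc_fCsameF77.o | grep ' U '
# _s_wsle and friends live in g77's runtime, not gfortran's; gfortran's
# library defines _gfortran_* symbols instead:
nm /usr/local/lib/libgfortran.a 2>/dev/null | grep -c '_gfortran_'
```

If the object wants _s_wsle, the fix is to rebuild it with gfortran (or link libf2c explicitly), not to toggle -funderscoring.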
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Linwei, mpif.h is the include file for fortran programs to use openmpi. The apple version does not support fortran. If you want to use openmpi from fortran you will need to install a version of openmpi that supports fortran; this will install mpif.h. I suggest you install the new version in a different directory than the apple version (use --prefix in the openmpi configure command). You will also need to remove the apple version or rename the openmpi include and library files so that the linker can find your new, fortran supporting version. Doug Reeder

On May 1, 2008, at 8:42 AM, Linwei Wang wrote:

Dear all, I'm new to openmpi. I'm now trying to use BLACS and ScaLAPACK on Leopard. Since it has built-in Open MPI, I didn't install any other versions. I followed the BLACS install guidance in the FAQ section, and it generated errors as: "No rule to make target `/usr/include/mpif.h', needed by `mpif.h'. Stop." The problem is I could not find "mpif.h" on my computer. Does this mean I should install another Open MPI version rather than using Leopard's built-in version? Thanks for the help! Best, Linwei

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
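Doug's suggestion can be sketched as commands (the version number and prefix below are hypothetical; any Fortran-capable Open MPI tarball of that era builds the same way):

```shell
# Build a Fortran-supporting Open MPI next to the Apple-supplied one.
./configure --prefix=/usr/local/openmpi-1.2.5 F77=gfortran FC=gfortran
make all
sudo make install
# Put the new wrappers ahead of Apple's /usr/bin copies:
export PATH=/usr/local/openmpi-1.2.5/bin:$PATH
which mpif77   # should now resolve to the new prefix, which also owns mpif.h
```

With this layout the BLACS makefiles point their MPI include path at /usr/local/openmpi-1.2.5/include instead of /usr/include.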
Re: [OMPI users] trouble building on a macbook
Robert, Did you mean to install openmpi-1.2.6 in /usr? That is where the apple supplied openmpi-1.2.3 is installed. That doesn't appear to be the problem causing your make install error. Were there any warnings or errors when you ran make? Doug Reeder

On Apr 27, 2008, at 1:11 PM, Robert Taylor wrote:

I have had trouble building on a macbook running OS X 10.5.2. Specifically it fails after the configure when I run make all -- files attached. Is this the right place to get help? I do note that the ompi_config.h is in ompi/include, not share/include. rlt

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] install intel mac with Leopard
Jeff, I think that error message is a good compromise and addresses the most common problems that cause it to be written. Doug Reeder On Apr 25, 2008, at 4:08 AM, Jeff Squyres wrote: Sorry, I should have been more specific: how about this? ** It appears that your Fortran 77 compiler is unable to link against object files created by your C compiler. This typically indicates one of a few possibilities: - A conflict between CFLAGS and FFLAGS - A problem with your compiler installation(s) - Different default build options between compilers (e.g., C building for 32 bit and Fortran building for 64 bit) - Incompatible compilers Such problems can usually be solved by picking compatible compilers and/or CFLAGS and FFLAGS. More information (including exactly what command was given to the compilers and what error resulted when the commands were executed) is available in the config.log file in this directory. ** On Apr 25, 2008, at 7:00 AM, Jeff Squyres wrote: How about a compromise -- I'll extend the message to also include the possibility of architecture mismatches. -- Jeff Squyres Cisco Systems ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] install intel mac with Leopard
Jeff, I don't know if there is a way to capture the "not of required architecture" response and add it to the error message. I agree that the current error message captures the problem in broad terms and points to the config.log file; it is just not very specific. If the architecture problem can't be added to the error message then I think we are stuck with what we have. If that is the case, is it worthwhile to add this to the FAQ for building openmpi? Doug

On Apr 24, 2008, at 9:34 AM, Jeff Squyres wrote:

On Apr 24, 2008, at 12:24 PM, George Bosilca wrote:

There are so many special errors that are compiler and operating system dependent that there is no way to handle each of them specifically. And even if it was possible, I would not use autoconf if the resulting configure file was 100MB ...

More specifically, the error messages in config.log are mostly written by the compiler/linker (i.e., redirect stdout/stderr from the command line to config.log). We don't usually modify that -- the Autoconf Way is that Autoconf is 100% responsible for config.log.

Additionally, I think the error message is more than clear. It clearly states that the problem is coming from a mismatch between the CFLAGS and FFLAGS. There is even a hint that one has to look in config.log to find the real cause...

As George specifies, the stdout from configure is what we can most directly affect, and that's why we chose to output this message:

* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler. This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation. More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.

OMPI doesn't know *why* the test link failed; we just know that it failed.
I agree with George that trying to put in compiler-specific stdout/stderr analysis is a black hole that would be extraordinarily difficult. Do you have any suggestions for re-wording this message? That's probably the best that we can do. george.

On Apr 24, 2008, at 11:57 AM, Doug Reeder wrote:

Jeff, For the specific problem of the gcc compiler creating i386 objects and ifort creating x86_64 objects, the config.log file says:

configure:26935: ifort -o conftest conftest.f conftest_c.o >&5
ld: warning in conftest_c.o, file is not of required architecture

If configure could pick up on this and write an error message something like "Your C and fortran compilers are creating objects for different architectures. You probably need to change your CFLAGS or FFLAGS arguments to ensure that they are consistent" it would point the user more directly to the real problem. Right now the information is in the config.log file but it doesn't jump out at you. Doug Reeder

On Apr 24, 2008, at 8:40 AM, Jeff Squyres wrote:

On Apr 24, 2008, at 11:07 AM, Doug Reeder wrote:

Make sure that your compilers are all creating code for the same architecture (i386 or x86-64). ifort usually installs such that the 64 bit version of the compiler is the default, while the apple gcc compiler creates i386 output by default. Check the architecture of the .o files with file *.o, and if the gcc output needs to be x86_64, add the -m64 flag to the c and c++ flags. That has worked for me. You shouldn't need the intel c/c++ compilers. I find the configure error message to be a little bit cryptic and not very insightful.

Do you have a suggestion for a new configure error message? I thought it was very clear, but then again, I'm one of the implementors...

checking if C and Fortran 77 are link compatible... no
**********************************************************************
* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler.
This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation. More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.
**********************************************************************
configure: error: C and Fortran 77 compilers are not link compatible. Can not continue. -- Jeff Squyres Cisco Systems

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] install intel mac with Leopard
Jeff, For the specific problem of the gcc compiler creating i386 objects and ifort creating x86_64 objects, the config.log file says:

configure:26935: ifort -o conftest conftest.f conftest_c.o >&5
ld: warning in conftest_c.o, file is not of required architecture

If configure could pick up on this and write an error message something like "Your C and fortran compilers are creating objects for different architectures. You probably need to change your CFLAGS or FFLAGS arguments to ensure that they are consistent" it would point the user more directly to the real problem. Right now the information is in the config.log file but it doesn't jump out at you. Doug Reeder

On Apr 24, 2008, at 8:40 AM, Jeff Squyres wrote:

On Apr 24, 2008, at 11:07 AM, Doug Reeder wrote:

Make sure that your compilers are all creating code for the same architecture (i386 or x86-64). ifort usually installs such that the 64 bit version of the compiler is the default, while the apple gcc compiler creates i386 output by default. Check the architecture of the .o files with file *.o, and if the gcc output needs to be x86_64, add the -m64 flag to the c and c++ flags. That has worked for me. You shouldn't need the intel c/c++ compilers. I find the configure error message to be a little bit cryptic and not very insightful.

Do you have a suggestion for a new configure error message? I thought it was very clear, but then again, I'm one of the implementors...

checking if C and Fortran 77 are link compatible... no
**********************************************************************
* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler. This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation. More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.
** configure: error: C and Fortran 77 compilers are not link compatible. Can not continue. -- Jeff Squyres Cisco Systems ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
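The `file *.o` check Doug describes can be sketched as follows; the output strings shown in comments are typical for an Intel Mac of that era, not guaranteed:

```shell
# Confirm both compilers emit the same architecture BEFORE running configure.
echo 'int main(void){return 0;}' > conftest.c
gcc -c conftest.c
file conftest.o         # e.g. "Mach-O object i386" with Apple's default gcc
gcc -m64 -c conftest.c
file conftest.o         # e.g. "Mach-O 64-bit object x86_64"
# If ifort defaults to x86_64, configure Open MPI with matching flags:
#   ./configure CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64
```

When the two `file` outputs disagree with what the Fortran compiler produces, the "C and Fortran 77 compilers are not link compatible" configure error above is the expected result.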
Re: [OMPI users] Problems with program-execution with OpenMPI: Orted: command not found
Stephan, A couple of things to try: put -np 2 after -hostfile /home/stephan/mpd.hosts, and put the command you want to run after -np 2. Good luck, Doug Reeder

On Apr 21, 2008, at 11:56 PM, gildo@gmx.de wrote:

Dear all, I wanted to compare MPICH and OpenMPI. MPICH works fine, so I installed OpenMPI the same way (configure, make, make install). The commands are found in the OpenMPI installation directory. When I tried to run programs I was a little bit confused that there seems not to be a default hosts-file like in MPICH, so I included it in the command with "--hostfile". When I now want to run my first test with

mpirun -np 2 --hostfile /home/stephan/mpd.hosts

I get the error message: orted: command not found. The "orted" executable resides, as well as the "mpirun" and "mpiexec" executables, in the directory /home/stephan/openmpi-install. "orted" is also found by "which orted". What might be the problem? How does "orted" work? I'm not aware of anything equivalent in MPICH... Thanks in advance for your help! Kind Regards, Stephan

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
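Doug's suggested ordering, sketched with a hypothetical program name. Note also that "orted: command not found" on remote nodes is often a remote-PATH problem (non-interactive SSH shells don't read the login profile), which mpirun's --prefix option is designed to work around:

```shell
# Options first, then the program to launch:
mpirun --hostfile /home/stephan/mpd.hosts -np 2 ./my_program

# If the remote shells don't have Open MPI in PATH, tell mpirun where it lives:
mpirun --prefix /home/stephan/openmpi-install \
       --hostfile /home/stephan/mpd.hosts -np 2 ./my_program
```

./my_program stands in for whatever executable Stephan intends to run; the original command line was missing the program entirely.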
Re: [OMPI users] remote host not accessible
Danesh, The filesystem physically on the master, specifically including the directory where you are running the open-mpi program, should be NFS mounted by the slave machines. The absolute path name should be the same on all machines. I don't know if that will fix your problem, but we had to do that on our linux clusters and os x clusters. Doug

On Apr 1, 2008, at 2:22 PM, Danesh Daroui wrote:

You mean I should mount the NFS filesystems of the slave machines on the master so their disks can be accessed from a mount point on master? In that case, what mount point on master should it be? Should I configure open-MPI about this mount point? Can't it work without mounting? I think it should work since the processes are run locally via SSH on remote machines. D.

Doug Reeder skrev:

Danesh, Do they all have access to the same file system/physical hard drive? You will probably need to NFS mount the filesystem on master on the other two systems. Doug Reeder

On Apr 1, 2008, at 1:46 PM, Danesh Daroui wrote:

Hi all, I have installed Open-MPI on three machines which run OpenSUSE, and it has been installed successfully. I can submit jobs locally on each machine using "mpirun" and it works fine. I have defined a host file on one of them (master) where I have defined the IP address of each machine and the number of slots. First when I tried to submit jobs to master it asked for a password for the SSH connection, which showed that master can communicate with slaves. Then I set up all machines to communicate with each other using SSH without a password. Now when I submit a job on master, the job just blocks and nothing happens. The program runs locally on each machine but it will not run when I submit it on master to be run on slaves. What can it be? D.
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
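The NFS layout Doug describes can be sketched as below, with hypothetical host names and paths: export the working directory from the master and mount it at the identical absolute path on every slave.

```shell
# On the master, export the run directory (line for /etc/exports):
#   /home/mpiuser  slave1(rw,sync) slave2(rw,sync)
exportfs -ra

# On each slave, mount it at the SAME absolute path:
mkdir -p /home/mpiuser
mount -t nfs master:/home/mpiuser /home/mpiuser
```

The same-path requirement matters because mpirun launches the program on each node using the path given on the master; if /home/mpiuser only exists on the master, remote launches fail or hang.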
Re: [OMPI users] remote host not accessible
Danesh, Do they all have access to the same file system/physical hard drive? You will probably need to NFS mount the filesystem on master on the other two systems. Doug Reeder

On Apr 1, 2008, at 1:46 PM, Danesh Daroui wrote:

Hi all, I have installed Open-MPI on three machines which run OpenSUSE, and it has been installed successfully. I can submit jobs locally on each machine using "mpirun" and it works fine. I have defined a host file on one of them (master) where I have defined the IP address of each machine and the number of slots. First when I tried to submit jobs to master it asked for a password for the SSH connection, which showed that master can communicate with slaves. Then I set up all machines to communicate with each other using SSH without a password. Now when I submit a job on master, the job just blocks and nothing happens. The program runs locally on each machine but it will not run when I submit it on master to be run on slaves. What can it be? D.

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] ScaLapack and BLACS on Leopard
Greg, I would disagree with your statement that the available fortran options can't pass a cost-benefit analysis. I have found for scientific programming (e.g., Livermore Fortran Kernels and actual PDE solvers) that code produced by the intel compiler runs 25 to 55% faster than code from gfortran or g95. Looking at the cost of adding processors with g95/gfortran to get the same throughput as with ifort, you recover the $549 compiler cost real quickly. Doug Reeder

On Mar 6, 2008, at 9:20 AM, Gregory John Orris wrote:

Sorry for the long delay in response. Let's get back to the beginning: My original compiler configuration was gcc from the standard Leopard Developer Tools supplied off the installation DVD. This version was 4.0.1. However, it has been significantly modified by Apple to work with Leopard. If you haven't used Apple's Developer Environment, you're missing out on something. It's pretty sweet. But the price you pay for it is no fortran support (not usually a problem for me but it is relevant here) and usually a somewhat time-lagged compiler. I'm not as plugged into Apple as perhaps I should be, but I can only imagine that their philosophy is to really over test their compiler. Gratis, Apple throws into its "frameworks" a shared library called vecLib that includes machine optimized BLAS and CLAPACK routines. Also, with Leopard, Apple has integrated open-mpi (yea!). But they have once again not included fortran support (boo!). Now, to get fortran on a Mac you have several options (most of which cannot really survive the cost-benefit analysis of a competent manager), but a perfectly fine freeware option is to get it off of hpc.sourceforge.net. This version is based on gcc 4.3.0. There are a few legitimate reasons to stick with Apple's older gcc, as it's not really a good idea to try and mix libraries from one compiler version with another.
Especially here, because (without knowing precisely what Apple has done) there is a tremendous difference in execution speed of code written with gcc 4.0 and 4.1 as opposed to 4.2 and later. (This has been well documented on many systems.) Also, out of a bit of laziness, I really didn't want to go to the trouble of re-writing (or finding) all of the compiler scripts in the Developer Environment to use the new gcc. So, I compiled open-mpi-1.2.5 with gcc, g++ 4.0.1, and gfortran 4.3. Then, I compiled BLACS and ScaLAPACK using the configuration from the open-mpi FAQ page. Everything compiles perfectly ok, independent of whether you choose 32 or 64 bit addressing.

First problem was that I was still calling mpicc from the Apple supplied openmpi and mpif77 from the newly installed distribution. Once again, I've not a clue what Apple has done, but while the two would compile items together, they DO NOT COMMUNICATE properly in 64-bit mode. MPI_COMM_WORLD even in the test routines of openMPI would fail! This is the point at which I originated the message asking if anyone had gotten a 64-bit version to actually work. The errors were in libSystem and were not what I'd expect from a simple openmpi error. I believe this problem is caused by a difference in how pointers were/are treated within gcc from version to version. Thus mixing versions essentially caused failure within the Apple supplied openmpi distribution and the new one I installed.

How to get over this hurdle? Install the complete gcc 4.3.0 from the hpc.sourceforge.net site and recompile EVERYTHING! You might think you were done here, but there is one (or actually four) additional problem(s). Now NONE of the complex routines worked. All of the test routines returned failure, and I tracked it down to the fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of the PBLAS routines were failing. Potentially this was a much more difficult problem, since rewriting these codes is really not what I'm paid to do.
Tracing down these errors further I found that the actual problem is with the zdotc, zdotu, cdotc, and cdotu BLAS routines inside of Apple's vecLib. So, the problem seemed as though a faulty manufacturer supplied and optimized library was not functioning properly. Well, as it turns out there is a peculiar difference (again) between versions of the gcc suite in how it regards, returned values from complex fortran functions (I'm only assuming this since the workaround was successful). This problem has been know for some time now (perhaps 4 years or more). See, http://developer.apple.com/hardware/ve/errata.html#fortran_conventions How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off the netlib.org web site, and compile them with the gcc 4.3.0 suite. So, where am I now? BLACS and ScaLAPACK, and PBLAS work in 64-bit mode with CLAPACK-3.1.1, ATLAS 3.8.1, Open-MPI-1.2.5, and GCC 4.3.0 and link with ATLAS and CLAPACK and NOT vecLib! Long way
Re: [OMPI users] -prefix option to mpirun.
Ashley, Could you define an alias for mpirun that includes -prefix and the necessary argument? Doug Reeder On Mar 4, 2008, at 6:28 AM, Ashley Pittman wrote: Hello, I work for a medium-sized UK-based ISV and am packaging open-mpi so that it can be made available as an option to our users. So far I've been very impressed by how smoothly things have gone, but I've got one problem which doesn't seem to be covered by the FAQ. We install openmpi to /opt/openmpi-1.2.5 and are using the modules command to select which mpi to use. The modules command correctly sets PATH to pick up mpicc and mpirun on the head node; however, the issue comes with running a job: users need to specify -prefix on the mpirun command line. Is there a way to specify this in the environment so I could make it happen automatically as part of the modules environment? I've searched the archives for this; the closest I can find is this exchange in 2006. If I specify a full path to mpirun then it does the right thing, but is there a way to extend this functionality to the case where mpirun is run from PATH? http://www.open-mpi.org/community/lists/users/2006/01/0480.php Yours, Ashley Pittman. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
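A minimal sketch of Doug's alias idea, assuming the module file exports a variable such as OMPI_PREFIX (a name made up here for illustration). A shell function is used instead of an alias so it also takes effect inside non-interactive scripts:

```shell
# Hypothetical: assume the loaded module exports OMPI_PREFIX.
OMPI_PREFIX=/opt/openmpi-1.2.5

# Wrap mpirun so -prefix is always supplied; invoking the absolute
# path also matches the "full path does the right thing" behavior
# Ashley observed.
mpirun() {
  command "$OMPI_PREFIX/bin/mpirun" --prefix "$OMPI_PREFIX" "$@"
}
```

Since Ashley notes that a full path to mpirun already does the right thing, the function above effectively just automates typing that full path.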
Re: [OMPI users] shared library problem with openmpi-1.2.3 and opensuse10.2
Yoshi,

Is the appropriate version of libgfortran.so.1 (32-bit or 64-bit) in your LD_LIBRARY_PATH? What is the output from ldd ./a.out? The version of libgfortran.so.1 it lists needs to be in your LD_LIBRARY_PATH. What does file ./a.out say? If it is an AMD x86-64 binary then you should put /usr/lib64 in your LD_LIBRARY_PATH; otherwise put /usr/lib in your LD_LIBRARY_PATH.

Doug Reeder

On Feb 19, 2008, at 10:00 PM, yoshi.plala wrote:

Dear sirs, I am a beginner with openmpi-1.2.3 (and opensuse10.2), but I have some experience with mpich-1.2 and FreeBSD5.4. I am struggling to build scalapack, parallel-octave and matlab on them. I have now succeeded in installing intel fortran/c 10.0.026 and openMPI-1.2.3, like below:

#mkdir build
#cd build
#../configure --prefix=/opt/openmpi/1.2.3 --enable-mpi-threads CC=icc CXX=icpc F77=ifort FC=ifort
#make all
#make install

test@linux-4e1d:~> set |grep LD_
DYLD_LIBRARY_PATH=/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib
LD_LIBRARY_PATH=/opt/openmpi/1.2.3/lib:/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib
LD_RUN_PATH=/opt/openmpi/1.2.3/lib:/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib:/usr/lib64:/usr/lib64/gcc/x86_64-suse-linux/4.1.2

hello_c worked without any trouble:

test@linux-4e1d:~/openmpi-1.2.3/examples> mpirun -np 8 hello_c -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
Hello, world, I am 7 of 8
Hello, world, I am 6 of 8
Hello, world, I am 4 of 8
Hello, world, I am 3 of 8
Hello, world, I am 5 of 8
Hello, world, I am 0 of 8
Hello, world, I am 2 of 8
Hello, world, I am 1 of 8
test@linux-4e1d:~/openmpi-1.2.3/examples>

But, my benchmark program doesn't work. Are there any mistakes in my configuration?

test@linux-4e1d:~/himenoBMT/mpi> ls
README.txt a.out himenoBMTxpr.f param.h paramset.sh
test@linux-4e1d:~/himenoBMT/mpi> mpirun -np 8 ./a.out -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
[1]+ Stopped mpirun -np 8 ./a.out -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
test@linux-4e1d:~/himenoBMT/mpi>
linux-4e1d:/home/test/himenoBMT/mpi # find / -name libgfortran.so.1 -print
/usr/lib64/libgfortran.so.1
/usr/lib/libgfortran.so.1
/usr/local/matlab75/sys/os/glnxa64/libgfortran.so.1
linux-4e1d:/home/test/himenoBMT/mpi #
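Doug's advice can be sketched as a small shell fragment. In real use desc would come from running file ./a.out; here it is seeded with an assumed output string so the example is self-contained:

```shell
# Pick the libgfortran directory that matches the binary's word size.
# Assumed output of `file ./a.out` for a 64-bit build (hypothetical):
desc="./a.out: ELF 64-bit LSB executable, AMD x86-64, dynamically linked"

case $desc in
  *x86-64*) libdir=/usr/lib64 ;;  # 64-bit binary needs the 64-bit library
  *)        libdir=/usr/lib  ;;   # otherwise use the 32-bit path
esac

# Prepend the chosen directory, preserving any existing search path.
export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$libdir"
```

This matches the find output above: both /usr/lib64/libgfortran.so.1 and /usr/lib/libgfortran.so.1 exist, so the loader error means neither directory was on LD_LIBRARY_PATH for the remote processes.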
Re: [OMPI users] flash2.5 with openmpi
Brock, Do you mean flash memory, like a USB memory stick? What kind of file system is on the memory? Is there some filesystem limit you are bumping into? Doug Reeder On Jan 25, 2008, at 8:38 AM, Brock Palen wrote: Is anyone using flash with openMPI? We are here, but whenever it tries to write its second checkpoint file it segfaults once it gets to 2.2GB, always in the same location. Debugging is a pain as it takes 3 days to get to that point. Just wondering if anyone else has seen this same behavior. Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985
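Following up on Doug's filesystem-limit question: 2.2 GB is suspiciously close to 2^31 = 2147483648 bytes, the classic 32-bit file-offset limit. One quick check (a guess on my part, not something established in this thread) is whether the filesystem holding the checkpoints reports 64-bit file offsets:

```shell
# FILESIZEBITS is a POSIX pathconf variable: the number of bits needed
# to represent the largest file size on the given path's filesystem.
# Run this against the checkpoint directory; 64 means the filesystem
# itself is not the limit.
getconf FILESIZEBITS .
```

Even when the filesystem allows large files, a binary built without large-file support (on 32-bit systems, the _FILE_OFFSET_BITS=64 compile-time macro) can still fail near 2 GB.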
Re: [OMPI users] Tracing the library using gdb and xterm
Krishna, Would it work to launch the gdb/ddd process separately on the remote machine and then attach to the running MPI job from within gdb/ddd? Something like: ssh -X [hostname|ip address] [ddd|gdb] Doug Reeder On Jan 3, 2008, at 8:32 AM, Jeff Squyres wrote: Per my previous mail, Open MPI (by default) closes its ssh sessions after the remote processes are launched, so X forwarding through ssh will not work. If it is possible (and I think it is, based on your subsequent replies), you might be best served with unencrypted X forwarding. On Jan 3, 2008, at 11:02 AM, Doug Reeder wrote: Krishna, Review the ssh and sshd man pages. When using ssh -X it takes care of defining the DISPLAY and sending the X11 images to your screen. Defining DISPLAY directly generally won't work (that is how you do it with rlogin but not with ssh). Doug Reeder On Jan 3, 2008, at 1:54 AM, Krishna Chaitanya wrote: Hi Rolf, Thanks for that. There is still one minor problem, though. The X window is getting spawned on the remote machine and not on my local machine. It now looks like, mpirun --prefix /usr/local -hostfile machines -x DISPLAY -x PATH -np 2 xterm -e gdb peruse_ex1 Please let me know what I can do to have it displayed on my machine. I have the DISPLAY variable set to 0.0 on both the machines and I am ssh-ing into the other machine by using the -X switch. Thanks, Krishna Chaitanya On 1/2/08, Rolf Vandevaart <rolf.vandeva...@sun.com> wrote: Krishna Chaitanya wrote: Hi, I have been tracing the interactions between the PERUSE and MPI library, on one machine. I have been using gdb along with xterm to have two windows open at the same time as I step through the code. I wish to get a better glimpse of the working of the point to point calls, by launching the job on two machines and by tracing the flow in a similar manner.
This is where I stand as of now:

mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1
xterm Xt error: Can't open display:
xterm: DISPLAY is not set

I tried using the display option for xterm and setting the value as 0.0; that was not of much help. If someone can guide me as to where the DISPLAY parameter has to be set to allow the remote machine to open the xterm window, it will be of great help. Thanks, Krishna I also do the following: -x DISPLAY -x PATH In this way, both your DISPLAY and PATH settings make it to the remote node. Rolf -- = rolf.vandeva...@sun.com 781-442-3043 = -- In the middle of difficulty, lies opportunity -- Jeff Squyres Cisco Systems
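For reference, the invocation that finally worked in this thread, assembled piece by piece so each option's role is visible:

```shell
# --prefix points mpirun at the Open MPI install on the remote nodes;
# -x re-exports the named environment variables (DISPLAY, PATH) to the
# remote processes so xterm can find the display and gdb can be found.
cmd="mpirun --prefix /usr/local -hostfile machines"
cmd="$cmd -x DISPLAY -x PATH -np 2 xterm -e gdb peruse_ex1"
echo "$cmd"
```

Forwarding DISPLAY only helps if the X connection itself stays open: as Jeff notes, Open MPI closes its ssh sessions after launch, so tunnelled X forwarding will not survive, and a DISPLAY pointing directly back at the workstation (with unencrypted X allowed) may be required.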
Re: [OMPI users] Tracing the library using gdb and xterm
Krishna, Review the ssh and sshd man pages. When using ssh -X it takes care of defining the DISPLAY and sending the X11 images to your screen. Defining DISPLAY directly generally won't work (that is how you do it with rlogin but not with ssh). Doug Reeder On Jan 3, 2008, at 1:54 AM, Krishna Chaitanya wrote: Hi Rolf, Thanks for that. There is still one minor problem, though. The X window is getting spawned on the remote machine and not on my local machine. It now looks like, mpirun --prefix /usr/local -hostfile machines -x DISPLAY -x PATH -np 2 xterm -e gdb peruse_ex1 Please let me know what I can do to have it displayed on my machine. I have the DISPLAY variable set to 0.0 on both the machines and I am ssh-ing into the other machine by using the -X switch. Thanks, Krishna Chaitanya On 1/2/08, Rolf Vandevaart <rolf.vandeva...@sun.com> wrote: Krishna Chaitanya wrote: > Hi, > I have been tracing the interactions between the PERUSE > and MPI library, on one machine. I have been using gdb along with xterm > to have two windows open at the same time as I step through the code. I > wish to get a better glimpse of the working of the point to point calls, > by launching the job on two machines and by tracing the flow in a > similar manner. This is where I stand as of now: > > mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1 > xterm Xt error: Can't open display: > xterm: DISPLAY is not set > > I tried using the display option for xterm and setting > the value as 0.0, that was not of much help. > If someone can guide me as to where the DISPLAY parameter > has to be set to allow the remote machine to open the xterm window, it > will be of great help. > > Thanks, > Krishna > I also do the following: -x DISPLAY -x PATH In this way, both your DISPLAY and PATH settings make it to the remote node.
Rolf -- = rolf.vandeva...@sun.com 781-442-3043 = -- In the middle of difficulty, lies opportunity
Re: [OMPI users] Tracing the library using gdb and xterm
Krishna, If you are using ssh to connect to the second machine you need to be sure that ssh X11 forwarding is enabled, and you may need to have MPI use ssh -X or ssh -Y to connect to the second machine. That is how the DISPLAY gets set when using ssh. Doug Reeder On Jan 1, 2008, at 8:11 AM, Krishna Chaitanya wrote: Hi, I have been tracing the interactions between the PERUSE and MPI library, on one machine. I have been using gdb along with xterm to have two windows open at the same time as I step through the code. I wish to get a better glimpse of the working of the point to point calls, by launching the job on two machines and by tracing the flow in a similar manner. This is where I stand as of now: mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1 xterm Xt error: Can't open display: xterm: DISPLAY is not set I tried using the display option for xterm and setting the value as 0.0; that was not of much help. If someone can guide me as to where the DISPLAY parameter has to be set to allow the remote machine to open the xterm window, it will be of great help. Thanks, Krishna -- In the middle of difficulty, lies opportunity
[OMPI users] compiler warnings in openmpi-1.2.5rc2
Hello, The attachment contains a short explanation of a compiler warning using the gcc-4.3.0 compilers from hpc.sourceforge.net on OS X 10.5.1. The warning doesn't occur when using the Apple gcc-4.0.1 compilers. This was on a Mac/x86 machine. Doug Reeder openmpi.wrn Description: Binary data