[OMPI users] Is this an OpenMPI bug?
I am trying to use the mpi_bcast function in Fortran, with Open MPI 1.2.7. Say x is a real array of size 100, and np = 100. I try to bcast this to all the processors with

   call mpi_bcast(x, np, mpi_real, 0, ierr)

When I do this and print the values on a receiving processor, exactly half the values get broadcast: I get 50 correct values and the rest are junk. The same happened when I tried with np = 20; exactly 10 values get populated and the rest are junk. PS: I am running this on a single processor (just for testing purposes), with "mpirun -np 4". Cheerio, Gim
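Two likely culprits here, for reference. First, the call as posted is missing the communicator argument: the Fortran binding takes six arguments, i.e. call mpi_bcast(x, np, MPI_REAL, 0, MPI_COMM_WORLD, ierr). Second, if x is actually declared DOUBLE PRECISION, describing it to MPI as MPI_REAL moves only 4 of each element's 8 bytes, which reproduces the "exactly half" symptom. A small Python sketch of that byte arithmetic (no MPI involved; the DOUBLE PRECISION declaration is an assumption, not something stated in the post):

```python
import struct

count = 100                          # elements the user asked to broadcast
x = [float(i) for i in range(count)] # pretend x is DOUBLE PRECISION (8 bytes/element)
sent_bytes = count * 4               # but MPI_REAL describes only 4 bytes/element

src = struct.pack(f"{count}d", *x)
dst = bytearray(count * 8)           # receiver's buffer starts out as junk (zeros here)
dst[:sent_bytes] = src[:sent_bytes]  # only the first half of the data arrives

received = struct.unpack(f"{count}d", bytes(dst))
good = sum(1 for a, b in zip(received, x) if a == b)
print(good)  # 50 -- exactly half, matching the reported symptom
```

With count = 20 the same arithmetic leaves 10 intact elements, again matching the report. Matching the declaration and the MPI datatype (MPI_DOUBLE_PRECISION for a DOUBLE PRECISION array) would move all the bytes.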
Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
Does applying the following patch fix the problem?

Index: ompi/datatype/dt_args.c
===================================================================
--- ompi/datatype/dt_args.c   (revision 20616)
+++ ompi/datatype/dt_args.c   (working copy)
@@ -18,6 +18,9 @@
  */
 #include "ompi_config.h"
+
+#include <stddef.h>
+
 #include "opal/util/arch.h"
 #include "opal/include/opal/align.h"
 #include "ompi/constants.h"

On Feb 20, 2009, at 4:33 PM, Tamara Rogers wrote: Jeff: See attached. I'm using the 9.0 version of the intel compilers. Interestingly I have no problems on a 32bit intel machine using these same compilers. There only seems to be a problem on the 64bit machine. --- On Fri, 2/20/09, Jeff Squyres wrote: From: Jeff Squyres Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers To: "Open MPI Users" Date: Friday, February 20, 2009, 8:37 AM Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by configure; I want to see what was put into your mpi.h) Finally, what version of icc are you using? I test regularly with icc 9.0, 9.1, 10.0, and 10.1 with no problems. Are you using newer or older? (I don't have immediate access to 11.x or 8.x) On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote: > Can you send your config.log as well? > > It looks like you forgot to specify FC=ifort on your configure line (i.e., you need to specify F77=ifort for the Fortran 77 *and* FC=ifort for the Fortran 90 compiler -- this is an Autoconf thing; we didn't make it up). > > That shouldn't be the problem here, but I thought I'd mention it. > > > On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote: > >> >> Jeff: >> You're correct. That was the incorrect config file. I've attached the correct one as per the recommendations in the help page. 
>> >> Thanks for your help >> >> --- On Thu, 2/19/09, Jeff Squyres wrote: >> From: Jeff Squyres >> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers >> To: talmesh...@yahoo.com, "Open MPI Users" >> Date: Thursday, February 19, 2009, 8:32 AM >> >> Your config.log looks incomplete -- it failed saying that your C and C++ >> compilers were incompatible with each other. >> >> This does not seem related to what you described -- are you sure you're >> sending the right config.log? >> >> Specifically, can you send all the information listed here: >> >> http://www.open-mpi.org/community/help/ >> >> >> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote: >> >> > Hello all: >> > I was unable to compile the latest version (1.3) on my intel 64bit system >> with the intel compilers (version 9.0). Configuration goes fine, but I get this >> error when running make: >> > >> > ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is >> undefined >> > typedef OMPI_PTRDIFF_TYPE MPI_Aint; >> > >> > compilation aborted for dt_args.c (cod 21) >> > >> > My config line was: >> > ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx >> > >> > I've attached my config.log file. Has anyone encountered this? I was >> able to build openmpi on this exact system using the gcc/g++ compilers, however >> the intel compilers are substantially faster on our system. >> > >> > Thanks! 
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] OpenMPI 1.3.1 rpm build error
There won't be an official SRPM until 1.3.1 is released. But to test if 1.3.1 is on-track to deliver a proper solution to you, can you try a nightly tarball, perhaps in conjunction with our "buildrpm.sh" script? https://svn.open-mpi.org/source/xref/ompi_1.3/contrib/dist/linux/buildrpm.sh It should build a trivial SRPM for you from the tarball. You'll likely need to get the specfile, too, and put it in the same dir as buildrpm.sh. The specfile is in the same SVN directory: https://svn.open-mpi.org/source/xref/ompi_1.3/contrib/dist/linux/openmpi.spec On Feb 20, 2009, at 3:51 PM, Jim Kusznir wrote: As long as I can still build the rpm for it and install it via rpm. I'm running it on a ROCKS cluster, so it needs to be an RPM to get pushed out to the compute nodes. --Jim On Fri, Feb 20, 2009 at 11:30 AM, Jeff Squyres wrote: On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote: I just went to www.open-mpi.org, went to download, then source rpm. Looks like it was actually 1.3-1. Here's the src.rpm that I pulled in: http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm Ah, gotcha. Yes, that's 1.3.0, SRPM version 1. We didn't make up this nomenclature. :-( The reason for this upgrade is it seems a user found some bug that may be in the OpenMPI code that results in occasionally an MPI_Send() message getting lost. He's managed to reproduce it multiple times, and we can't find anything in his code that can cause it...He's got logs of mpi_send() going out, but the matching mpi_receive() never getting anything, thus killing his code. We're currently running 1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet). Ok. 1.3.x is much mo' betta' then 1.2 in many ways. We could probably help track down the problem, but if you're willing to upgrade to 1.3.x, it'll hopefully just make the problem go away. Can you try a 1.3.1 nightly tarball? 
-- Jeff Squyres Cisco Systems
Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
Jeff: See attached. I'm using the 9.0 version of the intel compilers. Interestingly I have no problems on a 32bit intel machine using these same compilers. There only seems to be a problem on the 64bit machine. --- On Fri, 2/20/09, Jeff Squyres wrote: From: Jeff Squyres Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers To: "Open MPI Users" Date: Friday, February 20, 2009, 8:37 AM Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by configure; I want to see what was put into your mpi.h) Finally, what version of icc are you using? I test regularly with icc 9.0, 9.1, 10.0, and 10.1 with no problems. Are you using newer or older? (I don't have immediate access to 11.x or 8.x) On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote: > Can you send your config.log as well? > > It looks like you forgot to specify FC=ifort on your configure line (i.e., you need to specify F77=ifort for the Fortran 77 *and* FC=ifort for the Fortran 90 compiler -- this is an Autoconf thing; we didn't make it up). > > That shouldn't be the problem here, but I thought I'd mention it. > > > On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote: > >> >> Jeff: >> You're correct. That was the incorrect config file. I've attached the correct one as per the recommendations in the help page. >> >> Thanks for your help >> >> --- On Thu, 2/19/09, Jeff Squyres wrote: >> From: Jeff Squyres >> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers >> To: talmesh...@yahoo.com, "Open MPI Users" >> Date: Thursday, February 19, 2009, 8:32 AM >> >> Your config.log looks incomplete -- it failed saying that your C and C++ >> compilers were incompatible with each other. >> >> This does not seem related to what you described -- are you sure you're >> sending the right config.log? 
>> >> Specifically, can you send all the information listed here: >> >> http://www.open-mpi.org/community/help/ >> >> >> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote: >> >> > Hello all: >> > I was unable to compile the latest version (1.3) on my intel 64bit system >> with the intel compilers (version 9.0). Configuration goes fine, but I get this >> error when running make: >> > >> > ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is >> undefined >> > typedef OMPI_PTRDIFF_TYPE MPI_Aint; >> > >> > compilation aborted for dt_args.c (cod 21) >> > >> > My config line was: >> > ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx >> > >> > I've attached my config.log file. Has anyone encountered this? I was >> able to build openmpi on this exact system using the gcc/g++ compilers, however >> the intel compilers are substantially faster on our system. >> > >> > Thanks! >> > >> > ___ >> > users mailing list >> > us...@open-mpi.org >> > http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> --Jeff Squyres >> Cisco Systems > > > --Jeff Squyres > Cisco Systems --Jeff Squyres Cisco Systems <openmpi-1.3_64_output.tar.gz>
Re: [OMPI users] OpenMPI 1.3.1 rpm build error
As long as I can still build the rpm for it and install it via rpm. I'm running it on a ROCKS cluster, so it needs to be an RPM to get pushed out to the compute nodes. --Jim On Fri, Feb 20, 2009 at 11:30 AM, Jeff Squyres wrote: > On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote: > >> I just went to www.open-mpi.org, went to download, then source rpm. >> Looks like it was actually 1.3-1. Here's the src.rpm that I pulled >> in: >> >> http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm > > Ah, gotcha. Yes, that's 1.3.0, SRPM version 1. We didn't make up this > nomenclature. :-( > >> The reason for this upgrade is it seems a user found some bug that may >> be in the OpenMPI code that results in occasionally an MPI_Send() >> message getting lost. He's managed to reproduce it multiple times, >> and we can't find anything in his code that can cause it...He's got >> logs of mpi_send() going out, but the matching mpi_receive() never >> getting anything, thus killing his code. We're currently running >> 1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet). > > Ok. 1.3.x is much mo' betta' then 1.2 in many ways. We could probably help > track down the problem, but if you're willing to upgrade to 1.3.x, it'll > hopefully just make the problem go away. > > Can you try a 1.3.1 nightly tarball? > > -- > Jeff Squyres > Cisco Systems > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] OpenMPI 1.3.1 rpm build error
On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote: I just went to www.open-mpi.org, went to download, then source rpm. Looks like it was actually 1.3-1. Here's the src.rpm that I pulled in: http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm Ah, gotcha. Yes, that's 1.3.0, SRPM version 1. We didn't make up this nomenclature. :-( The reason for this upgrade is it seems a user found some bug that may be in the OpenMPI code that results in occasionally an MPI_Send() message getting lost. He's managed to reproduce it multiple times, and we can't find anything in his code that can cause it...He's got logs of mpi_send() going out, but the matching mpi_receive() never getting anything, thus killing his code. We're currently running 1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet). Ok. 1.3.x is much mo' betta' then 1.2 in many ways. We could probably help track down the problem, but if you're willing to upgrade to 1.3.x, it'll hopefully just make the problem go away. Can you try a 1.3.1 nightly tarball? -- Jeff Squyres Cisco Systems
Re: [OMPI users] OpenMPI 1.3.1 rpm build error
I just went to www.open-mpi.org, went to download, then source rpm. Looks like it was actually 1.3-1. Here's the src.rpm that I pulled in: http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm The reason for this upgrade is it seems a user found some bug that may be in the OpenMPI code that results in occasionally an MPI_Send() message getting lost. He's managed to reproduce it multiple times, and we can't find anything in his code that can cause it...He's got logs of mpi_send() going out, but the matching mpi_receive() never getting anything, thus killing his code. We're currently running 1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet). --Jim On Thu, Feb 19, 2009 at 6:46 PM, Jeff Squyres wrote: > There is no 1.3.1 RPM yet (only a 1.3 RPM) -- what file specifically are you > trying to build? > > Could you try building one of the 1.3.1 nightly snapshot tarballs? I > *think* the problem you're seeing is a problem due to FORTIFY_SOURCE in the > VT code in 1.3 and should be fixed by now. > >http://www.open-mpi.org/nightly/v1.3/ > > > On Feb 19, 2009, at 12:00 PM, Jim Kusznir wrote: > >> Hi all: >> >> I'm trying to build openmpi RPMs from the included spec file. The >> build fails with: >> >> gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib >> -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE >> -DBINDIR=\"/opt/openmpi-gcc/1.3/bin\" >> -DDATADIR=\"/opt/openmpi-gcc/1.3/share\" -DRFG -DVT_BFD -DVT_MEMHOOK >> -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions >> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT >> vt_iowrap.o -MD -MP -MF .deps/vt_iowrap.Tpo -c -o vt_iowrap.o >> vt_iowrap.c >> vt_iowrap.c:1242: error: expected declaration specifiers or '...' 
>> before numeric constant >> vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk' >> make[5]: *** [vt_iowrap.o] Error 1 >> >> >> My build command was: >> rpmbuild -bb --define 'install_in_opt 1' --define 'install_modulefile >> 1' --define 'modules_rpm_name environment-modules' --define >> 'build_all_in_one_rpm 0' --define 'configure_options >> --with-tm=/opt/torque --with-openib=/opt/mlnx-ofed/src/OFED-1.3.1' >> --define '_name openmpi-gcc' openmpi-1.3.spec >> >> This build for the 1.2.8 worked fine; this is my first attempt at >> building 1.3.1. >> The system is Rocks 5.1 (CentOS 5.2), GCC 4.1.2-42 (CentOS 5.2 default). >> >> Any suggestions? >> >> Thanks! >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > Cisco Systems > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] lammps MD code fails with Open MPI 1.3
On Feb 20, 2009, at 10:08 AM, Jeff Pummill wrote: It's probably not the same issue as this is one of the very few codes that I maintain which is C++ and not fortran :-( Ok. Note that the error Nysal pointed out was a problem with our handling of stdin. That might be an issue as well; should be fixed in any recent 1.3.1 nightly snapshot. It behaved similarly on another system when I built it against a new version (1.0??) of MVAPICH. I had to roll back a version from that as well. I may contact the lammps people and see if they know what's going on as well. Gotcha. -- Jeff Squyres Cisco Systems
Re: [OMPI users] WRF, OpenMPI and PGI 7.2
Note that (beginning with 1.3) you can also use "platform files" to save configure and default mca params so that you build consistently. Check the examples in contrib/platform. Most of us developers use these religiously, as do our host organizations, for precisely this reason. I believe there should be something on the FAQ about platform files - if not, I'll try to add it in the next few days. If you want to contribute platform files to support some community with similar configurations, please send them to me - we shouldn't need a contributors agreement for them as there is no code involved. Ralph On Feb 20, 2009, at 8:23 AM, Gus Correa wrote: Hi Gerry I usually put configure commands (and environment variables) on little shell scripts, which I edit to fit the combination of hardware/compiler(s), and keep them in the build directory. Otherwise I would forget the details next time I need to build. If Myrinet and GigE are on separate clusters, you'll have to install OpenMPI on each one, sorry. However, if Myrinet and GigE are available on the same cluster, you can build a single OpenMPI, and choose the "byte transport layer (BTL)" to be Myrinet or GigE (or IB, for that matter), and even the NICs/networks to use, on your job submission script. Check the OpenMPI FAQ: http://www.open-mpi.org/faq/?category=myrinet#myri-btl-gm http://www.open-mpi.org/faq/?category=tcp#tcp-btl http://www.open-mpi.org/faq/?category=openfabrics#ib-btl http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network http://www.open-mpi.org/faq/?category=tcp#tcp-selection Gus Correa PS - BTW - Our old non-Rocks cluster has Myrinet-2000 (GM). After I get the new cluster up and running and in production, I am thinking of revamping the old cluster, and install Rocks on it. I would love to learn from your experience with your Rocks+Myrinet cluster, if you have the time to post a short "bullet list" of "do's and don'ts". (The Rocks list may be more appropriate than the OpenMPI for this.) 
Last I checked Myrinet had a roll only for Rocks 5.0, not 5.1, right? Did you install it on top of Rocks 5.0 or 5.1? (For instance, my recollection of old postings on the list is that the Torque 5.0 roll worked with Rocks 5.1, but it is always a risky business to mix different releases.) Many thanks, Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA - Gerry Creager wrote: Gus, I'll give that a try real quick (or as quickly as the compiles can run). I'd not thought of this solution. I've been context-switching too much lately. I've gotta look at this for a gigabit cluster as well. Thanks! Gus Correa wrote: Hi Gerry You may need to compile a hybrid OpenMPI using gcc for C, PGI f90 for Fortran on the OpenMPI configure script. This should give you the required mpicc and mpif90 to do the job. I guess this is what Elvedin meant in his message. I have these hybrids for OpenMPI and MPICH2 here (not Myrinet but GigE), and they work fine with a WRF relative (CAM3, atmospheric climate). Two cents from Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA - Gerry Creager wrote: Elvedin, Yeah, I thought about that after finding a reference to this in the archives, so I redirected the path to MPI toward the gnu-compiled version. It died in THIS manner: make[3]: Entering directory `/home/gerry/WRFv3/WRFV3/external/RSL_LITE' mpicc -cc=gcc -DFSEEKO64_OK -w -O3 -DDM_PARALLEL -c c_code.c pgcc-Error-Unknown switch: -cc=gcc make[3]: [c_code.o] Error 1 (ignored) Methinks the wrf configuration script and makefile will need some tweaks. Interesting thing: I have another system (alas, with mpich) where it compiles just fine. I'm trying to sort this out, as on 2 systems, with openMPI, it does odd dances before dying. I'm still trying things. I've gotta get this up both for MY research and to support other users. 
Thanks, Gerry Elvedin Trnjanin wrote: WRF almost requires that you use gcc for the C/C++ part and the PGI Fortran compilers, if you choose that option. I'd suggest compiling OpenMPI in the same way as that has resolved our various issues. Have you tried that with the same result? Gerry Creager wrote: Howdy, I'm new to this list. I've done a little review but likely missed something specific to what I'm asking. I'll keep looking but need to resolve this soon. I'm running a Rocks cluster (centos 5), with PGI 7.2-3 compilers, Myricom MX2 hardware and drivers, and OpenMPI1.3 I installed the Myricom roll which has OpenMPI compiled with gcc. I recently compiled the openmpi code w/
Re: [OMPI users] WRF, OpenMPI and PGI 7.2
Hi Gerry I usually put configure commands (and environment variables) on little shell scripts, which I edit to fit the combination of hardware/compiler(s), and keep them in the build directory. Otherwise I would forget the details next time I need to build. If Myrinet and GigE are on separate clusters, you'll have to install OpenMPI on each one, sorry. However, if Myrinet and GigE are available on the same cluster, you can build a single OpenMPI, and choose the "byte transport layer (BTL)" to be Myrinet or GigE (or IB, for that matter), and even the NICs/networks to use, on your job submission script. Check the OpenMPI FAQ: http://www.open-mpi.org/faq/?category=myrinet#myri-btl-gm http://www.open-mpi.org/faq/?category=tcp#tcp-btl http://www.open-mpi.org/faq/?category=openfabrics#ib-btl http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network http://www.open-mpi.org/faq/?category=tcp#tcp-selection Gus Correa PS - BTW - Our old non-Rocks cluster has Myrinet-2000 (GM). After I get the new cluster up and running and in production, I am thinking of revamping the old cluster, and install Rocks on it. I would love to learn from your experience with your Rocks+Myrinet cluster, if you have the time to post a short "bullet list" of "do's and don'ts". (The Rocks list may be more appropriate than the OpenMPI for this.) Last I checked Myrinet had a roll only for Rocks 5.0, not 5.1, right? Did you install it with on top of Rocks 5.0 or 5.1? (For instance, my recollection of old postings on the list, is that the Torque 5.0 roll worked with Rocks 5.1, but it is always a risky business to mix different releases.) Many thanks, Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA - Gerry Creager wrote: Gus, I'll give that a try real quick (or as quickly as the compiles can run. I'd not thought of this solution. I've been context-switching too much lately. I've gotta look at this for a gigabit cluster as well. Thanks! 
Gus Correa wrote: Hi Gerry You may need to compile a hybrid OpenMPI using gcc for C, PGI f90 for Fortran on the OpenMPI configure script. This should give you the required mpicc and mpif90 to do the job. I guess this is what Elvedin meant on his message. I have these hybrids for OpenMPI and MPICH2 here (not Myrinet but GigE), and they work fine with a WRF relative (CAM3, atmospheric climate). Two cents from Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA - Gerry Creager wrote: Elvedin, Yeah, I thought about that after finding a reference to this in the archives, so I redirected the path to MPI toward the gnu-compiled version. It died in THIS manner: make[3]: Entering directory `/home/gerry/WRFv3/WRFV3/external/RSL_LITE' mpicc -cc=gcc -DFSEEKO64_OK -w -O3 -DDM_PARALLEL -c c_code.c pgcc-Error-Unknown switch: -cc=gcc make[3]: [c_code.o] Error 1 (ignored) Methinks the wrf configuration script and make file will need some tweeks. Interesting thing: I have another system (alas, with mpich) where it compiles just fine. I'm trying to sort this out, as on 2 systems, with openMPI, it does odd dances before dying. I'm still trying things. I've gotta get this up both for MY research and to support other users. Thanks, Gerry Elvedin Trnjanin wrote: WRF almost requires that you use gcc for the C/C++ part and the PGI Fortran compilers, if you choose that option. I'd suggest compiling OpenMPI in the same way as that has resolved our various issues. Have you tried that with the same result? Gerry Creager wrote: Howdy, I'm new to this list. I've done a little review but likely missed something specific to what I'm asking. I'll keep looking but need to resolve this soon. I'm running a Rocks cluster (centos 5), with PGI 7.2-3 compilers, Myricom MX2 hardware and drivers, and OpenMPI1.3 I installed the Myricom roll which has OpenMPI compiled with gcc. I recently compiled the openmpi code w/ PGI. 
I've the MPICH_F90 pointing to the right place, and we're looking for the right includes and libs by means of LD_LIBRARY_PATH, etc. When I tried to run, I got the following error: make[3]: Entering directory `/home/gerry/WRFv3/WRFV3/external/RSL_LITE' mpicc -DFSEEKO64_OK -w -O3 -DDM_PARALLEL -c c_code.c PGC/x86-64 Linux 7.2-3: compilation completed with warnings mpicc -DFSEEKO64_OK -w -O3 -DDM_PARALLEL -c buf_for_proc.c PGC-S-0036-Syntax error: Recovery attempted by inserting identifier .Z before '(' (/share/apps/openmpi-1.3-pgi/include/mpi.h: 889) PGC-S-0082-Function returning array not allowed (/share/apps/openmpi-1.3-pgi/include/mpi.h: 889) PGC-S-0043-Redefi
Re: [OMPI users] round-robin scheduling question [hostfile]
It is a little bit of both: * historical, because most MPIs default to mapping by slot, and * performance, because procs that share a node can communicate via shared memory, which is faster than sending messages over an interconnect, and most apps are communication-bound If your app is disk-intensive, then mapping it -bynode may be a better option for you. That's why we provide it. Note, however, that you can still wind up with multiple procs on a node. All "bynode" means is that the ranks are numbered consecutively bynode - it doesn't mean that there is only one proc/node. If you truly want one proc/node, then you should use the -pernode option. This maps one proc on each node up to either the number of procs you specified or the number of available nodes. If you don't specify -np, we just put one proc on each node in your allocation/hostfile. HTH Ralph On Feb 20, 2009, at 1:25 AM, Raymond Wan wrote: Hi all, According to FAQ 14 (How do I control how my processes are scheduled across nodes?) [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling], it says that the default scheduling policy is by slot and not by node. I'm curious why the default is "by slot" since I am thinking of explicitly specifying by node, but I'm wondering if there is an issue which I haven't considered. I would think that one reason for "by node" is to distribute HDD access across machines [as is the case for me, since my program is HDD access intensive]. Or perhaps I am mistaken? I'm now thinking that "by slot" is the default because processes with ranks that are close together might do similar tasks and you would want them on the same node? Is that the reason? Also, at the end of this FAQ, it says "NOTE: This is the scheduling policy in Open MPI because of a long historical precedent..." -- does this "This" refer to "the fact that there are two scheduling policies" or "the fact that 'by slot' is the default"? 
If the latter, then that explains why "by slot" is the default, I guess... Thank you! Ray
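To make the byslot/bynode distinction concrete, here is a small Python sketch of the two mapping policies (the two-node, two-slots-per-node hostfile is a made-up example, not something from the thread):

```python
def map_byslot(nodes, slots_per_node, nprocs):
    """Default policy: fill every slot on a node before moving to the next node."""
    placement = []
    for node in nodes:
        for _ in range(slots_per_node):
            if len(placement) == nprocs:
                return placement
            placement.append(node)
    return placement

def map_bynode(nodes, slots_per_node, nprocs):
    """-bynode policy: deal ranks out round-robin across nodes, one slot per pass."""
    placement = []
    for _ in range(slots_per_node):
        for node in nodes:
            if len(placement) == nprocs:
                return placement
            placement.append(node)
    return placement

nodes = ["node0", "node1"]       # hypothetical hostfile: 2 nodes, 2 slots each
print(map_byslot(nodes, 2, 4))   # ['node0', 'node0', 'node1', 'node1']
print(map_bynode(nodes, 2, 4))   # ['node0', 'node1', 'node0', 'node1']
```

Note that either way all four ranks land two-per-node; only the rank numbering differs, which is exactly the point above about needing -pernode if you truly want one proc per node.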
Re: [OMPI users] Strange problem
Hi Gabriele Could be we have a problem in our LSF support - none of us have a way of testing it, so this is somewhat of a blind programming case for us. From the message, it looks like there is some misunderstanding about how many slots were allocated vs how many were mapped to a specific host. I don't see your cmd line here - could you pass it along too? My initial guess is that mpirun is running on node0023, and that we then mapped procs local to mpirun such that we exceeded LSF's slot allocation on that node. We don't account for mpirun taking a process slot in our mapping, and LSF does - hence the error. I think... You could test this by adding --nolocal to your cmd line. This will force mpirun to map all procs on other nodes. If my analysis is correct, the job should run. Ralph On Feb 20, 2009, at 6:46 AM, Gabriele Fatigati wrote: Dear OpenMPI developers, I'm running my MPI code compiled with OpenMPI 1.3 over Infiniband and the LSF scheduler, but I got the error attached. I suppose that process spawning doesn't work well. The same program under OpenMPI 1.2.5 works well. Could you help me? Thanks in advance. -- Ing. Gabriele Fatigati Parallel programmer CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatigati [AT] cineca.it
Re: [OMPI users] lammps MD code fails with Open MPI 1.3
It's probably not the same issue as this is one of the very few codes that I maintain which is C++ and not Fortran :-( It behaved similarly on another system when I built it against a new version (1.0??) of MVAPICH. I had to roll back a version from that as well. I may contact the lammps people and see if they know what's going on as well. Jeff F. Pummill Senior Linux Cluster Administrator TeraGrid Campus Champion - UofA University of Arkansas Fayetteville, Arkansas 72701 (479) 575 - 4590 http://hpc.uark.edu "In theory, there is no difference between theory and practice. But in practice, there is!" -- anonymous Jeff Squyres wrote: Actually, there was a big Fortran bug that crept in after 1.3 that was just fixed on the trunk last night. If you're using Fortran applications with some compilers (e.g., Intel), the 1.3.1 nightly snapshots may have hung in some cases. The problem should be fixed in tonight's 1.3.1 nightly snapshot. On Feb 20, 2009, at 12:46 AM, Nysal Jan wrote: It could be the same bug reported here http://www.open-mpi.org/community/lists/users/2009/02/8010.php Can you try a recent snapshot of 1.3.1 (http://www.open-mpi.org/nightly/v1.3/) to verify if this has been fixed --Nysal On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote: I built a fresh version of lammps v29Jan09 against Open MPI 1.3, which in turn was built with GNU compilers v4.2.4 on an Ubuntu 8.04 x86_64 box. This Open MPI build was able to generate usable binaries such as XHPL and NPB, but the lammps binary it generated was not usable. I tried it with a couple of different versions of the lammps source, but to no avail. No errors during the builds and a binary was created, but when executing the job it quickly exits with no messages other than: jpummil@stealth:~$ mpirun -np 4 -hostfile hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small LAMMPS (22 Jan 2008) Interestingly, I downloaded Open MPI 1.2.8, built it with the same configure options I had used with 1.3, and it worked. 
I'm getting by fine with 1.2.8. I just wanted to file a possible bug report on 1.3 and see if others have seen this behavior. Cheers! -- Jeff F. Pummill Senior Linux Cluster Administrator TeraGrid Campus Champion - UofA University of Arkansas
[OMPI users] openmpi 1.3: undefined symbol: mca_base_param_reg_int [was: Re: OpenMPI 1.3:]
Hi again! Sorry for messing up the subject. Also, I wanted to attach the output of ompi_info -all. Olaf <ompi_info.out.gz>
[OMPI users] OpenMPI 1.3:
Hello!

I have compiled OpenMPI 1.3 with configure --prefix=$HOME/software. The compilation works fine, and I can run normal MPI programs. However, I'm using OpenMPI to run a program that we currently develop (http://www.espresso-pp.de). The software uses Python as a front-end language, which loads the MPI-enabled shared library. When I start python with a script using this parallel lib via mpiexec, I get the following error:

> mpiexec -n 4 python examples/hello.py
python: symbol lookup error: /people/thnfs/homes/lenzo/software.thop/lib/openmpi/mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int

(the same line is printed once by each of the four processes)

When I compile OpenMPI 1.3 using --enable-shared --enable-static, the problem disappears. Note also that the same program works when I'm using OpenMPI 1.2.x (tested 1.2.6 and 1.2.9). I believe the problem is connected with the one described here: http://www.open-mpi.org/community/lists/devel/2005/09/0359.php

I have found a workaround, but I think the problem is worth reporting. Let me know if I can help in debugging it.

Greetings from Germany,
Olaf Lenz

PS: It is not obvious on the OpenMPI web site where to report bugs. When clicking on "Bug Tracking", which seems most obvious, I'm redirected to the Trac Timeline, and there is no place there where I can report bugs or anything.
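Olaf's workaround isn't spelled out in the thread. One commonly cited cause of this class of error (an assumption here, not something the thread confirms) is that Python dlopen()s the MPI-linked extension with local symbol visibility, so Open MPI's own dlopen()ed components cannot resolve symbols that live in libmpi. Preloading libmpi into the global symbol namespace before importing the extension usually sidesteps that; a minimal sketch:

```python
import ctypes

# Sketch of a common workaround (an assumption, not from the thread):
# load libmpi with RTLD_GLOBAL so that dlopen()ed Open MPI components such
# as mca_paffinity_linux.so can resolve symbols like mca_base_param_reg_int.
# The soname "libmpi.so" is an assumption; adjust to what is installed.
try:
    ctypes.CDLL("libmpi.so", mode=ctypes.RTLD_GLOBAL)
except OSError:
    # libmpi isn't present on this machine; nothing to preload.
    pass

# Only after this preload should the MPI-enabled extension be imported,
# e.g.: import _espressopp   (hypothetical module name)
print("preload step done")
```

Building Open MPI with --enable-static, as Olaf did, hides the problem because the components are then linked into libmpi rather than dlopen()ed separately.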
[OMPI users] Strange problem
Dear OpenMPI developers,

I'm running my MPI code, compiled with OpenMPI 1.3, over Infiniband with the LSF scheduler, but I get the attached error. I suspect that process spawning doesn't work correctly. The same program works fine under OpenMPI 1.2.5. Could you help me? Thanks in advance.

--
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Technologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it    Tel: +39 051 6171722
g.fatigati [AT] cineca.it

job.196571.err
Description: Binary data
Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by configure; I want to see what was put into your mpi.h.)

Finally, what version of icc are you using? I test regularly with icc 9.0, 9.1, 10.0, and 10.1 with no problems. Are you using newer or older? (I don't have immediate access to 11.x or 8.x.)

On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote:
> Can you send your config.log as well?
>
> It looks like you forgot to specify FC=ifort on your configure line (i.e., you need to specify F77=ifort for the Fortran 77 compiler *and* FC=ifort for the Fortran 90 compiler -- this is an Autoconf thing; we didn't make it up).
>
> That shouldn't be the problem here, but I thought I'd mention it.
>
> On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:
>> Jeff:
>> You're correct. That was the incorrect config file. I've attached the correct one as per the recommendations in the help page. Thanks for your help.
>>
>> --- On Thu, 2/19/09, Jeff Squyres wrote:
>>> From: Jeff Squyres
>>> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
>>> To: talmesh...@yahoo.com, "Open MPI Users"
>>> Date: Thursday, February 19, 2009, 8:32 AM
>>>
>>> Your config.log looks incomplete -- it failed saying that your C and C++ compilers were incompatible with each other. This does not seem related to what you described -- are you sure you're sending the right config.log? Specifically, can you send all the information listed here: http://www.open-mpi.org/community/help/
>>>
>>> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:
>>>> Hello all:
>>>> I was unable to compile the latest version (1.3) on my Intel 64-bit system with the Intel compilers (version 9.0). Configuration goes fine, but I get this error when running make:
>>>>
>>>> ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is undefined
>>>>   typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>>>>
>>>> compilation aborted for dt_args.c (cod 21)
>>>>
>>>> My config line was:
>>>> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>>>>
>>>> I've attached my config.log file. Has anyone encountered this? I was able to build openmpi on this exact system using the gcc/g++ compilers, but the Intel compilers are substantially faster on our system.
>>>>
>>>> Thanks!

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
Can you send your config.log as well?

It looks like you forgot to specify FC=ifort on your configure line (i.e., you need to specify F77=ifort for the Fortran 77 compiler *and* FC=ifort for the Fortran 90 compiler -- this is an Autoconf thing; we didn't make it up).

That shouldn't be the problem here, but I thought I'd mention it.

On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:
> Jeff:
> You're correct. That was the incorrect config file. I've attached the correct one as per the recommendations in the help page. Thanks for your help.
>
> --- On Thu, 2/19/09, Jeff Squyres wrote:
>> From: Jeff Squyres
>> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers
>> To: talmesh...@yahoo.com, "Open MPI Users"
>> Date: Thursday, February 19, 2009, 8:32 AM
>>
>> Your config.log looks incomplete -- it failed saying that your C and C++ compilers were incompatible with each other. This does not seem related to what you described -- are you sure you're sending the right config.log? Specifically, can you send all the information listed here: http://www.open-mpi.org/community/help/
>>
>> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:
>>> Hello all:
>>> I was unable to compile the latest version (1.3) on my Intel 64-bit system with the Intel compilers (version 9.0). Configuration goes fine, but I get this error when running make:
>>>
>>> ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is undefined
>>>   typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>>>
>>> compilation aborted for dt_args.c (cod 21)
>>>
>>> My config line was:
>>> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>>>
>>> I've attached my config.log file. Has anyone encountered this? I was able to build openmpi on this exact system using the gcc/g++ compilers, but the Intel compilers are substantially faster on our system.
>>>
>>> Thanks!

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] lammps MD code fails with Open MPI 1.3
Actually, there was a big Fortran bug that crept in after 1.3 that was just fixed on the trunk last night. If you're using Fortran applications with some compilers (e.g., Intel), the 1.3.1 nightly snapshots may have hung in some cases. The problem should be fixed in tonight's 1.3.1 nightly snapshot.

On Feb 20, 2009, at 12:46 AM, Nysal Jan wrote:
> It could be the same bug reported here: http://www.open-mpi.org/community/lists/users/2009/02/8010.php
> Can you try a recent snapshot of 1.3.1 (http://www.open-mpi.org/nightly/v1.3/) to verify if this has been fixed?
> --Nysal
>
> On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote:
>> I built a fresh version of LAMMPS v29Jan09 against Open MPI 1.3, which in turn was built with GNU compilers v4.2.4 on an Ubuntu 8.04 x86_64 box. This Open MPI build was able to generate usable binaries such as XHPL and NPB, but the LAMMPS binary it generated was not usable.
>>
>> I tried it with a couple of different versions of the LAMMPS source, but to no avail. There were no errors during the builds and a binary was created, but when executing the job it quickly exits with no messages other than:
>>
>> jpummil@stealth:~$ mpirun -np 4 -hostfile hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small
>> LAMMPS (22 Jan 2008)
>>
>> Interestingly, I downloaded Open MPI 1.2.8, built it with the same configure options I had used with 1.3, and it worked. I'm getting by fine with 1.2.8. I just wanted to file a possible bug report on 1.3 and see if others have seen this behavior.
>>
>> Cheers!
>> --
>> Jeff F. Pummill
>> Senior Linux Cluster Administrator
>> TeraGrid Campus Champion - UofA
>> University of Arkansas

--
Jeff Squyres
Cisco Systems
[OMPI users] round-robin scheduling question [hostfile]
Hi all,

According to FAQ 14 ("How do I control how my processes are scheduled across nodes?") [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling], the default scheduling policy is by slot and not by node. I'm curious why the default is "by slot": I am thinking of explicitly specifying "by node", but I'm wondering if there is an issue I haven't considered. I would think that one reason for "by node" is to distribute HDD access across machines (as is the case for me, since my program is HDD-access intensive). Or perhaps I am mistaken? I'm now thinking that "by slot" is the default because processes with ranks that are close together might do similar tasks, and you would want them on the same node. Is that the reason?

Also, at the end of this FAQ, it says "NOTE: This is the scheduling policy in Open MPI because of a long historical precedent..." -- does this "This" refer to "the fact that there are two scheduling policies" or "the fact that 'by slot' is the default"? If the latter, then that explains why "by slot" is the default, I guess...

Thank you!
Ray
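For concreteness, the difference between the two policies the FAQ describes can be sketched in a few lines (plain Python, not Open MPI code; the function name and signature are made up for illustration):

```python
def schedule(nodes, slots_per_node, nprocs, policy="byslot"):
    """Toy model of mpirun rank placement: returns a list mapping rank -> node."""
    placement = []
    if policy == "byslot":
        # Fill every slot on a node before moving to the next node.
        for node in nodes:
            for _ in range(slots_per_node):
                if len(placement) < nprocs:
                    placement.append(node)
    else:
        # "bynode": hand out one rank per node at a time, round-robin.
        while len(placement) < nprocs:
            for node in nodes:
                if len(placement) < nprocs:
                    placement.append(node)
    return placement

# With 2 nodes of 2 slots each and 4 ranks:
print(schedule(["n0", "n1"], 2, 4, "byslot"))  # ranks 0,1 on n0; ranks 2,3 on n1
print(schedule(["n0", "n1"], 2, 4, "bynode"))  # ranks alternate: n0, n1, n0, n1
```

So "by slot" keeps neighboring ranks on the same node (good when neighbors communicate heavily), while "by node" spreads them out (good when each rank competes for a per-node resource such as local disk, as in Ray's case).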
Re: [OMPI users] lammps MD code fails with Open MPI 1.3
It could be the same bug reported here: http://www.open-mpi.org/community/lists/users/2009/02/8010.php
Can you try a recent snapshot of 1.3.1 (http://www.open-mpi.org/nightly/v1.3/) to verify if this has been fixed?
--Nysal

On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote:
> I built a fresh version of LAMMPS v29Jan09 against Open MPI 1.3, which in turn was built with GNU compilers v4.2.4 on an Ubuntu 8.04 x86_64 box. This Open MPI build was able to generate usable binaries such as XHPL and NPB, but the LAMMPS binary it generated was not usable.
>
> I tried it with a couple of different versions of the LAMMPS source, but to no avail. There were no errors during the builds and a binary was created, but when executing the job it quickly exits with no messages other than:
>
> jpummil@stealth:~$ mpirun -np 4 -hostfile hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small
> LAMMPS (22 Jan 2008)
>
> Interestingly, I downloaded Open MPI 1.2.8, built it with the same configure options I had used with 1.3, and it worked. I'm getting by fine with 1.2.8. I just wanted to file a possible bug report on 1.3 and see if others have seen this behavior.
>
> Cheers!
> --
> Jeff F. Pummill
> Senior Linux Cluster Administrator
> TeraGrid Campus Champion - UofA
> University of Arkansas