Re: [OMPI users] IO performance
Tom/All, In case it is not already obvious, the GPFS Linux kernel module takes care of the interaction between the Linux IO stack, POSIX, and the underlying GPFS layer. MPI-IO interacts with the kernel, thus modified, through the POSIX API. Another item, perhaps slightly off topic: the paper below provides a nice overview of some basic GPFS concepts and compares them to Lustre. It describes the mixed Lustre and GPFS storage architecture in use at NERSC. Hope you find it useful: http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/01-5Monday/3A-Canon/canon-paper.pdf Cheers, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Miracles are delivered to order by great intelligence, or when it is absent, through the passage of time and a series of mere chance events. -- Max Headroom From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Tom Rosmond [rosm...@reachone.com] Sent: Monday, February 06, 2012 11:39 AM To: Open MPI Users Subject: Re: [OMPI users] IO performance Rob, Thanks, these are the kind of suggestions I was looking for. I will try them. But I will have to twist some arms to get the 1.5 upgrade. I might just install a private copy for my tests. T. Rosmond On Mon, 2012-02-06 at 10:21 -0600, Rob Latham wrote: > On Fri, Feb 03, 2012 at 10:46:21AM -0800, Tom Rosmond wrote: > > With all of this, here is my MPI related question. I recently added an > > option to use MPI-IO to do the heavy IO lifting in our applications. I > > would like to know the relative importance of the dedicated MPI > > network vis-a-vis the GPFS network for typical MPI-IO collective reads > > and writes. I assume there must be some hand-off of data between the > > networks during the process, but how is it done, and are there any rules > > to help understand it. Any insights would be welcome. > > There's not really a handoff. 
MPI-IO on GPFS will call a posix read() > or write() system call after possibly doing some data massaging. That > system call sends data over the storage network. > > If you've got a fast communication network but a slow storage network, > then some of the MPI-IO optimizations will need to be adjusted a bit. > Seems like you'd want to really beef up the "cb_buffer_size". > > For GPFS, the big thing MPI-IO can do for you is align writes to > GPFS. see my next point. > > > P.S. I am running with Open-mpi 1.4.2. > > If you upgrade to something in the 1.5 series you will get some nice > ROMIO optimizations that will help you out with writes to GPFS if > you set the "striping_unit" hint to the GPFS block size. > > ==rob > ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users Change is in the Air - Smoking in Designated Areas Only in effect.<http://www.csi.cuny.edu/tobaccofree> Tobacco-Free Campus as of July 1, 2012.
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
Götz, Sorry, I was in a rush and missed that. Here is some further information on the compiler options I used for the 1.5.5 build: [richard.walsh@bob linux]$ pwd /share/apps/openmpi-intel/1.5.5/build/opal/mca/memory/linux [richard.walsh@bob linux]$ make -n malloc.o echo " CC" malloc.o;depbase=`echo malloc.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/hwloc/hwloc122ompi/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc122ompi/hwloc/include/hwloc/autogen -DMALLOC_DEBUG=0 -D_GNU_SOURCE=1 -DUSE_TSD_DATA_HACK=1 -DMALLOC_HOOKS=1 -I./sysdeps/pthread -I./sysdeps/generic -I../../../.. -I/share/apps/openmpi-intel/1.5.5/build/opal/mca/hwloc/hwloc122ompi/hwloc/include -I/usr/include/infiniband -I/usr/include/infiniband -DNDEBUG -g -O2 -finline-functions -fno-strict-aliasing -restrict -pthread -I/share/apps/openmpi-intel/1.5.5/build/opal/mca/hwloc/hwloc122ompi/hwloc/include -MT malloc.o -MD -MP -MF $depbase.Tpo -c -o malloc.o malloc.c &&\ mv -f $depbase.Tpo $depbase.Po The entry point your code crashed in, opal_memory_ptmalloc2_int_malloc, is the renamed _int_malloc: rename.h:#define _int_malloc opal_memory_ptmalloc2_int_malloc in malloc.c in 1.5.5. Perhaps you should lower the optimization level to zero and see what you get. Sincerely, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Miracles are delivered to order by great intelligence, or when it is absent, through the passage of time and a series of mere chance events. -- Max Headroom From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Götz Waschk [goetz.was...@gmail.com] Sent: Tuesday, January 31, 2012 3:38 AM To: Open MPI Users Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... 
On Mon, Jan 30, 2012 at 5:11 PM, Richard Walsh <richard.wa...@csi.cuny.edu> wrote: > I have not seen this mpirun error with the OpenMPI version I have built > with Intel 12.1 and the mpicc fix: > openmpi-1.5.5rc1.tar.bz2 Hi, I haven't tried that version yet. I was trying to build a supplementary package to the openmpi 1.5.3 shipped with RHEL6.2, the same source, just built using the Intel compiler. > and from the looks of things, I wonder if your problem is related. The > solution in the original case was to conditionally dial down optimization > when using the 12.1 compiler to prevent the compiler itself from crashing > during a compile. What you present is a failure during execution. Such > failures might be due to overzealous optimization, but there seems to be > little reason on the face of it to believe that there is a connection between > the former and the latter. Well, the similarity is that it is also a crash in the malloc routine. I don't know if my optflags are too high, I have derived them from Red Hat's, replacing the options unknown to icc: -O2 -g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=pentium4 > Does this failure occur with all attempts to use 'mpirun' whatever the source? > My 'mpicc' problem did. If this is true and if you believe it is an > optimization > level issue you could try turning it off in the failing routine and see if > that > produces a remedy. I would also try things with the very latest release. Yes, the mpicc crash happened every time, I could reproduce that. I have only tested the most basic code, the cpi.c example. The funny thing is that mpirun -np 8 cpi doesn't always crash, sometimes it finishes just fine. Regards, Götz Waschk
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
Hey Götz, I have not seen this mpirun error with the OpenMPI version I have built with Intel 12.1 and the mpicc fix: openmpi-1.5.5rc1.tar.bz2 and from the looks of things, I wonder if your problem is related. The solution in the original case was to conditionally dial down optimization when using the 12.1 compiler to prevent the compiler itself from crashing during a compile. What you present is a failure during execution. Such failures might be due to overzealous optimization, but there seems to be little reason on the face of it to believe that there is a connection between the former and the latter. Does this failure occur with all attempts to use 'mpirun' whatever the source? My 'mpicc' problem did. If this is true and if you believe it is an optimization level issue you could try turning it off in the failing routine and see if that produces a remedy. I would also try things with the very latest release. Those are my thoughts ... good luck. rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Miracles are delivered to order by great intelligence, or when it is absent, through the passage of time and a series of mere chance events. -- Max Headroom From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Götz Waschk [goetz.was...@gmail.com] Sent: Monday, January 30, 2012 10:48 AM To: Open MPI Users Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... Hi Richard, On Wed, Jan 4, 2012 at 4:06 PM, Richard Walsh <richard.wa...@csi.cuny.edu> wrote: > Moreover, this problem has been addressed in the 1.5.5 OpenMPI release > with the following fix in > opal/mca/memory/linux/malloc.c: > #ifdef __INTEL_COMPILER_BUILD_DATE > # if __INTEL_COMPILER_BUILD_DATE == 20110811 > #pragma GCC optimization_level 1 > # endif > #endif I have added this patch to openmpi 1.5.3. 
Previously, every mpicc would crash, now mpicc is fine. However, mpirun still crashes like this: % mpirun -np 8 cpi-openmpi [pax8e:13662] *** Process received signal *** [pax8e:13662] Signal: Segmentation fault (11) [pax8e:13662] Signal code: Address not mapped (1) [pax8e:13662] Failing at address: 0x10 [pax8e:13662] [ 0] /lib64/libpthread.so.0(+0xf4a0) [0x7f348be7b4a0] [pax8e:13662] [ 1] /usr/lib64/openmpi-intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x4b3) [0x7f348c817193] [pax8e:13662] [ 2] /usr/lib64/openmpi-intel/lib/libmpi.so.1(+0xefdd9) [0x7f348c815dd9] [pax8e:13662] [ 3] /usr/lib64/openmpi-intel/lib/libmpi.so.1(opal_class_initialize+0xaa) [0x7f348c8278aa] [pax8e:13662] [ 4] /usr/lib64/openmpi-intel/lib/openmpi/mca_btl_openib.so(+0x1d0af) [0x7f34874350af] [pax8e:13662] [ 5] /lib64/libpthread.so.0(+0x77f1) [0x7f348be737f1] [pax8e:13662] [ 6] /lib64/libc.so.6(clone+0x6d) [0x7f348bbb070d] [pax8e:13662] *** End of error message *** -- mpirun noticed that process rank 6 with PID 13662 on node pax8e.ifh.de exited on signal 11 (Segmentation fault). I am using RHEL6.1 and the affected Intel 12.1 compiler. Regards, Götz Waschk
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
Tim/All, Thanks ... !! ... your response is on target: version 12.1.0.233 of the Intel compiler has a vectorization bug. Moreover, this problem has been addressed in the OpenMPI 1.5.5 release with the following workaround in opal/mca/memory/linux/malloc.c: /* With Intel Composer XE V12.1.0, release 2011.6.233, any launch */ /* fails, even before main(), due to a bug in the vectorizer (see */ /* https://svn.open-mpi.org/trac/ompi/changeset/25290). The fix is */ /* to disable vectorization by reducing the optimization level to */ /* -O1 for _int_malloc(). The only reliable method to identify */ /* release 2011.6.233 is the predefined __INTEL_COMPILER_BUILD_DATE */ /* macro, which will have the value 20110811 (Linux, Windows, and */ /* Mac OS X). (The predefined __INTEL_COMPILER macro is nonsense, */ /* and both the 2011.6.233 and 2011.7.256 releases identify */ /* themselves as V12.1.0 from the -v command line option.) */ #ifdef __INTEL_COMPILER_BUILD_DATE # if __INTEL_COMPILER_BUILD_DATE == 20110811 #pragma GCC optimization_level 1 # endif #endif So, anyone with the NEWEST Intel compiler should either use the 1.5.5 release or add the above section to the malloc.c code. Note that earlier releases keep malloc.c in a slightly different directory location, but it is easy to find. Thanks Tim ... !! Sincerely, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Tim Carlson [tim.carl...@pnl.gov] Sent: Tuesday, January 03, 2012 4:52 PM To: Open MPI Users Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... 
On Tue, 3 Jan 2012, Richard Walsh wrote: OPAL has problems with the default optimization. See this thread on one of the Intel lists. vi opal/mca/memory/linux/malloc.c add #pragma optimize("", off) http://software.intel.com/en-us/forums/showthread.php?t=87132 > > Gus/All, > > Perhaps there is some confusion as to which 'new' Intel compiler > release/version I > am using. I am not using '12.0' ... I am using '12.1' ... > > OLD one that builds a working opal_wrapper: > > [richard.walsh@athena ~]$ icc -V > Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, > Version 12.0.0.084 Build 20101006 > Copyright (C) 1985-2010 Intel Corporation. All rights reserved. >^ > > NEW one that FAILS to build a working opal_wrapper: > > [root@zeus .libs]# icc -V > Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, > Version 12.1.0.233 Build 20110811 > Copyright (C) 1985-2011 Intel Corporation. All rights reserved. >^ > > This was in my original email. NOTE: that the non-working version is 12.1 > >>NOT<< 12.0 This '12.1' > version was released by Intel JUST BEFORE SC11 in October of 2011. > > Thanks, > > rbw > > > Richard Walsh > Parallel Applications and Systems Manager > CUNY HPC Center, Staten Island, NY > W: 718-982-3319 > M: 612-382-4620 > > Right, as the world goes, is only in question between equals in power, while > the strong do what they can and the weak suffer what they must. -- > Thucydides, 400 BC > > > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of > Gustavo Correa [g...@ldeo.columbia.edu] > Sent: Tuesday, January 03, 2012 4:28 PM > To: Open MPI Users > Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 > Build 20110811) issues ... > > Hi Richard > > I have 1.4.4 built with Intel 12.0. It works. 
> > Any chance that your Intel-based OpenMPI was built from a source > directory that had been previously used to build the PGI-based OpenMPI, > and no 'make distclean' was issued in between the two builds, > nor a fresh build done from a brand new tarball? > Just a wild guess. > > I hope it helps, > Gus Correa > > On Jan 3, 2012, at 11:23 AM, Richard Walsh wrote: > >> >> Jonathan/All, >> >> Thanks for the information, but I continue to have problems. I dropped the >> 'openib' option to simplify things and focused my attention only on OpenMPI >> version 1.4.4 because you suggested it works. >> >> On the strength of the fact that the PGI 11.10 compiler works fine (all >> systems >> and all versions of OpenMPI), I ran a PGI build of 1.4.4 with
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
Gus/All, Perhaps there is some confusion as to which 'new' Intel compiler release/version I am using. I am not using '12.0' ... I am using '12.1' ... OLD one that builds a working opal_wrapper: [richard.walsh@athena ~]$ icc -V Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.0.084 Build 20101006 Copyright (C) 1985-2010 Intel Corporation. All rights reserved. ^ NEW one that FAILS to build a working opal_wrapper: [root@zeus .libs]# icc -V Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyright (C) 1985-2011 Intel Corporation. All rights reserved. ^ This was in my original email. NOTE: that the non-working version is 12.1 >>NOT<< 12.0 This '12.1' version was released by Intel JUST BEFORE SC11 in October of 2011. Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Gustavo Correa [g...@ldeo.columbia.edu] Sent: Tuesday, January 03, 2012 4:28 PM To: Open MPI Users Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... Hi Richard I have 1.4.4 built with Intel 12.0. It works. Any chance that your Intel-based OpenMPI was built from a source directory that had been previously used to build the PGI-based OpenMPI, and no 'make distclean' was issued in between the two builds, nor a fresh build done from a brand new tarball? Just a wild guess. I hope it helps, Gus Correa On Jan 3, 2012, at 11:23 AM, Richard Walsh wrote: > > Jonathan/All, > > Thanks for the information, but I continue to have problems. 
I dropped the > 'openib' option to simplify things and focused my attention only on OpenMPI > version 1.4.4 because you suggested it works. > > On the strength of the fact that the PGI 11.10 compiler works fine (all > systems > and all versions of OpenMPI), I ran a PGI build of 1.4.4 with the '-showme' > option (Intel fails immediately, even with '-showme' ... ). I then > substituted all > the PGI-related strings with Intel-related strings to compile directly and > explicitly > outside the 'opal' wrapper using code and libraries in the Intel build tree > of 1.4.4, > as follows: > > pgcc -o ./hw2.exe hw2.c -I/share/apps/openmpi-pgi/1.4.4/include > -L/share/apps/openmpi-pgi/1.4.4/lib -lmpi -lopen-rte -lopen-pal -ldl > -Wl,--export-dynamic -lnsl -lutil -ldl > > becomes ... > > icc -o ./hw2.exe hw2.c -I/share/apps/openmpi-intel/1.4.4/include > -L/share/apps/openmpi-intel/1.4.4/lib -lmpi -lopen-rte -lopen-pal -ldl > -Wl,--export-dynamic -lnsl -lutil -ldl > > Interestingly, this direct-explicit Intel compile >>WORKS FINE<< (no segmentation > fault like with the wrapped version) > and the executable produced also >>RUNS FINE<<. So ... it looks to me like > there is something wrong with using > the 'opal' wrapper generated and used in the Intel build. > > Can someone make a suggestion ... ?? I would like to use the wrappers of > course. > > Thanks, > > rbw > > Richard Walsh > Parallel Applications and Systems Manager > CUNY HPC Center, Staten Island, NY > W: 718-982-3319 > M: 612-382-4620 > > Right, as the world goes, is only in question between equals in power, while > the strong do what they can and the weak suffer what they must. -- > Thucydides, 400 BC > > > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of > Jonathan Dursi [ljdu...@scinet.utoronto.ca] > Sent: Tuesday, December 20, 2011 4:48 PM > To: Open Users > Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 > Build 20110811) issues ... 
> > For what it's worth, 1.4.4 built with the intel 12.1.0.233 compilers has been > the default mpi at our centre for over a month and we haven't had any > problems... > > - jonathan > -- > Jonathan Dursi; SciNet, Compute/Calcul Canada > > -Original Message- > From: Richard Walsh <richard.wa...@csi.cuny.edu> > Sender: users-boun...@open-mpi.org > Date: Tue, 20 Dec 2011 21:14:44 > To: Open MPI Users<us...@open-mpi.org> > Reply-To: Open MPI Users <us...@open-mpi.org> > Subject: Re: [OMPI users] Latest Intel Compilers (ICS, > version 12.1.0.233 Build 20110811) issues ... > > > All, > > I have not heard anything back on the inqu
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
Jonathan/All, Thanks for the information, but I continue to have problems. I dropped the 'openib' option to simplify things and focused my attention only on OpenMPI version 1.4.4 because you suggested it works. On the strength of the fact that the PGI 11.10 compiler works fine (all systems and all versions of OpenMPI), I ran a PGI build of 1.4.4 with the '-showme' option (Intel fails immediately, even with '-showme' ... ). I then substituted all the PGI-related strings with Intel-related strings to compile directly and explicitly outside the 'opal' wrapper using code and libraries in the Intel build tree of 1.4.4, as follows: pgcc -o ./hw2.exe hw2.c -I/share/apps/openmpi-pgi/1.4.4/include -L/share/apps/openmpi-pgi/1.4.4/lib -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -ldl becomes ... icc -o ./hw2.exe hw2.c -I/share/apps/openmpi-intel/1.4.4/include -L/share/apps/openmpi-intel/1.4.4/lib -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -ldl Interestingly, this direct-explicit Intel compile >>WORKS FINE<< (no segmentation fault like with the wrapped version) and the executable produced also >>RUNS FINE<<. So ... it looks to me like there is something wrong with using the 'opal' wrapper generated and used in the Intel build. Can someone make a suggestion ... ?? I would like to use the wrappers of course. Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jonathan Dursi [ljdu...@scinet.utoronto.ca] Sent: Tuesday, December 20, 2011 4:48 PM To: Open Users Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... 
For what it's worth, 1.4.4 built with the intel 12.1.0.233 compilers has been the default mpi at our centre for over a month and we haven't had any problems... - jonathan -- Jonathan Dursi; SciNet, Compute/Calcul Canada -Original Message- From: Richard Walsh <richard.wa...@csi.cuny.edu> Sender: users-boun...@open-mpi.org List-Post: users@lists.open-mpi.org Date: Tue, 20 Dec 2011 21:14:44 To: Open MPI Users<us...@open-mpi.org> Reply-To: Open MPI Users <us...@open-mpi.org> Subject: Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... All, I have not heard anything back on the inquiry below, so I take it that no one has had any issues with Intel's latest compiler release, or perhaps has not tried it yet. Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Richard Walsh [richard.wa...@csi.cuny.edu] Sent: Friday, December 16, 2011 3:12 PM To: Open MPI Users Subject: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... All, Working through a stock rebuild of OpenMPI 1.5.4 and 1.4.4 with the most current compiler suites from both PGI and Intel: 1. PGI, Version 11.10 2. Intel, Version 12.1.0.233 Build 20110811 My 1.5.4 'config.log' header looks like this for Intel: ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-openib --prefix=/share/apps/openmpi-intel/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 and this for PGI: ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --with-openib --prefix=/share/apps/openmpi-pgi/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 This configure line has been used successfully before. 
Configuration, build, and install for BOTH compilers seem to work OK; however, only the PGI-built version of 'mpicc' (for either 1.4.4 or 1.5.4) will compile my basic test program. The Intel 1.4.4 and 1.5.4 'mpicc' wrapper-compilers produce an immediate segmentation fault: [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19163 Segmentation fault /share/apps/openmpi-intel/1.5.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c [richard.walsh@bob pbs]$ [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19515 Segmentation fault /share/apps/openmpi-intel/1.4.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c This Intel stack is from the most recent ICS release, from late October just before SC11: [richard.walsh@bob pbs]$ icc -V Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyr
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
All, I have not heard anything back on the inquiry below, so I take it that no one has had any issues with Intel's latest compiler release, or perhaps has not tried it yet. Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Richard Walsh [richard.wa...@csi.cuny.edu] Sent: Friday, December 16, 2011 3:12 PM To: Open MPI Users Subject: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ... All, Working through a stock rebuild of OpenMPI 1.5.4 and 1.4.4 with the most current compiler suites from both PGI and Intel: 1. PGI, Version 11.10 2. Intel, Version 12.1.0.233 Build 20110811 My 1.5.4 'config.log' header looks like this for Intel: ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-openib --prefix=/share/apps/openmpi-intel/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 and this for PGI: ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --with-openib --prefix=/share/apps/openmpi-pgi/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 This configure line has been used successfully before. Configuration, build, and install for BOTH compilers seem to work OK; however, only the PGI-built version of 'mpicc' (for either 1.4.4 or 1.5.4) will compile my basic test program. 
The Intel 1.4.4 and 1.5.4 'mpicc' wrapper-compilers produce an immediate segmentation fault: [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19163 Segmentation fault /share/apps/openmpi-intel/1.5.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c [richard.walsh@bob pbs]$ [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19515 Segmentation fault /share/apps/openmpi-intel/1.4.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c This Intel stack is from the most recent ICS release, from late October just before SC11: [richard.walsh@bob pbs]$ icc -V Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyright (C) 1985-2011 Intel Corporation. All rights reserved. [richard.walsh@bob pbs]$ ifort -V Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyright (C) 1985-2011 Intel Corporation. All rights reserved. Has anyone else encountered this problem ... ?? Suggestions ... ?? Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC
[OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
All, Working through a stock rebuild of OpenMPI 1.5.4 and 1.4.4 with the most current compiler suites from both PGI and Intel: 1. PGI, Version 11.10 2. Intel, Version 12.1.0.233 Build 20110811 My 1.5.4 'config.log' header looks like this for Intel: ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-openib --prefix=/share/apps/openmpi-intel/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 and this for PGI: ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --with-openib --prefix=/share/apps/openmpi-pgi/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761 This configure line has been used successfully before. Configuration, build, and install for BOTH compilers seem to work OK; however, only the PGI-built version of 'mpicc' (for either 1.4.4 or 1.5.4) will compile my basic test program. The Intel 1.4.4 and 1.5.4 'mpicc' wrapper-compilers produce an immediate segmentation fault: [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19163 Segmentation fault /share/apps/openmpi-intel/1.5.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c [richard.walsh@bob pbs]$ [richard.walsh@bob pbs]$ ./compile_it ./compile_it: line 10: 19515 Segmentation fault /share/apps/openmpi-intel/1.4.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c This Intel stack is from the most recent ICS release, from late October just before SC11: [richard.walsh@bob pbs]$ icc -V Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyright (C) 1985-2011 Intel Corporation. All rights reserved. [richard.walsh@bob pbs]$ ifort -V Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811 Copyright (C) 1985-2011 Intel Corporation. All rights reserved. Has anyone else encountered this problem ... ?? Suggestions ... ?? 
Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY W: 718-982-3319 M: 612-382-4620 Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC
Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?
Jeff, We have 3 Rocks Clusters; while there is a default MPI with each Rocks release, it is often behind the latest production release, as you note. We typically install whatever OpenMPI version we want in a shared space and ignore the default installed with Rocks. Sometimes there are standard Linux libraries that are a bit out of date, which may show up as "can't find" errors in the configuration and/or build of OpenMPI, but there is usually an easy workaround. As far as 'closely intertwined' goes, I would say that is an exaggeration. It does mean some extra work for someone ... around here it is me ... ;-) ... rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jeffrey A Cummings [jeffrey.a.cummi...@aero.org] Sent: Tuesday, February 01, 2011 5:02 PM To: us...@open-mpi.org Subject: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software? I use OpenMPI on a variety of platforms: stand-alone servers running Solaris on sparc boxes and Linux (mostly CentOS) on AMD/Intel boxes, also Linux (again CentOS) on large clusters of AMD/Intel boxes. These platforms all have some version of the 1.3 OpenMPI stream. I recently requested an upgrade on all systems to 1.4.3 (for production work) and 1.5.1 (for experimentation). I'm getting a lot of push back from the SysAdmin folks claiming that OpenMPI is closely intertwined with the specific version of the operating system and/or other system software (i.e., Rocks on the clusters). I need to know if they are telling me the truth or if they're just making excuses to avoid the work. 
To state my question another way: Apparently each release of Linux and/or Rocks comes with some version of OpenMPI bundled in. Is it dangerous in some way to upgrade to a newer version of OpenMPI? Thanks in advance for any insight anyone can provide. - Jeff Think green before you print this email.
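For anyone facing the same push back, a minimal sketch of the private-copy route mentioned above (the version number and prefix below are illustrative, not from this thread):

```shell
# Build Open MPI into a user-owned prefix (run in the Open MPI source tree):
#   ./configure --prefix=$HOME/sw/openmpi-1.4.3
#   make -j4 && make install
# Then put the private copy first on the search paths for this shell/session;
# the Rocks-bundled MPI is untouched.
OMPI_HOME=$HOME/sw/openmpi-1.4.3
export PATH=$OMPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_HOME/lib:$LD_LIBRARY_PATH
echo "$PATH" | cut -d: -f1    # the private bin/ directory now wins
```

Because nothing system-wide changes, this needs no SysAdmin involvement; each user (or a shared application area) can carry its own Open MPI.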
Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran
All, Regarding building HDF5 ... the OpenMPI 1.4.1 wrapper using the May 2010 release of the Intel Compiler Toolkit Cluster Edition (ICTCE) worked for me. Here is my config.log header: $ ./configure CC=mpicc CXX=mpiCC F77=mpif77 FC=mpif90 --enable-parallel --prefix=/share/apps/hdf5/1.8.4p --with-zlib=/share/apps/zlib/1.2.3/lib --with-szlib=/share/apps/szip/2.1/lib --disable-shared With some tweaking I was able to build the whole WRF and NCAR-NCL stack here. Regards, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Gus Correa [g...@ldeo.columbia.edu] Sent: Friday, October 08, 2010 1:58 PM To: Open MPI Users Subject: Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran Jeff Squyres wrote: > On Oct 8, 2010, at 1:00 PM, Götz Waschk wrote: > >> I have solved this problem myself. The trick is not to use the >> compiler wrappers but icc and ifort directly. But in that case you'll >> have to link to libmpi_f77 manually and set the variable RUNPARALLEL >> to a working mpirun command. > > Strange. > > Be sure to see: > http://www.open-mpi.org/faq/?category=mpi-apps#cant-use-wrappers > Hi Jeff Sadly, it is not only HDF5. There are several public domain parallel programs which we use here that refuse to build using the mpi wrappers (from OpenMPI, MPICH2, or MVAPICH2), and require the same "deconstruction" of configuration and make files that Götz probably had to face. Typically the Makefiles have hardwired items that cannot be overridden (e.g. -lmpich for the MPI library name passed to the linker, compiler options and flags that are vendor-specific, logic that separates MPI, OpenMP, and serial compilation, etc). I have used the FAQ link you sent many times in this regard.
I would guess that even more than the end users of MPI, it is the developers of public domain software (and of software projects funded by public grants) who need to be convinced that the right thing to do is to ensure that MPI wrappers will compile their software without problems. My $0.02 Gus Correa ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
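On the hardwired-Makefile problem Gus describes, one mechanical workaround is to substitute Open MPI's real link line before building. A sketch (the Makefile content, file names, and library names below are illustrative):

```shell
# Ask the Open MPI wrapper what it would actually pass to the linker:
#   mpicc --showme:link
# then patch a Makefile that hardwires MPICH's library name.  A demo
# Makefile stands in here for whatever the package ships:
printf 'LIBS = -lmpich -lm\n' > Makefile.demo
sed -e 's/-lmpich/-lmpi_f77 -lmpi/g' Makefile.demo > Makefile.ompi
cat Makefile.ompi    # -> LIBS = -lmpi_f77 -lmpi -lm
```

The `--showme` family of wrapper options is also the recommended way to recover compile flags when a build system insists on calling icc/ifort directly, as in Götz's workaround above.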
Re: [OMPI users] Continued functionality across a SLES10 to SLES11 upgrade ...
Jeff Squyres wrote: >Probably your best bet would be: > >- investigate if there's a missing symbol or library in the current >mca_btl_openib.so (e.g., run nm on mca_btl_openib.so and ensure that all those >libraries are >present in SLES 11) >- if it's a missing library, see if you can supply a dummy library to make > it work (that may involve a little trickery) >- recompile OMPI 1.4.2 under SLES 11 >- copy in the mca_btl_openib.so from that install to your old OMPI install >- run some apps and see if it works >- if it does, relax, have a beer^H^H^H^Hnon-cafinated tea >- if it does not work, you may have to go the recompile-everything route Thanks for the very useful suggestions. I have already attempted a rebuild under SLES 11 and found that there looks to be an IB RPM missing from SGI's default configuration of SLES 11. So, I will get that and try both to rerun under SLES 11 and recompile. The implication of your reply is that if the symbols/libraries are all there then things should work. The idea of dropping in a new mca_btl_openib.so built under SLES 11 to get around the problem seems to be the most palatable (and clever) idea. Thanks much. I will report back. Regards, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com] Sent: Wednesday, September 22, 2010 9:07 AM To: Open MPI Users Subject: Re: [OMPI users] Continued functionality across a SLES10 to SLES11 upgrade ...
On Sep 20, 2010, at 1:20 PM, Richard Walsh wrote: > I was not expecting things to work, and find that codes compiled using > OpenMPI 1.4.1 commands under SLES 10.2 produce the following message > when run under SLES11: > > mca: base: component_find: unable to open > /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib: perhaps a missing > symbol, or compiled for a different version of Open MPI? (ignored) > > This file is in position and is NOT the result of a faulty mixed-release > over-build > (things work great under SLES10.2). > > The message indicates that (as the default is to build OpenMPI dynamically > with share objects) in loading this required IB-related library there must > be a format incompatibility. However, I find that if I force the use of GE > with: > > -mca btl tcp,self > > things seem to run OK under SLES 11. > > Could someone add some detail here on what, if anything, I can expect to > work when we try to run old SLES 10.2 build OpenMPI 1.4.1 binaries under > SLES 11. I would have thought NOTHING, but maybe that is not quite right. I do not have any experience with SLES, so I can't comment for sure. But I'd *guess* that there was a symbol change between 10.2 and 11 in the OpenFabrics libraries such that the openib BTL is unable to find a symbol that it needs. Another possibility is the dependent libraries of libibverbs.so changed (e.g., perhaps libibverbs.so required -lsysfs in 10.2, but then libsysfs.so doesn't exist in 11...?). Does the SLES release notes say anything about binary compatibility (particularly of the OpenFabrics libraries) between SLES 10.2 and 11? I'm quite sure that recompiling all of OMPI should make it work -- I'd be very surprised if the OpenFabrics libraries in SLES 11 were inconsistent such that you couldn't just rebuild and have it work. You may be able to recompile *just the openib BTL module* on SLES 11, drop it in your OMPI 1.4.2 installation, and have it work again. 
But that's not a guarantee -- other things may have changed such that a recompile may change some struct sizes or somesuch. Probably your best bet would be: - investigate if there's a missing symbol or library in the current mca_btl_openib.so (e.g., run nm on mca_btl_openib.so and ensure that all those libraries are present in SLES 11) - if it's a missing library, see if you can supply a dummy library to make it work (that may involve a little trickery) - recompile OMPI 1.4.2 under SLES 11 - copy in the mca_btl_openib.so from that install to your old OMPI install - run some apps and see if it works - if it does, relax, have a beer^H^H^H^Hnon-cafinated tea - if it does not work, you may have to go the recompile-everything route -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
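The first "investigate" step above can be scripted. A sketch (the .so path is the one from this thread; the ldd output fed to the filter below is canned example data, with libsysfs chosen because Jeff names it as a plausible culprit):

```shell
# On a SLES 11 node you would run something like:
#   ldd  /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib.so
#   nm -D /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib.so | grep ' U '
# Any "not found" line in the ldd output names the missing dependency, which
# this filter pulls out (demonstrated here on canned example output):
ldd_out='libibverbs.so.1 => /usr/lib64/libibverbs.so.1
libsysfs.so.2 => not found'
echo "$ldd_out" | awk '/not found/ {print $1}'    # -> libsysfs.so.2
```

If the filter prints nothing, the problem is more likely an undefined symbol (the `nm -D` check) than a missing library.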
[OMPI users] Continued functionality across a SLES10 to SLES11 upgrade ...
All, I was not expecting things to work, and find that codes compiled using OpenMPI 1.4.1 commands under SLES 10.2 produce the following message when run under SLES11: mca: base: component_find: unable to open /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) This file is in position and is NOT the result of a faulty mixed-release over-build (things work great under SLES10.2). The message indicates that (as the default is to build OpenMPI dynamically with shared objects) in loading this required IB-related library there must be a format incompatibility. However, I find that if I force the use of GE with: -mca btl tcp,self things seem to run OK under SLES 11. Could someone add some detail here on what, if anything, I can expect to work when we try to run old SLES 10.2-built OpenMPI 1.4.1 binaries under SLES 11. I would have thought NOTHING, but maybe that is not quite right. Perhaps we can run using GE under SLES 11 with the old binaries until I get things recompiled (ugh!) under SLES 11? Thanks, Richard Walsh Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits.
Re: [OMPI users] Does OpenMPI 1.4.1 support the MPI_IN_PLACE designation ...
Hey Yong, This is very helpful ... I have spent the morning verifying that the OCTOPUS 3.2 code is correct and that even other sections of the code that use: MPI_IN_PLACE are compiled without a problem. Both the working and the non-working routines properly include the "use mpi_m" module, which is built from mpi.F90 and includes: #include "mpif.h" An examination of the symbols in mpi_m.mod with: strings mpi_m.mod shows that MPI_IN_PLACE is "in place" ... ;-) ... I was going to try to figure this out by morphing the non-working "states_oct.f90" into one of the routines that works, but this will save me the trouble. Swapping the two modules as suggested works. Thanks! rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Yong Qin [yong...@gmail.com] Sent: Tuesday, August 17, 2010 12:41 PM To: us...@open-mpi.org Subject: Re: [OMPI users] Does OpenMPI 1.4.1 support the MPI_IN_PLACE designation ... Hi Richard, We have reported this to Intel as a bug in 11.1.072. If I understand it correctly you are also compiling Octopus with Intel 11.1.072. As we have tested, Intel compilers 11.1.064 and all the 10.x, GNU, PGI, etc., do not exhibit this issue at all. We are still waiting for words from Intel. But in the mean time, a workaround (revision 6839) has been submitted to the trunk. The workaround is actually fairly simple, you just need to switch the order of "use parser_m" and "use mpi_m" in states.F90. Thanks, Yong Qin > Message: 4 > Date: Mon, 16 Aug 2010 18:55:47 -0400 > From: Richard Walsh <richard.wa...@csi.cuny.edu> > Subject: [OMPI users] Does OpenMPI 1.4.1 support the MPI_IN_PLACE >designation ...
> To: Open MPI Users <us...@open-mpi.org> > Message-ID: ><5e9838fe224683419f586d9df46a0e25b049898...@mbox.flas.csi.cuny.edu> > Content-Type: text/plain; charset="us-ascii" > > > All, > > I have a fortran code (Octopus 3.2) that is bombing during a build in a > routine that uses: > > call MPI_Allreduce(MPI_IN_PLACE, rho(1, ispin), np, MPI_DOUBLE_PRECISION, > MPI_SUM, st%mpi_grp%comm, mpi_err) > > with the error message: > > states.F90(1240): error #6404: This name does not have a type, and must have > an explicit type. [MPI_IN_PLACE] >call MPI_Allreduce(MPI_IN_PLACE, rho(1, ispin), np, > MPI_DOUBLE_PRECISION, MPI_SUM, st%mpi_grp%comm, mpi_err) > -------^ > compilation aborted for states_oct.f90 (code 1) > > This suggests that MPI_IN_PLACE is missing from the mpi.h header. > > Any thoughts? > > rbw > > Richard Walsh > Parallel Applications and Systems Manager > CUNY HPC Center, Staten Island, NY > 718-982-3319 > 612-382-4620 > > Reason does give the heart pause; > As the heart gives reason fits. > > Yet, to live where reason always rules; > Is to kill one's heart with wits.
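The diagnosis in this thread can be double-checked quickly from the shell. A sketch ($OMPI_PREFIX stands in for your install prefix; the grep input below is canned stand-in text, not real header content):

```shell
# Header side: is MPI_IN_PLACE declared in the Fortran include files?
#   grep -n MPI_IN_PLACE $OMPI_PREFIX/include/mpif*.h
# Module side: did the symbol survive into the compiled Fortran module,
# as done above with strings?
#   strings mpi_m.mod | grep MPI_IN_PLACE
# grep's exit status makes a handy guard in a build script (demonstrated
# here on canned input):
echo 'integer MPI_IN_PLACE' | grep -q MPI_IN_PLACE \
  && echo "MPI_IN_PLACE visible" \
  || echo "missing - check compiler version and module use order"
```

When both checks pass but compilation still fails, the compiler (not Open MPI) is the suspect, which is exactly what Yong's Intel 11.1.072 diagnosis shows.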
[OMPI users] Does OpenMPI 1.4.1 support the MPI_IN_PLACE designation ...
All, I have a fortran code (Octopus 3.2) that is bombing during a build in a routine that uses: call MPI_Allreduce(MPI_IN_PLACE, rho(1, ispin), np, MPI_DOUBLE_PRECISION, MPI_SUM, st%mpi_grp%comm, mpi_err) with the error message: states.F90(1240): error #6404: This name does not have a type, and must have an explicit type. [MPI_IN_PLACE] call MPI_Allreduce(MPI_IN_PLACE, rho(1, ispin), np, MPI_DOUBLE_PRECISION, MPI_SUM, st%mpi_grp%comm, mpi_err) ---^ compilation aborted for states_oct.f90 (code 1) This suggests that MPI_IN_PLACE is missing from the mpi.h header. Any thoughts? rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Gokhan Kir [g...@iastate.edu] Sent: Monday, August 16, 2010 5:43 PM To: us...@open-mpi.org Subject: [OMPI users] A Problem with RAxML Hi, I am currently using RAxML 7.0, and recently I got a problem. Even though I Googled it, I couldn't find a satisfying answer. I got this message from BATCH_ERRORs file " raxmlHPC-MPI: topologies.c:179: restoreTL: Assertion `n >= 0 && n < rl->max' failed. " Any help is appreciated, Thanks, -- Gokhan
Re: [OMPI users] A Problem with RAxML
Hey Gokhan, The following worked for me with OpenMPI 1.4.1 with the latest Intel compiler (May release) although there have been reports that with full vectorization there are some unexplained inflight failures: # # Parallel Version # service0:/share/apps/raxml/7.0.4/build # make -f Makefile.MPI mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o axml.o axml.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o raxmlParsimony.o raxmlParsimony.c mpicc -c -o rev_functions.o rev_functions.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o optimizeModel.o optimizeModel.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o multiple.o multiple.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o searchAlgo.o searchAlgo.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o topologies.o topologies.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o parsePartitions.o parsePartitions.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o treeIO.o treeIO.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o models.o models.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o bipartitionList.o bipartitionList.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o rapidBootstrap.o rapidBootstrap.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o evaluatePartialGeneric.o evaluatePartialGeneric.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o evaluateGeneric.o evaluateGeneric.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o newviewGeneric.o newviewGeneric.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o makenewzGeneric.o makenewzGeneric.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o evaluateGenericVector.o evaluateGenericVector.c mpicc -O3 -DPARALLEL -fomit-frame-pointer -funroll-loops -c -o categorizeGeneric.o categorizeGeneric.c mpicc -o raxmlHPC-MPI axml.o raxmlParsimony.o 
rev_functions.o optimizeModel.o multiple.o searchAlgo.o topologies.o parsePartitions.o treeIO.o models.o bipartitionList.o rapidBootstrap.o evaluatePartialGeneric.o evaluateGeneric.o newviewGeneric.o makenewzGeneric.o evaluateGenericVector.o categorizeGeneric.o -lm The latest PGI-built OpenMPI 1.4.1 release is said to behave correctly with the following flags regardless of the level of optimization. I have both versions installed. Both compile and link without error for me. This is with an IB-built OpenMPI. CC = /share/apps/openmpi-pgi/default/bin/mpicc CFLAGS = -O3 -DPARALLEL -Mnoframe -Munroll Hope this is useful ... rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Reason does give the heart pause; As the heart gives reason fits. Yet, to live where reason always rules; Is to kill one's heart with wits. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Gokhan Kir [g...@iastate.edu] Sent: Monday, August 16, 2010 5:43 PM To: us...@open-mpi.org Subject: [OMPI users] A Problem with RAxML Hi, I am currently using RAxML 7.0, and recently I got a problem. Even though I Googled it, I couldn't find a satisfying answer. I got this message from BATCH_ERRORs file " raxmlHPC-MPI: topologies.c:179: restoreTL: Assertion `n >= 0 && n < rl->max' failed. " Any help is appreciated, Thanks, -- Gokhan
Re: [OMPI users] Address not mapped segmentation fault with1.4.2 ...
Jeff, OK ... I rebuilt without --with-tm= and as predicted my test case runs (I left the IB flags in). I then ran a job with just: pbsdsh hostname on 16 nodes and that also worked. I know that 1.4.1 works although it was built pointing into the old PBS Pro version tree explicitly. I have checked and rechecked the environmental variables and everything else that could lead to some mixed-up version cross referencing. I am tempted to build 1.4.2 with the explicit -with-tm= version path instead of using the symlink to default, but I cannot think of a logical reason why that should do anything. I have also reported this to the PBS Pro support folks. Thanks for the suggestions, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic! From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com] Sent: Thursday, June 10, 2010 6:34 PM To: Open MPI Users Subject: Re: [OMPI users] Address not mapped segmentation fault with1.4.2 ... On Jun 10, 2010, at 5:49 PM, Richard Walsh wrote: > OK ... so if I follow your lead and build a version without PBS --tm= > integration > and it works, I should be able to report this as an incompatibility bug > between > the latest version of PBS Pro (10.2.0.93147) and the latest version of OpenMPI > (1.4.2). right? Do I report that you to my friends at OpenMPI or my friends > at > PBS Pro (Altair), or both? I'd say both. But it would be quite surprising if tm_init() it wholly broken -- it's the very first function that has to be invoked. I'm not a PBS user, so I don't know/remember the PBS commands offhand, but I have a dim recollection of a few PBS-provided TM-using tools (pbsdsh or somesuch?). You might want to try those, too, and see if they work/fail. If it really is a problem, I'm guessing it'll be a compiler/linker issue somehow...
(e.g., how we're compiling/linking is not matching the compilation/linker style of the TM library) That's a SWAG. :-) -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Address not mapped segmentation fault with1.4.2 ...
Jeff/All, OK ... so if I follow your lead and build a version without PBS --tm= integration and it works, I should be able to report this as an incompatibility bug between the latest version of PBS Pro (10.2.0.93147) and the latest version of OpenMPI (1.4.2), right? Do I report that to my friends at OpenMPI or my friends at PBS Pro (Altair), or both? Thanks for your help. I will let you know what the result is ... rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic! From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com] Sent: Thursday, June 10, 2010 11:52 AM To: Open MPI Users Subject: Re: [OMPI users] Address not mapped segmentation fault with1.4.2 ... Not offhand, but just to close the loop on a question from your first mail: this should not be a memory manager issue (i.e., not related to IB). As Ralph noted, this is a segv in the launcher (mpirun, in this case) -- in the tm_init() function call (TM is the launcher helper library in PBS/Torque). Open MPI (mpirun, in this case) calls tm_init() to setup the PBS launcher -- it's the first PBS-specific function call that we make. If tm_init() fails, it may indicate that something fairly basic is busted in that support library. On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote: > > Ralph/Jeff, > > Yes, the change was intentional. I have upgraded PBS as well and built > 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows > one > to control the actual default without changing the path. I did the same thing > on the non-IB system which seems to be working fine with 1.4.2. This would > suggest that this is not the issue. > > It is possible that the PBS build in the IB system was flawed, but it looked > normal. I could rebuild it.
The PBS libraries (as well as MPI) are in a > shared > location that is NFS mounted on the compute nodes so things should be in > sync, but I will verify this. > > Any other suggestions ... ?? > > rbw > > >Richard Walsh >Parallel Applications and Systems Manager >CUNY HPC Center, Staten Island, NY >718-982-3319 >612-382-4620 > >Mighty the Wizard >Who found me at sunrise >Sleeping, and woke me >And learn'd me Magic! > > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of > Jeff Squyres [jsquy...@cisco.com] > Sent: Thursday, June 10, 2010 11:00 AM > To: Open MPI Users > Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 > ... > > On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote: > > > That error would indicate something wrong with the pbs connection - it is > > tm_init that is crashing. I note that you did --with-tm pointing to a > > different location - was that intentional? Could be something wrong with > > that pbs build > > ...and make sure that the support libs for TM/PBS are the same between the > node you're building on and all the nodes where OMPI will be running. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > Think green before you print this email. > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
Ralph/Jeff, Yes, the change was intentional. I have upgraded PBS as well and built 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows one to control the actual default without changing the path. I did the same thing on the non-IB system which seems to be working fine with 1.4.2. This would suggest that this is not the issue. It is possible that the PBS build in the IB system was flawed, but it looked normal. I could rebuild it. The PBS libraries (as well as MPI) are in a shared location that is NFS mounted on the compute nodes so things should be in sync, but I will verify this. Any other suggestions ... ?? rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic! From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com] Sent: Thursday, June 10, 2010 11:00 AM To: Open MPI Users Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ... On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote: > That error would indicate something wrong with the pbs connection - it is > tm_init that is crashing. I note that you did --with-tm pointing to a > different location - was that intentional? Could be something wrong with that > pbs build ...and make sure that the support libs for TM/PBS are the same between the node you're building on and all the nodes where OMPI will be running. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
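The "verify things are in sync" step can be automated inside a PBS job. A sketch (the library path and file name are illustrative, and the checksum lines fed to the pipeline below are canned example output):

```shell
# Inside a PBS job, pbsdsh runs a command once per allocated node (-u),
# so identical checksums mean identical libraries everywhere:
#   pbsdsh -u sh -c 'md5sum /share/apps/pbs/default/lib/libpbs.so'
# Piping through "sort -u" collapses matching lines; more than one line of
# output flags a node with a stale copy (shown on canned output):
printf 'abc123  libpbs.so\nabc123  libpbs.so\n' | sort -u | wc -l    # -> 1
```

On an NFS-shared install the checksums should trivially agree, but the check still catches nodes with a stale mount or a locally installed older PBS.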
[OMPI users] Address not mapped segmentation fault with 1.4.2 ...
All, I am upgrading from 1.4.1 to 1.4.2 on both a cluster with IB and one without. I have no problem on the GE cluster without IB which requires no special configure options for the IB. 1.4.2 works perfectly there with both the latest Intel and PGI compiler. On the IB system 1.4.1 has worked fine with the following configure line: ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --prefix=/share/apps/openmpi-intel/1.4.1 --with-tm=/share/apps/pbs/10.1.0.91350 I have now built 1.4.2. with the almost identical: $ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --prefix=/share/apps/openmpi-intel/1.4.2 --with-tm=/share/apps/pbs/default When I run a basic MPI test program with: /share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe which defaults to using the IB switch, or with: /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe which forces the use of GE, I get the same error: [compute-0-3:22515] *** Process received signal *** [compute-0-3:22515] Signal: Segmentation fault (11) [compute-0-3:22515] Signal code: Address not mapped (1) [compute-0-3:22515] Failing at address: 0x3f [compute-0-3:22515] [ 0] /lib64/libpthread.so.0 [0x3639e0e7c0] [compute-0-3:22515] [ 1] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(discui_+0x84) [0x2b7b546dd3d0] [compute-0-3:22515] [ 2] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(diswsi+0xc3) [0x2b7b546da9e3] [compute-0-3:22515] [ 3] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d868c] [compute-0-3:22515] [ 4] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(tm_init+0x1fe) [0x2b7b546d8978] [compute-0-3:22515] [ 5] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d791c] [compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27] [compute-0-3:22515] [ 7] 
/share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38] [compute-0-3:22515] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x363961d994] [compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69] [compute-0-3:22515] *** End of error message *** /var/spool/PBS/mom_priv/jobs/9909.bob.csi.cuny.edu.SC: line 42: 22515 Segmentation fault /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe When compiling with the PGI compiler suite I get the same result although the traceback gives less detail. I notice postings that suggest that if I disable the memory manager I might be able to get around this problem, but that will result in a performance hit on this IB system. Have others seen this? Suggestions? Thanks, Richard Walsh CUNY HPC Center Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic!
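While the tm_init() crash is being investigated, one possible stopgap is to bypass the TM launcher entirely. A sketch (assumes passwordless ssh to the nodes listed in $PBS_NODEFILE; the mpirun line is illustrative):

```shell
# Tell mpirun to select the rsh/ssh launcher component instead of TM:
#   mpirun --mca plm rsh -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
# The same selection can be made via the environment, which is sometimes
# easier inside a batch script (any MCA parameter maps to OMPI_MCA_<name>):
export OMPI_MCA_plm=rsh
echo "$OMPI_MCA_plm"    # -> rsh
```

This loses TM's accounting and clean process teardown, so it is a diagnostic/stopgap measure rather than a fix.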
[OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
All, I have built NWChem successfully, and am trying to run it with an Intel-built version of OpenMPI 1.4.1. If I force it to run over the 1 GigE maintenance interconnect it works, but when I try it over the default InfiniBand communications network it fails with: -- An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. The process that invoked fork was: Local host: gpute-2 (PID 15996) MPI_COMM_WORLD rank: 0 If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -- This looks to be a known problem. Is there a workaround? I have seen it suggested in some places that I need to use Mellanox's version of MPI, which is not an option and surprises me as they are a big OFED contributor. What are my options ... other than using GigE ... ?? Thanks, rbw Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY 718-982-3319 612-382-4620 Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic!
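The warning text itself names the knob: if the application's fork()/system() use is known to be safe, the message can be silenced. A sketch, not an endorsement of forking under the openib BTL (the mpirun line is illustrative):

```shell
# On the command line:
#   mpirun --mca mpi_warn_on_fork 0 -np 16 -machinefile $PBS_NODEFILE ./nwchem input.nw
# or equivalently via the environment (MCA parameters map to OMPI_MCA_<name>):
export OMPI_MCA_mpi_warn_on_fork=0
echo "$OMPI_MCA_mpi_warn_on_fork"    # -> 0
```

Note this only suppresses the warning; if the job actually hangs or corrupts data over IB, the underlying fork-with-registered-memory problem still has to be addressed.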
[OMPI users] Problem compiling 1.4.0 snap with PGI 10.0-1 and openib flags turned on ...
All, Succeeded in overcoming the 'libtool' failure with PGI using the patched snap (thanks Jeff), but now I am running into a downstream problem compiling for our IB clusters. I am using the latest PGI compiler (10.0-1) and the 12-14-09 snapshot of OpenMPI version 1.4.0. My configure line looks like this: $ ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --enable-openib-ibcm --with-openib \ --prefix=/share/apps/openmpi-pgi/1.4.0 --with-tm=/share/apps/pbs/10.1.0.91350 The error I get during the make at about line 8078 is: libtool: compile: pgcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -D_REENTRANT -O -DNDEBUG -c connect/btl_openib_connect_xoob.c -fpic -DPIC -o connect/.libs/btl_openib_connect_xoob.o source='connect/btl_openib_connect_ibcm.c' object='connect/btl_openib_connect_ibcm.lo' libtool=yes \ DEPDIR=.deps depmode=none /bin/sh ../../../../config/depcomp \ /bin/sh ../../../../libtool --tag=CC --mode=compile pgcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -D_REENTRANT -O -DNDEBUG -c -o connect/btl_openib_connect_ibcm.lo connect/btl_openib_connect_ibcm.c libtool: compile: pgcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../..
-D_REENTRANT -O -DNDEBUG -c connect/btl_openib_connect_ibcm.c -fpic -DPIC -o connect/.libs/btl_openib_connect_ibcm.o PGC-S-0040-Illegal use of symbol, __le64 (/usr/include/linux/byteorder/little_endian.h: 43) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/linux/byteorder/little_endian.h: 43) PGC-S-0039-Use of undeclared variable __le64 (/usr/include/linux/byteorder/little_endian.h: 45) PGC-S-0104-Non-numeric operand for multiplicative operator (/usr/include/linux/byteorder/little_endian.h: 45) PGC-S-0040-Illegal use of symbol, __le64 (/usr/include/linux/byteorder/little_endian.h: 47) PGC-S-0040-Illegal use of symbol, __be64 (/usr/include/linux/byteorder/little_endian.h: 67) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/linux/byteorder/little_endian.h: 67) PGC-S-0040-Illegal use of symbol, __be64 (/usr/include/linux/byteorder/little_endian.h: 69) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/linux/byteorder/little_endian.h: 69) PGC-S-0040-Illegal use of symbol, __be64 (/usr/include/linux/byteorder/little_endian.h: 71) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/linux/byteorder/little_endian.h: 71) PGC/x86-64 Linux 10.0-1: compilation completed with severe errors make[2]: *** [connect/btl_openib_connect_ibcm.lo] Error 1 make[2]: Leaving directory `/export/apps/openmpi-pgi/1.4.0/build/ompi/mca/btl/openib' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/export/apps/openmpi-pgi/1.4.0/build/ompi' make: *** [all-recursive] Error 1 Compilation with the latest Intel compilers and these 'openib' options completes without issue. Are my configure options for 'openib' correct? Has anyone else seen this? Thanks much, Richard Walsh Parallel Applications and Systems Manager CUNY HPC Center, Staten Island, NY Mighty the Wizard Who found me at sunrise Sleeping, and woke me And learn'd me Magic!