Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?
On 3/24/2017 6:10 PM, Reuti wrote:
Hi,
On 24.03.2017 at 20:39, Jeff Squyres (jsquyres) wrote:
Limiting MPI processes to hyperthreads *helps*, but current-generation Intel hyperthreads are not as powerful as cores (they have roughly half the resources of a core), so -- depending on your application and your exact system setup -- you will almost certainly see performance degradation when running N MPI processes across N hyperthreads vs. across N cores. You can try it yourself by running the same size application over N cores on a single machine, and then run the same application over N hyperthreads (i.e., N/2 cores) on the same machine. [...]
- Disabling HT in the BIOS means that the one hardware thread left in each core will get all of the core's resources (buffers, queues, processor units, etc.).
- Enabling HT in the BIOS means that each of the 2 hardware threads will statically be allocated roughly half the core's resources (buffers, queues, processor units, etc.).
Do you have a reference for the two topics above (sure, I will try next week on my own)? My understanding was that there is no dedicated HT core, and using all hardware threads will not give the result that the real cores get N x 100% plus the HT ones N x 50% (or the like); rather, the scheduler inside the CPU balances the resources between the two faces of a single core, and both are equal. [...]
Spoiler alert: many people have looked at this. In *most* (but not all) cases, using HT is not a performance win for MPI/HPC codes that are designed to run processors at 100%.
I think it was also on this mailing list that someone mentioned that the pipelines in the CPU are reorganized when you switch HT off, as only half of them would be needed, and these resources are then bound to the real cores too, extending their performance. Similar, but not exactly what Jeff mentions above. Another aspect is that even if HT does not really double the performance, one might get 150%. And if you pay per CPU hour, it can be worth having it switched on. My personal experience is that it depends not only on the application, but also on the way you oversubscribe. Using all cores for a single MPI application leads to the effect that all processes are doing the same stuff at the same time (at least often) and fight for the same part of the CPU, essentially becoming a bottleneck. But using each half of a CPU for two (or even more) applications allows better interleaving in the demand for resources. To allow this in the best way: no taskset or binding to cores; let the Linux kernel and CPU do their best - YMMV. -- Reuti
HT implementations vary in some of the details to which you refer. The most severe limitation of disabling HT on Intel CPUs of the last 5 years has been that half of the hardware ITLB entries remain inaccessible to the remaining thread. This was supposed not to be a serious limitation for many HPC applications. Applications where each thread needs all of L1 or all of the fill buffers (cache lines pending update) aren't so suitable for HT. Intel compilers have some ability at -O3 to adjust automatic loop fission and fusion for applications with high fill-buffer demand, which requires that there be just one thread using those buffers. In practice, HT actually reduces the rate at which FPU instructions may be issued on Intel "big core" CPUs. HT together with MPI usually requires effective HT-aware pinning. It seems unusual for MPI ranks to share cores effectively simply under control of kernel scheduling (although Linux is more capable than Windows).
Agree that explicit use of taskset under MPI should by now have been superseded by the binding options implemented by several MPI libraries, including Open MPI. -- Tim Prince
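To run the comparison Jeff suggests on a single node, a hedged sketch (assuming an mpirun recent enough to accept --map-by/--bind-to, a hypothetical executable ./app, and 8 physical cores; adjust N to your machine):
  mpirun -np 8 --map-by core --bind-to core --report-bindings ./app           # N ranks on N cores
  mpirun -np 8 --map-by hwthread --bind-to hwthread --report-bindings ./app   # N ranks on N hwthreads (N/2 cores)
The --report-bindings output shows which cores or hardware threads each rank actually received, which is the easiest way to confirm the two runs really exercise the two layouts being compared.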
Re: [OMPI users] Rounding errors and MPI
You might try inserting parentheses so as to specify your preferred order of evaluation. If using ifort, you would need -assume protect_parens.
Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone
Original Message
From: Oscar Mojica
Sent: Mon, 16 Jan 2017 08:28:05 -0500
To: Open MPI User's List
Subject: [OMPI users] Rounding errors and MPI
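A hedged sketch of that suggestion, assuming the code is Fortran built through Open MPI's wrapper and a hypothetical source file sum_loop.f90; the option name is Intel Fortran's, while gfortran honors parentheses by default unless -fno-protect-parens is given:
  mpif90 -O2 -assume protect_parens -c sum_loop.f90
This keeps the compiler from reassociating expressions across the parentheses you wrote, which is usually what matters when chasing run-to-run or rank-count-dependent rounding differences.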
Re: [OMPI users] openmpi-2.0.1
On 11/17/2016 8:45 AM, Professor W P Jones wrote:
> Hi
>
> I am trying to install openmpi-2.0.1 together with the version 14.0.2
> Intel compilers and I am having problems. The configure script with
> CC=icc CXX=icpc and FC=ifort runs successfully, but when I issue make
> all install this fails with the output:
>
> Making all in tools/ompi_info
> make[2]: Entering directory
> `/usr/local/src/openmpi-2.0.1/ompi/tools/ompi_info'
> CCLD ompi_info
> ld: warning: libimf.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libsvml.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libirng.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libintlc.so.5, needed by ../../../ompi/.libs/libmpi.so,
> not found (try using -rpath or -rpath-link)
> ld: .libs/ompi_info: hidden symbol `__intel_cpu_features_init_x' in
> /opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64//libirc.a(cpu_feature_disp.o)
> is referenced by DSO
> ld: final link failed: Bad value
> make[2]: *** [ompi_info] Error 1
> make[2]: Leaving directory
> `/usr/local/src/openmpi-2.0.1/ompi/tools/ompi_info'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/usr/local/src/openmpi-2.0.1/ompi'
> make: *** [all-recursive] Error 1
>
Do you have the Intel compilervars.[c]sh sourced (and the associated library files visible) on each node where you expect to install? -- Tim Prince
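A sketch of what that check amounts to, using the Intel install path that appears in the error output above (adjust to your own installation); this is illustrative, not a verified recipe:
  # put the Intel compiler and its runtime libraries (libimf.so, libsvml.so, ...) on PATH/LD_LIBRARY_PATH
  source /opt/intel/composer_xe_2013_sp1.2.144/bin/compilervars.sh intel64
  ./configure CC=icc CXX=icpc FC=ifort --prefix=/usr/local/openmpi-2.0.1
  make all install
  # confirm the Intel runtime libraries resolve wherever Open MPI will run
  ldd /usr/local/openmpi-2.0.1/bin/ompi_info | grep -E 'libimf|libsvml|not found'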
Re: [OMPI users] Problems in compiling a code with dynamic linking
On 3/24/2016 12:01 AM, Gilles Gouaillardet wrote:
> Elio,
>
> usually, /opt is a local filesystem, so it is possible /opt/intel is
> only available on your login nodes.
>
> your best option is to ask your sysadmin where the mkl libs are on the
> compute nodes, and/or how to use mkl in your jobs.
>
> feel free to submit a dumb pbs script
> ls -l /opt
> ls -l /opt/intel
> ls -l /opt/intel/mkl
> so you can hopefully find that by yourself.
>
> another option is to use the static mkl libs if they are available.
> for example, your LIB line could be
>
> LIB = -static -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64
> -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core
> -lmkl_sequential -dynamic
>
No, refer to the on-line link line advisor at https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor -- Tim Prince
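For reference, the advisor typically produces a line along these lines for sequential, LP64 MKL with an Intel compiler on 64-bit Linux; treat it as an illustration only and generate the exact line for your own MKL version, threading, and interface choices:
  # example advisor-style output (sequential, LP64); MKLROOT is set by the MKL environment script
  source /opt/intel/composer_xe_2013_sp1/mkl/bin/mklvars.sh intel64
  LIB = -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm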
Re: [OMPI users] How to run OpenMPI C code under Windows 7
On 11/22/2015 5:04 PM, Philip Bitar wrote: > *How to run OpenMPI C code under Windows 7* > > I'm trying to get OpenMPI C code to run under Windows 7 any way that I > can. Evidently there is no current support for running OpenMPI > directly under Windows 7, so I installed Cygwin. Is there a better way > to run OpenMPI C code under Windows 7? > > Under Cygwin, I installed a GCC C compiler, which works. > > I also installed an OpenMPI package. Here is a link to a list of the > files in the Cygwin OpenMPI package: > > https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86%2Flibopenmpi%2Flibopenmpi-1.8.6-1=openmpi > > My PATH variable is as follows: > > /usr/local/bin:/usr/bin > > mpicc will compile, but it won't link. It can't find the following: > > -lmpi > -lopen-rte > -lopen-pal > > The test program includes stdio.h and is nothing more than printf > hello world. I can compile and run it using the GCC C compiler. > > Presumably I need to update the PATH variable so that the link step > will find the missing components. Are those components file names or > info contained in some other files? Can I verify that the needed files > have been installed? > > I would also be pleased to obtain a link to material that explains the > OpenMPI system, in general, and the OpenMPI C functions, in > particular, so that I can write C programs to use the OpenMPI system. > > I looked for this kind of info on the web, but I haven't found it yet. > Maybe it's on the OpenMPI site, and I missed it. > > You probably want the libopenmpi-devel package from cygwin setup.exe as well. If you have windows 7 X64, the x86_64 cygwin is probably preferable to 32-bit (can't see which you started with). An alternative, with a build of mingw x86-64, is Walt Brainerd's CAF build. If this wasn't discussed in the OpenMPI archives, but has not been withdrawn, you might ask the author, e.g. https://groups.google.com/forum/#!searchin/comp.lang.fortran/coarray$20fortran/comp.lang.fortran/P5si9Fj1yIY/ptjM8DMUUzUJ It's a little difficult to use if you have another MPI installed, as Windows MPI (like the MPI which comes with linux distros) don't observe normal methods for keeping distinct paths. I doubt there is a separate version of OpenMPI docs specific to Windows. -- Tim Prince
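A minimal check under Cygwin, assuming hello.c is a printf-style MPI hello world and that the libopenmpi-devel package mentioned above has been added through setup.exe (package names are those of the Cygwin repository of that era, so confirm them in setup.exe):
  cygcheck -c openmpi libopenmpi libopenmpi-devel   # confirm the packages are installed
  ls /usr/lib | grep -i mpi                         # the libraries mpicc needs at link time
  mpicc hello.c -o hello && mpirun -np 2 ./hello
If the devel package is present and on the default install paths, mpicc should find -lmpi, -lopen-rte and -lopen-pal without any PATH changes.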
Re: [OMPI users] Binding to hardware thread
On 9/27/2015 6:02 PM, Saliya Ekanayake wrote: > > I couldn't find any option in OpenMPI to bind a process to a hardware > thread. I am assuming this is not yet supported through binding > options. Could specifying a rank file be used as a workaround for this? > > Why not start with the FAQ? https://www.open-mpi.org/faq/?category=openfabrics Don't go by what the advertisements of other MPI implementations said based on past defaults. -- Tim Prince
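As a hedged sketch of what to try (the --bind-to hwthread and --use-hwthread-cpus options exist in 1.7/1.8-era mpirun; older releases spell binding options differently, so verify with mpirun --help):
  mpirun -np 8 --use-hwthread-cpus --bind-to hwthread --report-bindings ./app
--report-bindings prints the cpuset each rank received, so you can confirm whether a single hardware thread (rather than a whole core) was assigned; a rankfile remains a workable fallback when you need fully explicit placements.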
Re: [OMPI users] Anyone successfully running Abaqus with OpenMPI?
On 6/22/2015 6:06 PM, Belgin, Mehmet wrote: > > > Abaqus documentation suggests that it may be possible to run it using > an external MPI stack, and I am hoping to make it work with our stock > openmpi/1.8.4 that knows how to talk with the scheduler's hwloc. > Unfortunately, however, all of my attempts failed miserably so far (no > specific instructions for openmpi). > > I was wondering if anyone had success with getting Abaqus running with > openmpi. Even the information of whether it is possible or not will > help us a great deal. > > Data type encodings are incompatible between openmpi and mpich derivatives, and, I think, with the HP or Platform MPI normally used by past Abaqus releases. You should be looking at the Abaqus release notes for your version. Comparing include files between the various MPI families should give you a clue about type encoding compatibility. Lack of instructions for openmpi probably means something. -- Tim Prince
Re: [OMPI users] mpirun
I don't recall Walt's cases taking all of 5 seconds to start. More annoying is the hang after completion.
Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone
Original Message
From: Ralph Castain
Sent: Fri, 29 May 2015 15:35:15 -0400
To: Open MPI Users
Subject: Re: [OMPI users] mpirun
>I assume you mean on cygwin? Or is this an older version that supported native Windows?
>
>> On May 29, 2015, at 12:34 PM, Walt Brainerd wrote:
>>
>> On Windows, mpirun appears to take about 5 seconds
>> to start. I can't try it on Linux. Intel takes no time to
>> start executing its version.
>>
>> Is this expected?
>>
>> --
>> Walt Brainerd
Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently
Check with ldd in case you didn't update the .so path.
Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone
Original Message
From: John Bray
Sent: Mon, 17 Nov 2014 11:41:32 -0500
To: us...@open-mpi.org
Subject: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently
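A sketch of that check, assuming bash and an Open MPI prefix of (hypothetically) /opt/openmpi-1.8.3:
  ldd ./my_program | grep -E -i 'mpi|not found'   # which libmpi the loader will actually pick up
  export LD_LIBRARY_PATH=/opt/openmpi-1.8.3/lib:$LD_LIBRARY_PATH
  which mpirun                                    # should report /opt/openmpi-1.8.3/bin/mpirun
If ldd reports a libmpi from a different installation (or "not found"), the silent-exit behavior is usually explained before any debugging of the Fortran code is needed.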
Re: [OMPI users] Multiple threads for an mpi process
On 9/12/2014 9:22 AM, JR Cary wrote:
On 9/12/14, 7:27 AM, Tim Prince wrote:
On 9/12/2014 6:14 AM, JR Cary wrote: This must be a very old topic. I would like to run mpi with one process per node, e.g., using -cpus-per-rank=1. Then I want to use openmp inside of that. But other times I will run with a rank on each physical core. Inside my code I would like to detect which situation I am in. Is there an openmpi api call to determine that?
omp_get_num_threads() should work. Unless you want to choose a different non-parallel algorithm for this case, a single-thread omp parallel region works fine. You should soon encounter cases where you want intermediate choices, such as 1 rank per CPU package and 1 thread per core, even if you stay away from platforms with more than 12 cores per CPU.
I may not understand, so I will try to ask in more detail. Suppose I am running on a four-core processor (and my code likes one thread per core). In case 1 I do mpiexec -np 2 myexec and I want to know that each mpi process should use 2 threads. If instead I did mpiexec -np 4 myexec I want to know that each mpi process should use one thread. Will omp_get_num_threads() return a different value for those two cases? Perhaps I am not invoking mpiexec correctly. I use MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided), and regardless of how I invoke mpiexec (-n 1, -n 2, -n 4), I see 2 openmp processes and 1 openmp thread (I have not called omp_set_num_threads). When I run serial, I see 8 openmp processes and 1 openmp thread. So I must be missing an arg to mpiexec? This is a 4-core haswell with hyperthreading to get 8.
Sorry, I assumed you were setting OMP_NUM_THREADS for your runs. If you don't do that, each instance of OpenMP will try to run 8 threads, where you probably want just 1 thread per core. I turn off hyperthreading in the BIOS on my machines, as I never run anything which would benefit from it.
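A sketch of launching the two cases with the thread count made explicit, assuming Open MPI's mpiexec (which accepts -x to export environment variables) and the ./myexec from the question:
  mpiexec -np 2 -x OMP_NUM_THREADS=2 ./myexec   # 2 ranks, 2 OpenMP threads each
  mpiexec -np 4 -x OMP_NUM_THREADS=1 ./myexec   # 4 ranks, 1 OpenMP thread each
Without OMP_NUM_THREADS (or an omp_set_num_threads call), each rank's OpenMP runtime typically defaults to one thread per hardware thread it can see, which is 8 on this hyperthreaded 4-core machine.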
Re: [OMPI users] Multiple threads for an mpi process
On 9/12/2014 6:14 AM, JR Cary wrote: This must be a very old topic. I would like to run mpi with one process per node, e.g., using -cpus-per-rank=1. Then I want to use openmp inside of that. But other times I will run with a rank on each physical core. Inside my code I would like to detect which situation I am in. Is there an openmpi api call to determine that? omp_get_num_threads() should work. Unless you want to choose a different non-parallel algorithm for this case, a single thread omp parallel region works fine. You should soon encounter cases where you want intermediate choices, such as 1 rank per CPU package and 1 thread per core, even if you stay away from platforms with more than 12 cores per CPU.
Re: [OMPI users] openMP and mpi problem
On 7/4/2014 11:22 AM, Timur Ismagilov wrote: 1. Intel MPI is located here: /opt/intel/impi/4.1.0/intel64/lib. I have added the OMPI path at the start and got the same output.
If you can't read your own thread due to your scrambling order of posts, I'll simply reiterate what was mentioned before: ifort has its own mpiexec in the compiler install path to support co-arrays (not true MPI), so your MPI path entries must precede the ifort ones. Thus, it remains important to try checks such as 'which mpiexec' and assure that you are running the intended components. ifort co-arrays will not cooperate with the presence of OpenMPI. -- Tim Prince
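A quick sanity check of which launcher and library will actually be used, sketched for bash with a hypothetical Open MPI prefix /usr/local/openmpi:
  export PATH=/usr/local/openmpi/bin:$PATH            # must precede the ifort/impi entries
  export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
  which mpiexec mpirun     # both should resolve inside /usr/local/openmpi/bin
  mpirun --version         # should identify itself as Open MPI, not the ifort co-array runtime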
Re: [OMPI users] openmpi linking problem
On 6/9/2014 1:14 PM, Sergii Veremieiev wrote: Dear Sir/Madam, I'm trying to link a C/FORTRAN code on Cygwin with Open MPI 1.7.5 and GCC 4.8.2: mpicxx ./lib/Multigrid.o ./lib/GridFE.o ./lib/Data.o ./lib/GridFD.o ./lib/Parameters.o ./lib/MtInt.o ./lib/MtPol.o ./lib/MtDob.o -o Test_cygwin_openmpi_gcc -L./external/MUMPS/lib -ldmumps_cygwin_openmpi_gcc -lmumps_common_cygwin_openmpi_gcc -lpord_cygwin_openmpi_gcc -L./external/ParMETIS -lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc -L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc -L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc -lblacsF77init-0_cygwin_openmpi_gcc -lblacsCinit-0_cygwin_openmpi_gcc -lblacs-0_cygwin_openmpi_gcc -L./external/BLAS -lblas_cygwin_openmpi_gcc -lmpi -lgfortran The following error messages are returned: ./external/MUMPS/lib/libdmumps_cygwin_openmpi_gcc.a(dmumps_part3.o): In function `dmumps_127_': /cygdrive/d/Sergey/Research/Codes/Thinfilmsolver/external/MUMPS/src/dmumps_part3.F:6068: undefined reference to `mpi_send_' You appear to need the MPI Fortran libraries (built with your version of gfortran) corresponding to mpif.h or use mpi... If you can use mpifort to link, you would use -lstdc++ in place of -lmpi -lgfortran . -- Tim Prince
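A hedged sketch of the suggested link step, reusing the library list from the original command; mpifort is the Fortran wrapper shipped with Open MPI 1.7 and later (use mpif90 on older installs), it supplies the MPI Fortran libraries itself, and -lstdc++ replaces -lmpi -lgfortran because the object files come from mpicxx-compiled C++:
  mpifort ./lib/Multigrid.o ./lib/GridFE.o ./lib/Data.o ./lib/GridFD.o \
      ./lib/Parameters.o ./lib/MtInt.o ./lib/MtPol.o ./lib/MtDob.o \
      -o Test_cygwin_openmpi_gcc \
      -L./external/MUMPS/lib -ldmumps_cygwin_openmpi_gcc \
      -lmumps_common_cygwin_openmpi_gcc -lpord_cygwin_openmpi_gcc \
      -L./external/ParMETIS -lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc \
      -L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc \
      -L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc -lblacsF77init-0_cygwin_openmpi_gcc \
      -lblacsCinit-0_cygwin_openmpi_gcc -lblacs-0_cygwin_openmpi_gcc \
      -L./external/BLAS -lblas_cygwin_openmpi_gcc \
      -lstdc++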
Re: [OMPI users] intel compiler and openmpi 1.8.1
On 05/29/2014 07:11 AM, Lorenzo Donà wrote: I compiled openmpi 1.8.1 with intel compiler with this conf. ./configure FC=ifort CC=icc CXX=icpc --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/ but when i write mpif90 -v i found: Using built-in specs. COLLECT_GCC=/opt/local/bin/gfortran-mp-4.8 COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin13/4.8.2/lto-wrapper Target: x86_64-apple-darwin13 Configured with: /opt/local/var/macports/build/_opt_mports_dports_lang_gcc48/gcc48/work/gcc-4.8.2/configure --prefix=/opt/local --build=x86_64-apple-darwin13 --enable-languages=c,c++,objc,obj-c++,lto,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8.2_0' Thread model: posix gcc version 4.8.2 (MacPorts gcc48 4.8.2_0) and version i found: GNU Fortran (MacPorts gcc48 4.8.2_0) 4.8.2 Copyright (C) 2013 Free Software Foundation, Inc. GNU Fortran comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of GNU Fortran under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING So I think that is not compiled with intel compiler please can you help me. thanks thanks a lot for your patience and to help me Perhaps you forgot to make the Intel compilers active in your configure session. Normally this would be done by command such as source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64 In such a case, if you would examine the configure log, you would expect to see a failed attempt to reach ifort, falling back to your gfortran. On the C and C++ side, the MPI libraries should be compatible between gnu and Intel compilers, but the MPI Fortran library would not be compatible between gfortran and ifort.
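A sketch of the sequence Tim describes, using the compilervars.sh path from his reply (adjust the version directory to your own installation):
  source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64
  which ifort icc          # both should now resolve under /opt/intel/...
  ./configure FC=ifort CC=icc CXX=icpc --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/
  make all && make install
  # afterwards, ompi_info | grep -i fort should name ifort rather than gfortran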
Re: [OMPI users] openMPI in 64 bit
On 5/15/2014 3:13 PM, Ajay Nair wrote: I have been using openMPI for my application with intel visual fortran. The version that I am currently using is openMPI-1.6.2. It works fine iwth fortran code compiled in 32bit and run it with openMPI 32 bit files. However recently I moved to a 64 bit machine and even though I could compile the code successfully with intel fortran 64 bit and also pointing the openMPI to the corresponding 64 bit files, the exe would not start and threw the error: *the application was unable to start correctly (0x7b)* * * This is because the msvcr100d.dll file (this is required by openMPI even when I run in 32bit mode) is a 32 bit dll file and it probably requires 64 bit equivalent. I could not find any 64 bit equivalent for this dll. My question is why is openMPI looking for this dll file (even in case of 32bit compilation). Can i do away with this dependency or is there any way I can run it in 64 bit? 64-bit Windows of course includes full 32-bit support, so you might still run your 32-bit MPI application. You would need a full 64-bit build of the MPI libraries for compatibility with your 64-bit application. I haven't seen any indication that anyone is supporting openmpi for ifort Windows 64-bit. The closest openmpi thing seems to be the cygwin (gcc/gfortran) build. Windows seems to be too crowded for so many MPI versions to succeed. -- Tim Prince
Re: [OMPI users] busy waiting and oversubscriptions
On 3/26/2014 6:45 AM, Andreas Schäfer wrote:
On 10:27 Wed 26 Mar, Jeff Squyres (jsquyres) wrote: Be aware of a few facts, though: 1. There is a fundamental difference between disabling hyperthreading in the BIOS at power-on time and simply running one MPI process per core. Disabling HT at power-on allocates more hardware resources to the remaining hardware thread that is left in each core (e.g., deeper queues).
Oh, I didn't know that. That's interesting! Do you have any links with in-depth info on that?
On certain Intel CPUs, the full-size instruction TLB was available to a process when HyperThreading was disabled on the BIOS setup menu, and that was the only way to make all the Write Combine buffers available to a single process. Those CPUs are no longer in widespread use. At one time, at Intel, we did a study to evaluate the net effect (on a later CPU where this did not recover ITLB size). The result was buried afterwards; possibly it didn't meet an unspecified marketing goal. Typical applications ran 1% faster with HyperThreading disabled by the BIOS menu, even with affinities carefully set to use just one process per core. Not all applications showed a loss on all data sets when leaving HT enabled. There are a few MPI applications with specialized threading which could gain 10% or more by use of HT. In my personal opinion, SMT becomes less interesting as the number of independent cores increases. Intel(r) Xeon Phi(tm) is an exception, as the vector processing unit issues instructions from a single thread only on alternate cycles. This capability is used more effectively by running OpenMP threads under MPI, e.g. 6 ranks per coprocessor of 30 threads each, spread across 10 cores per rank (the exact optimum depending on the application; MKL libraries use all available hardware threads for sufficiently large data sets). -- Tim Prince
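As an illustration only of the hybrid layout mentioned above (6 ranks of 30 OpenMP threads each on one coprocessor), and assuming your launch environment lets Open MPI start ranks on the card and that the OpenMP runtime honors OMP_NUM_THREADS, the invocation could look roughly like:
  mpirun -np 6 -x OMP_NUM_THREADS=30 ./app.mic
How each rank's 30 threads are spread across its 10 cores is a separate affinity question (for example via KMP_AFFINITY with the Intel OpenMP runtime).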
Re: [OMPI users] linking with openmpi version 1.6.1
On 2/24/2014 4:45 PM, Jeff Squyres (jsquyres) wrote: This is not an issue with Open MPI; it's an issue with how the Fortran compiler works on your Linux system. It's choosing to suffix Fortran symbols with "_" (and possibly, in some cases [with long-past compilers], "__"), whereas the C compiler is not. FWIW, this is a fairly common Fortran Linux compiler convention. Or you can use the new Fortran'08 C interop stuff (BIND(C)), in which you can specify the C symbol name in the Fortran code. Be aware that while this is supported in some Fortran compilers, it is not yet necessarily supported in the version of gfortran that you may be using.
iso_c_binding was introduced in Fortran 2003, and has been supported in gfortran at least since version 4.4, which is about as old a version as you have any business trying (no older ones have adequate documentation remaining online). Also, FWIW, OMPI 1.6.1 is ancient. Can you upgrade to the latest 1.6.x version of Open MPI: 1.6.5? -- Tim Prince
Re: [OMPI users] Use of __float128 with openmpi
On 02/01/2014 12:42 PM, Patrick Boehl wrote: Hi all, I have a question on datatypes in openmpi: Is there an (easy?) way to use __float128 variables with openmpi? Specifically, functions like MPI_Allreduce seem to give weird results with __float128. Essentially all I found was http://beige.ucs.indiana.edu/I590/node100.html where they state MPI_LONG_DOUBLE This is a quadruple precision, 128-bit long floating point number. But as far as I have seen, MPI_LONG_DOUBLE is only used for long doubles. The Open MPI Version is 1.6.3 and gcc is 4.7.3 on a x86_64 machine. It seems unlikely that 10 year old course notes on an unspecified MPI implementation (hinted to be IBM power3) would deal with specific details of openmpi on a different architecture. Where openmpi refers to "portable C types" I would take long double to be the 80-bit hardware format you would have in a standard build of gcc for x86_64. You should be able to gain some insight by examining your openmpi build logs to see if it builds for both __float80 and __float128 (or neither). gfortran has a 128-bit data type (software floating point real(16), corresponding to __float128); you should be able to see in the build logs whether that data type was used.
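A sketch of the build-log inspection suggested above, assuming you still have config.log from the Open MPI 1.6.3 build tree (the grep patterns reflect typical configure test wording and may need adjusting):
  grep -i "long double" config.log          # what configure concluded about long double
  grep -i "float128" config.log             # any probe for __float128 / REAL(16) support
  ompi_info --all | grep -i "long double"   # what the installed library reports, if listed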
Re: [OMPI users] Running on two nodes slower than running on one node
On 1/29/2014 11:30 PM, Ralph Castain wrote: On Jan 29, 2014, at 7:56 PM, Victor <victor.ma...@gmail.com <mailto:victor.ma...@gmail.com>> wrote: Thanks for the insights Tim. I was aware that the CPUs will choke beyond a certain point. From memory on my machine this happens with 5 concurrent MPI jobs with that benchmark that I am using. My primary question was about scaling between the nodes. I was not getting close to double the performance when running MPI jobs acros two 4 core nodes. It may be better now since I have Open-MX in place, but I have not repeated the benchmarks yet since I need to get one simulation job done asap. Some of that may be due to expected loss of performance when you switch from shared memory to inter-node transports. While it is true about saturation of the memory path, what you reported could be more consistent with that transition - i.e., it isn't unusual to see applications perform better when run on a single node, depending upon how they are written, up to a certain size of problem (which your code may not be hitting). Regarding your mention of setting affinities and MPI ranks do you have a specific (as in syntactically specific since I am a novice and easily confused...) examples how I may want to set affinities to get the Westmere node performing better? mpirun --bind-to-core -cpus-per-rank 2 ... will bind each MPI rank to 2 cores. Note that this will definitely *not* be a good idea if you are running more than two threads in your process - if you are, then set --cpus-per-rank to the number of threads, keeping in mind that you want things to break evenly across the sockets. In other words, if you have two 6 core/socket Westmere's on the node, then you either want to run 6 process at cpus-per-rank=2 if each process runs 2 threads, or 4 processes with cpus-per-rank=3 if each process runs 3 threads, or 2 processes with no cpus-per-rank but --bind-to-socket instead of --bind-to-core for any other thread number > 3. You would not want to run any other number of processes on the node or else the binding pattern will cause a single process to split its threads across the sockets - which will definitely hurt performance. -cpus-per-rank 2 is an effective choice for this platform. As Ralph said, it should work automatically for 2 threads per rank. Ralph's point about not splitting a process across sockets is an important one. Even splitting a process across internal busses, which would happen with 3 threads per process, seems problematical. -- Tim Prince
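Putting Ralph's three layouts into concrete commands for a hypothetical node with two 6-core Westmere sockets (the option spellings are those of the 1.6/1.7-era mpirun quoted above; --report-bindings lets you confirm the result):
  # 6 ranks x 2 threads each: two cores per rank
  mpirun -np 6 --bind-to-core -cpus-per-rank 2 -x OMP_NUM_THREADS=2 --report-bindings ./app
  # 4 ranks x 3 threads each: three cores per rank
  mpirun -np 4 --bind-to-core -cpus-per-rank 3 -x OMP_NUM_THREADS=3 --report-bindings ./app
  # 2 ranks, one socket each, for more than 3 threads per rank
  mpirun -np 2 --bind-to-socket -x OMP_NUM_THREADS=6 --report-bindings ./app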
Re: [OMPI users] Running on two nodes slower than running on one node
On 1/29/2014 10:56 PM, Victor wrote: Thanks for the insights Tim. I was aware that the CPUs will choke beyond a certain point. From memory on my machine this happens with 5 concurrent MPI jobs with that benchmark that I am using. Regarding your mention of setting affinities and MPI ranks do you have a specific (as in syntactically specific since I am a novice and easily confused...) examples how I may want to set affinities to get the Westmere node performing better? ompi_info returns this: MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
I haven't worked with current OpenMPI on Intel Westmere, although I do have a Westmere as my only dual-CPU platform. Ideally, the current scheme OpenMPI uses for MPI/OpenMP hybrid affinity will make it easy to allocate adjacent pairs of cores to ranks: [0,1], [2,3], [4,5], ... hwloc will not be able to see whether cores [0,1] and [2,3] are actually the pairs sharing an internal cache bus, and Intel never guaranteed it, but that is the only way I've seen it done (presumably controlled by BIOS). If you had a requirement to run 1 rank per CPU, with 4 threads per CPU, you would pin a thread to each of the core pairs [0,1] and [2,3] (and [6,7] and [8,9]). If required to run 8 threads per CPU, using HyperThreading, you would pin 1 thread to each of the first 4 cores on each CPU and 2 threads each to the remaining cores (the ones which don't share cache paths). Likewise, when you are testing pure MPI scaling, you would take care not to place a 2nd rank on a core pair which shares an internal bus until you are using all 4 internal bus resources, and you would load up the 2 CPUs symmetrically. You might find that 8 ranks with optimized placement gave nearly the performance of 12 ranks, and that you need an effective hybrid MPI/OpenMP to get perhaps 25% additional performance by using the remaining cores. I've never seen an automated scheme to deal with this. If you ignored the placement requirements, you would find that 8 ranks on the 12-core platform didn't perform as well as on the similar 8-core platform.
Needless to say, these special requirements of this CPU model have eluded even experts, and led to it not being used to full effectiveness. The reason we got into this is your remark that it seemed strange to you that you didn't gain performance when you added a rank, presumably a 2nd rank on a core pair sharing an internal bus. You seem to have the impression that MPI performance scaling could be linear with the number of cores in use. Such an expectation is unrealistic given that the point of multi-core platforms is to share memory and other resources and support more ranks without a linear increase in cost. In your efforts to make an effective cluster out of nodes of dissimilar performance levels, you may need to explore means of evening up the performance per rank, such as more OpenMP threads per rank on the lower-performance CPUs. It really doesn't look like a beginner's project. -- Tim Prince
Re: [OMPI users] Running on two nodes slower than running on one node
On 1/29/2014 8:02 AM, Reuti wrote: Quoting Victor <victor.ma...@gmail.com>: Thanks for the reply Reuti, There are two machines: Node1 with 12 physical cores (dual 6 core Xeon) and Do you have this CPU? http://ark.intel.com/de/products/37109/Intel-Xeon-Processor-X5560-8M-Cache-2_80-GHz-6_40-GTs-Intel-QPI -- Reuti It's expected on the Xeon Westmere 6-core CPUs to see MPI performance saturating when all 4 of the internal buss paths are in use. For this reason, hybrid MPI/OpenMP with 2 cores per MPI rank, with affinity set so that each MPI rank has its own internal CPU buss, could out-perform plain MPI on those CPUs. That scheme of pairing cores on selected internal buss paths hasn't been repeated. Some influential customers learned to prefer the 4-core version of that CPU, given a reluctance to adopt MPI/OpenMP hybrid with affinity. If you want to talk about "downright strange," start thinking about the schemes to optimize performance of 8 threads with 2 threads assigned to each internal CPU buss on that CPU model. Or your scheme of trying to balance MPI performance between very different CPU models. Tim Node2 with 4 physical cores (i5-2400). Regarding scaling on the single 12 core node, not it is also not linear. In fact it is downright strange. I do not remember the numbers right now but 10 jobs are faster than 11 and 12 are the fastest with peak performance of approximately 66 Msu/s which is also far from triple the 4 core performance. This odd non-linear behaviour also happens at the lower job counts on that 12 core node. I understand the decrease in scaling with increase in core count on the single node as the memory bandwidth is an issue. On the 4 core machine the scaling is progressive, ie. every additional job brings an increase in performance. Single core delivers 8.1 Msu/s while 4 cores deliver 30.8 Msu/s. This is almost linear. Since my original email I have also installed Open-MX and recompiled OpenMPI to use it. This has resulted in approximately 10% better performance using the existing GbE hardware. On 29 January 2014 19:40, Reuti <re...@staff.uni-marburg.de> wrote: Am 29.01.2014 um 03:00 schrieb Victor: > I am running a CFD simulation benchmark cavity3d available within http://www.palabos.org/images/palabos_releases/palabos-v1.4r1.tgz > > It is a parallel friendly Lattice Botlzmann solver library. > > Palabos provides benchmark results for the cavity3d on several different platforms and variables here: http://wiki.palabos.org/plb_wiki:benchmark:cavity_n400 > > The problem that I have is that the benchmark performance on my cluster does not scale even close to a linear scale. > > My cluster configuration: > > Node1: Dual Xeon 5560 48 Gb RAM > Node2: i5-2400 24 Gb RAM > > Gigabit ethernet connection on eth0 > > OpenMPI 1.6.5 on Ubuntu 12.04.3 > > > Hostfile: > > Node1 -slots=4 -max-slots=4 > Node2 -slots=4 -max-slots=4 > > MPI command: mpirun --mca btl_tcp_if_include eth0 --hostfile /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400 > > Problem: > > cavity3d 400 > > When I run mpirun -np 4 on Node1 I get 35.7615 Mega site updates per second > When I run mpirun -np 4 on Node2 I get 30.7972 Mega site updates per second > When I run mpirun --mca btl_tcp_if_include eth0 --hostfile /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400 I get 47.3538 Mega site updates per second > > I understand that there are latencies with GbE and that there is MPI overhead, but this performance scaling still seems very poor. 
Are my expectations of scaling naive, or is there actually something wrong and fixable that will improve the scaling? Optimistically I would like each node to add to the cluster performance, not slow it down. > > Things get even worse if I run asymmetric number of mpi jobs in each node. For instance running -np 12 on Node1 Isn't this overloading the machine with only 8 real cores in total? > is significantly faster than running -np 16 across Node1 and Node2, thus adding Node2 actually slows down the performance. The i5-2400 has only 4 cores and no threads. It depends on the algorithm how much data has to be exchanged between the processes, and this can indeed be worse when used across a network. Also: is the algorithm scaling linear when used on node1 only with 8 cores? When it's "35.7615 " with 4 cores, what result do you get with 8 cores on this machine. -- Reuti ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users _______ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Tim Prince
Re: [OMPI users] compilation aborted for Handler.cpp (code 2)
On 1/28/2014 10:44 AM, Abdul Rahman Riza wrote: -Original Message- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Syed Ahsan Ali Sent: Sunday, September 22, 2013 9:41 PM To: Open MPI Users Subject: Re: [OMPI users] compilation aborted for Handler.cpp (code 2) Its ok Jeff. I am not sure about other C++ codes and STL with icpc because it never happened and I don't know anything about STL.(pardon my less knowledge). What do you suggest in this case? installation of different version of openmpi or intel compilers? or any other solution. On Fri, Sep 20, 2013 at 8:35 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: Sorry for the delay replying -- I actually replied on the original thread yesterday, but it got hung up in my outbox and I didn't notice that it didn't actually go out until a few moments ago. :-( I'm *guessing* that this is a problem with your local icpc installation. Can you compile / run other C++ codes that use the STL with icpc? On Sep 20, 2013, at 6:59 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote: Output of make V=1 is attached. Again same error. If intel compiler is using C++ headers from gfortran then how can we avoid this. On Fri, Sep 20, 2013 at 11:07 AM, Bert Wesarg <bert.wes...@googlemail.com> wrote: Hi, On Fri, Sep 20, 2013 at 4:49 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote: I am trying to compile openmpi-1.6.5 on fc16.x86_64 with icc and ifort but getting the subject error. config.out and make.out is attached. Following command was used for configure ./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort --prefix=/home/openmpi_gfortran -enable-mpi-f90 --enable-mpi-f77 |& tee config.out could you also run make with 'make V=1' and send the output. Anyway it looks like the intel compiler uses the C++ headers from GCC 4.6.3 and I don't know if this is supported. Bert icpc expects to pick up headers and libraries, including libstdc++, from a simultaneously active g++ installation (normally the g++ which is on PATH and LD_LIBRARY_PATH). g++ 4.7 or 4.8 (with not all the latest features supported by icpc) are probably better with the recent icpc 13.1 and 14.0, but I hope the OpenMP build doesn't depend on c++11. If you do use c++11, you need versions of icpc and g++ both supporting it via -std=c++11 (where g++ 4.6 may need c++0x). You could run into cluster configuration issues if you don't have consistent g++ as well as icpc run-times on LD_LIBRARY_PATH everywhere. You can't mix support for gfortran with support for ifort; for C and C++ you should be able to use gcc/g++ and icc/icpc interchangeably, so you could configure for gcc and g++ along with ifort and still use icc and icpc as you choose. -- Tim Prince
Re: [OMPI users] [EXTERNAL] MPI_THREAD_SINGLE vs. MPI_THREAD_FUNNELED
On 10/23/2013 01:02 PM, Barrett, Brian W wrote: On 10/22/13 10:23 AM, "Jai Dayal"> wrote: I, for the life of me, can't understand the difference between these two init_thread modes. MPI_THREAD_SINGLE states that "only one thread will execute", but MPI_THREAD_FUNNELED states "The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are funneled to the main thread)." If I use MPI_THREAD_SINGLE, and just create a bunch of pthreads that dumbly loop in the background, the MPI library will have no way of detecting this, nor should this have any affects on the machine. This is exactly the same as MPI_THREAD_FUNNELED. What exactly does it mean with "only one thread will execute?" The openmpi library has absolutely zero way of knowng I've spawned other pthreads, and since these pthreads aren't actually doing MPI communication, I fail to see how this would interfere. Technically, if you call MPI_INIT_THREAD with MPI_THREAD_SINGLE, you have made a promise that you will not create any other threads in your application. There was a time where OSes shipped threaded and non-threaded malloc, for example, so knowing that might be important for that last bit of performance. There are also some obscure corner cases of the memory model of some architectures where you might get unexpected results if you made an MPI Receive call in an thread and accessed that buffer later from another thread, which may require memory barriers inside the implementation, so there could be some differences between SINGLE and FUNNELED due to those barriers. In Open MPI, we'll handle those corner cases whether you init for SINGLE or FUNNELED, so there's really no practical difference for Open MPI, but you're then slightly less portable. I'm asking because I'm using an open_mpi build ontop of infiniband, and the maximum thread mode is MPI_THREAD_SINGLE. That doesn't seem right; which version of Open MPI are you using? Brian As Brian said, you aren't likely to be running on a system like Windows 98 where non-thread-safe libraries were preferred. My colleagues at NASA insist that any properly built MPI will support MPI_THREAD_FUNNELED by default, even when the documentation says explicit setting in MPI_Init_thread() is mandatory. The statement which I see in OpenMPI doc says all MPI calls must be made by the thread which calls MPI_Init_thread. Apparently it will work if plain MPI_Init is used instead. This theory appears to hold up for all the MPI implementations of interest. The additional threads referred to are "inside the MPI rank," although I suppose additional application threads not involved with MPI are possible.
Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI
On 5/25/2013 8:26 AM, Jeff Squyres (jsquyres) wrote: On May 23, 2013, at 9:50 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote: Excellent. Now I've read the FAQ and noticed that it doesn't mention the issue with the Fortran 90 .mod signatures. Our applications are Fortran. So your replies are very helpful -- now I know it really isn't practical for us to use the default OpenMPI shipped with RHEL6 since we use both Intel and PGI compilers and have several applications to accommodate. Presumably if all the applications did INCLUDE 'mpif.h' instead of 'USE MPI' then we could get things working, but it's not a great workaround. No, not even if they use mpif.h. Here's a chunk of text from the v1.6 README: - While it is possible -- on some platforms -- to configure and build Open MPI with one Fortran compiler and then build MPI applications with a different Fortran compiler, this is not recommended. Subtle problems can arise at run time, even if the MPI application compiled and linked successfully. Specifically, the following two cases may not be portable between different Fortran compilers: 1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE will only compare properly to Fortran applications that were created with Fortran compilers that that use the same name-mangling scheme as the Fortran compiler with which Open MPI was configured. 2. Fortran compilers may have different values for the logical .TRUE. constant. As such, any MPI function that uses the Fortran LOGICAL type may only get .TRUE. values back that correspond to the the .TRUE. value of the Fortran compiler with which Open MPI was configured. Note that some Fortran compilers allow forcing .TRUE. to be 1 and .FALSE. to be 0. For example, the Portland Group compilers provide the "-Munixlogical" option, and Intel compilers (version >= 8.) provide the "-fpscomp logicals" option. You can use the ompi_info command to see the Fortran compiler with which Open MPI was configured. Even when the name mangling obstacle doesn't arise (it shouldn't for the cited case of gfortran vs. ifort), run-time library function usage is likely to conflict between the compiler used to build the MPI Fortran library and the compiler used to build the application. So there really isn't a good incentive to retrogress away from the USE files simply to avoid one aspect of mixing incompatible compilers. -- Tim Prince
Re: [OMPI users] basic questions about compiling OpenMPI
On 5/22/2013 11:34 AM, Paul Kapinos wrote: On 05/22/13 17:08, Blosch, Edwin L wrote: Apologies for not exploring the FAQ first. No comments =) If I want to use Intel or PGI compilers but link against the OpenMPI that ships with RedHat Enterprise Linux 6 (compiled with g++ I presume), are there any issues to watch out for, during linking? At least, the Fortran-90 bindings ("use mpi") won't work at all (they're compiler-dependent). So, our way is to compile a version of Open MPI with each compiler. I think this is recommended. Note also that the version of Open MPI shipped with Linux is usually a bit dusty.
The gfortran build of the Fortran library, as well as the .mod USE files, won't work with ifort or PGI compilers. g++-built libraries ought to work with sufficiently recent versions of icpc. As noted above, it's worthwhile to rebuild yourself, even if you use a (preferably more up-to-date version of) gcc, which you can use along with one of the commercial Fortran compilers for Linux. -- Tim Prince
Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5
On 05/16/2013 10:13 PM, Tim Prince wrote: On 5/16/2013 2:16 PM, Geraldine Hochman-Klarenberg wrote: Maybe I should add that my Intel C++ and Fortran compilers are different versions. C++ is 12.0.2 and Fortran is 13.0.2. Could that be an issue? Also, when I check for the location of ifort, it seems to be in usr/bin - which is different than the C compiler (even though I have folders /opt/intel/composer_xe_2013 and /opt/intel/composer_xe_2013.3.171 etc.). And I have tried /source /opt/intel/bin/ifortvars.sh intel64/ too. Geraldine On May 16, 2013, at 11:57 AM, Geraldine Hochman-Klarenberg wrote: I am having trouble configuring OpenMPI-1.6.4 with the Intel C/C++ composer (12.0.2). My OS is OSX 10.7.5. I am not a computer whizz so I hope I can explain what I did properly: 1) In bash, I did /source /opt/intel/bin/compilervars.sh intel64/ and then /echo PATH/ showed: //opt/intel/composerxe-2011.2.142/bin/intel64:/opt/intel/composerxe-2011.2.142/mpirt/bin/intel64:/opt/intel/composerxe-2011.2.142/bin:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:.:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin/ / / 2)/which icc /and /which icpc /showed: //opt/intel/composerxe-2011.2.142/bin/intel64/icc/ and //opt/intel/composerxe-2011.2.142/bin/intel64/icpc/ / / So that all seems okay to me. Still when I do /./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.4/ from the folder in which the extracted OpenMPI files sit, I get // /== Configuring Open MPI/ // / / /*** Startup tests/ /checking build system type... x86_64-apple-darwin11.4.2/ /checking host system type... x86_64-apple-darwin11.4.2/ /checking target system type... x86_64-apple-darwin11.4.2/ /checking for gcc... icc/ /checking whether the C compiler works... no/ /configure: error: in `/Users/geraldinehochman-klarenberg/Projects/openmpi-1.6.4':/ /configure: error: C compiler cannot create executables/ /See `config.log' for more details/ / / You do need to examine config.log and show it to us if you don't understand it. Attempting to use the older C compiler and libraries to link .o files made by the newer Fortran is likely to fail. If you wish to attempt this, assuming the Intel compilers are installed in default directories, I would suggest you source the environment setting for the older compiler, then the newer one, so that the newer libraries will be found first and the older ones used only when they aren't duplicated by the newer ones. You also need the 64-bit g++ active. It's probably unnecessary to use icpc at all when building OpenMPI. icpc is compatible with gcc/g++ built objects, -- Tim Prince
Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5
On 5/16/2013 2:16 PM, Geraldine Hochman-Klarenberg wrote: Maybe I should add that my Intel C++ and Fortran compilers are different versions. C++ is 12.0.2 and Fortran is 13.0.2. Could that be an issue? Also, when I check for the location of ifort, it seems to be in usr/bin - which is different than the C compiler (even though I have folders /opt/intel/composer_xe_2013 and /opt/intel/composer_xe_2013.3.171 etc.). And I have tried /source /opt/intel/bin/ifortvars.sh intel64/ too. Geraldine On May 16, 2013, at 11:57 AM, Geraldine Hochman-Klarenberg wrote: I am having trouble configuring OpenMPI-1.6.4 with the Intel C/C++ composer (12.0.2). My OS is OSX 10.7.5. I am not a computer whizz so I hope I can explain what I did properly: 1) In bash, I did /source /opt/intel/bin/compilervars.sh intel64/ and then /echo PATH/ showed: //opt/intel/composerxe-2011.2.142/bin/intel64:/opt/intel/composerxe-2011.2.142/mpirt/bin/intel64:/opt/intel/composerxe-2011.2.142/bin:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:.:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin/ / / 2)/which icc /and /which icpc /showed: //opt/intel/composerxe-2011.2.142/bin/intel64/icc/ and //opt/intel/composerxe-2011.2.142/bin/intel64/icpc/ / / So that all seems okay to me. Still when I do /./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.4/ from the folder in which the extracted OpenMPI files sit, I get // /== Configuring Open MPI/ // / / /*** Startup tests/ /checking build system type... x86_64-apple-darwin11.4.2/ /checking host system type... x86_64-apple-darwin11.4.2/ /checking target system type... x86_64-apple-darwin11.4.2/ /checking for gcc... icc/ /checking whether the C compiler works... no/ /configure: error: in `/Users/geraldinehochman-klarenberg/Projects/openmpi-1.6.4':/ /configure: error: C compiler cannot create executables/ /See `config.log' for more details/ / / You do need to examine config.log and show it to us if you don't understand it. Attempting to use the older C compiler and libraries to link .o files made by the newer Fortran is likely to fail. If you wish to attempt this, assuming the Intel compilers are installed in default directories, I would suggest you source the environment setting for the older compiler, then the newer one, so that the newer libraries will be found first and the older ones used only when they aren't duplicated by the newer ones. You also need the 64-bit g++ active. -- Tim Prince
Re: [OMPI users] memory per core/process
On 03/30/2013 06:36 AM, Duke Nguyen wrote: On 3/30/13 5:22 PM, Duke Nguyen wrote: On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations? Typically Intel Fortran codes need large amounts of stack when the problem size increases. Check ulimit -a
First time I heard of stack limitations. Anyway, ulimit -a gives
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
So stack size is 10MB??? Does this one create the problem? How do I change this?
I did $ ulimit -s unlimited to have the stack size be unlimited, and the job ran fine!!! So it looks like the stack limit is the problem. Questions are: * how do I set this automatically (and permanently)? * should I set all other ulimits to be unlimited?
In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables which are more convenient to set there than on the mpirun command line), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. A stack size of "unlimited" in reality is not unlimited; it may be limited by a system limit or implementation. As we run up to 120 threads per rank and many applications have threadprivate data regions, the ability to run without considering the stack limit is the exception rather than the rule. -- Tim Prince
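A minimal sketch of that wrapper approach, with hypothetical names (run.sh, ./solver); the permanent alternative is to raise the soft stack limit in /etc/security/limits.conf on every compute node:
  #!/bin/bash
  # run.sh - started by mpirun for every rank; raises the stack limit
  # before handing off to the real executable
  ulimit -s unlimited
  export OMP_STACKSIZE=512M   # per-thread stacks, if OpenMP threads also need more
  exec ./solver "$@"
It is then launched as, for example, mpirun -np 16 --hostfile hosts ./run.sh, so the limit is raised in the environment of every rank rather than only in the shell that ran mpirun.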
Re: [OMPI users] mpivars.sh - Intel Fortran 13.1 conflict with OpenMPI 1.6.3
On 01/24/2013 12:40 PM, Michael Kluskens wrote: This is for reference and suggestions as this took me several hours to track down and the previous discussion on "mpivars.sh" failed to cover this point (nothing in the FAQ): I successfully build and installed OpenMPI 1.6.3 using the following on Debian Linux: ./configure --prefix=/opt/openmpi/intel131 --disable-ipv6 --with-mpi-f90-size=medium --with-f90-max-array-dim=4 --disable-vt F77=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort FC=/opt/ intel/composer_xe_2013.1.117/bin/intel64/ifort CXXFLAGS=-m64 CFLAGS=-m64 CC=gcc CXX=g++ (disable-vt was required because of an error finding -lz which I gave up on). My .tcshrc file HAD the following: set path = (/opt/openmpi/intel131/bin $path) setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH alias mpirun "mpirun --prefix /opt/openmpi/intel131 " source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64 For years I have used these procedures on Debian Linux and OS X with earlier versions of OpenMPI and Intel Fortran. However, at some point Intel Fortran started including "mpirt", including: /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun So even through I have the alias set for mpirun, I got the following error: mpirun -V .: 131: Can't open /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh Part of the confusion is that OpenMPI source does include a reference to "mpivars" in "contrib/dist/linux/openmpi.spec" The solution only occurred as I was writing this up, source intel setup first: source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64 set path = (/opt/openmpi/intel131/bin $path) setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH alias mpirun "mpirun --prefix /opt/openmpi/intel131 " Now I finally get: mpirun -V mpirun (Open MPI) 1.6.3 The mpi runtime should be in the redistributable for their MPI compiler not in the base compiler. The question is how much of /opt/intel/composer_xe_2013.1.117/mpirt can I eliminate safely and should I ( multi-user machine were each user has their own Intel license, so I don't wish to trouble shoot this in the future ) ? ifort mpirt is a run-time to support co-arrays, but not full MPI. This version of the compiler checks in its path setting scripts whether Intel MPI is already on LD_LIBRARY_PATH, and so there is a conditional setting of the internal mpivars. I assume the co-array feature would be incompatible with OpenMPI and you would want to find a way to avoid any reference to that library, possibly by avoiding sourcing that part of ifort's compilervars. If you want a response on this subject from the Intel support team, their HPC forum might be a place to bring it up: http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology -- Tim Prince
Re: [OMPI users] Compiling 1.6.1 with cygwin 1.7 and gcc
On 9/24/2012 1:02 AM, Roy Hogan wrote: I’m trying to build version 1.6.1 on Cygwin (1.7), using the gcc 4.5.3 compilers. I need to use the Cygwin linux environment specifically so I’m not interested in the cmake option on the windows side. I’ve searched the archives, but don’t find much on the Cygwin build option over the last couple of years. I’ve attached the logs for my “configure” and “make all” steps. Our email filter will not allow me to send zipped files, so I’ve attached the two log files. I’d appreciate any advice. Perhaps you mean cygwin posix environment. Evidently, your Microsoft-specific macros required in windows.c aren't handled by configury under cygwin, at least not if you don't specify that you want them. As you hinted, cygwin supports a more linux-like environment, although many of those macros should be handled by #include "windows.h". Do you have a reason for withholding information such as which Windows version you want to support, and your configure commands? -- Tim Prince
Re: [OMPI users] Fwd: lwkmpi
On 8/28/2012 5:11 AM, 清风 wrote: -- Original Message -- From: "295187383"<295187...@qq.com>; Sent: Tuesday, August 28, 2012, 4:13 PM To: "users"<us...@open-mpi.org>; Subject: lwkmpi Hi everybody, I'm trying to compile openmpi with intel compiler 11.1.07 on ubuntu. I compiled openmpi many times and I could always find a problem. But the error that I'm getting now gives me no clues where to even search for the problem. It seems I have succeeded in configuring. While I try "make all", it always shows the problems below: make[7]: Entering directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool' /opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. -I../../.. -DINSIDE_OPENMPI -I/home/lwk/桌面/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include -I/usr/include/infiniband -I/usr/include/infiniband -DOPARI_VT -O3 -DNDEBUG -finline-functions -pthread -MT opari-ompragma_c.o -MD -MP -MF .deps/opari-ompragma_c.Tpo -c -o opari-ompragma_c.o `test -f 'ompragma_c.cc' || echo './'`ompragma_c.cc /usr/include/c++/4.5/iomanip(64): error: expected an expression { return { __mask }; } ^ Looks like your icpc is too old to work with your g++. If you want to build with C++ support, you'll need better matching versions of icpc and g++. icpc support for g++ 4.7 is expected to release within the next month; icpc 12.1 should be fine with g++ 4.5 and 4.6. -- Tim Prince
Re: [OMPI users] mpi.h incorrect format error?
On 08/06/2012 07:35 AM, PattiMichelle wrote: mpicc -DFSEEKO64_OK -w -O3 -c -DLANDREAD_STUB -DDM_PARALLEL -DMAX_HISTORY=25 -c buf_for_proc.c You might need to examine the pre-processed source (mpicc -E buf_for_proc.c > buf_for_proc.i) to see what went wrong in pre-processing at the point where the compiler (gcc?) complains. I suppose you must have built mpicc yourself; you would need to assure that the mpicc on PATH is the one built with the C compiler on PATH. -- Tim Prince
Re: [OMPI users] compilation on windows 7 64-bit
On 07/27/2012 12:23 PM, Sayre, Alan N wrote: During compilation I get warning messages such as: c:\program files (x86)\openmpi_v1.6-x64\include\openmpi/ompi/mpi/cxx/op_inln.h(148): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning) cmsolver.cpp Which indicates that the openmpi version "openmpi_v1.6-x64" is 64-bit. And I'm sure that I installed the 64-bit version. I am compiling on a 64-bit version of Windows 7. Are you setting the x64 compiler options in your project? -- Tim Prince
Re: [OMPI users] undefined reference to `netcdf_mp_nf90_open_'
On 6/26/2012 9:20 AM, Jeff Squyres wrote: Sorry, this looks like an application issue -- i.e., the linker error you're getting doesn't look like it's coming from Open MPI. Perhaps it's a missing application/middleware library. More specifically, you can take the mpif90 command that is being used to generate these errors and add "--showme" to the end of it, and you'll see what underlying compiler command is being executed under the covers. That might help you understand exactly what is going on. On Jun 26, 2012, at 7:13 AM, Syed Ahsan Ali wrote: Dear All, I am getting the following error while compiling an application. Seems like something related to netcdf and mpif90. Although I have compiled netcdf with the mpif90 option, I don't know why this error is happening. Any hint would be highly appreciated. /home/pmdtest/cosmo/source/cosmo_110525_4.18/obj/src_obs_proc_cdf.o: In function `src_obs_proc_cdf_mp_obs_cdf_read_org_': /home/pmdtest/cosmo/source/cosmo_110525_4.18/src/src_obs_proc_cdf.f90:(.text+0x17aa): undefined reference to `netcdf_mp_nf90_open_' If your mpif90 is properly built and set up with the same Fortran compiler you are using, it appears that either you didn't build the netcdf Fortran 90 modules with that compiler, or you didn't set the include path for the netcdf modules. This would work the same with mpif90 as with the underlying Fortran compiler. -- Tim Prince
Re: [OMPI users] Cannot compile code with gfortran + OpenMPI when OpenMPI was built with latest intl compilers
On 5/19/2012 2:20 AM, Sergiy Bubin wrote: I built OpenMPI with that set of intel compilers. Everything seems to be fine and I can compile my fortran+MPI code with no problem when I invoke ifort. I should say that I do not actually invoke the "wrapper" mpi compiler. I normally just add flags as MPICOMPFLAGS=$(shell mpif90 --showme:compile) and MPILINKFLAGS=$(shell mpif90 --showme:link) in my makefile. I know it is not the recommended way of doing things but the reason I do that is that I absolutely need to be able to use different fortran compilers to build my fortran code. Avoiding the use of mpif90 accomplishes nothing for changing between incompatible Fortran compilers. Run-time libraries are incompatible among ifort, gfortran, and Oracle Fortran, so you can't link a mixture of objects compiled by incompatible Fortran compilers except in limited circumstances. This includes the MPI Fortran library. I don't see how it is too great an inconvenience for your Makefile to set PATH and LD_LIBRARY_PATH to include the mpif90 corresponding to the chosen Fortran compiler. You may need to build your own mpif90 for gfortran as well as the other compilers, so as to configure it to keep it off the default PATHs (e.g. --prefix=/opt/ompi1.4gf/), if you can't move the Ubuntu ompi. Surely most of this is implied in the OpenMPI instructions. -- Tim Prince
Re: [OMPI users] redirecting output
On 03/30/2012 10:41 AM, tyler.bal...@huskers.unl.edu wrote: I am using the command mpirun -np nprocs -machinefile machines.arch Pcrystal and my output scrolls across my terminal. I would like to send this output to a file and I cannot figure out how to do so. I have tried the general > FILENAME and > log &; these generate files, however they are empty. Any help would be appreciated. If you run under screen, your terminal output should be collected in screenlog. Beats me why some sysadmins don't see fit to install screen. -- Tim Prince
Re: [OMPI users] [EXTERNAL] Possible to build ompi-1.4.3 or 1.4.5 without a C++ compiler?
On 03/20/2012 08:35 AM, Gunter, David O wrote: I wish it were that easy. When I go that route, I get error messages like the following when trying to compile the parallel code with Intel: libmpi.so: undefined reference to `__intel_sse2_strcpy' and other messages for every single Intel-implemented standard C-function. -david -- There was a suggestion in the snipped portion that you use gcc/g++ together with ifort; that doesn't appear to be what you mean by "that route" (unless you forgot to recompile your .c files with gcc). You have built some objects with an Intel compiler (either ifort or icc/icpc) which refer to this Intel library function, but you apparently didn't link against the library which provides it. If you use one of those Intel compilers to drive the link, and your environment paths are set accordingly, the Intel libraries would be linked automatically. There was a single release of the compiler several years ago (well out of support now) where that sse2 library was omitted, although the sse3 version was present. -- Tim Prince
Re: [OMPI users] parallelising ADI
On 03/06/2012 03:59 PM, Kharche, Sanjay wrote: Hi, I am working on a 3D ADI solver for the heat equation. I have implemented it as serial code. Would anybody be able to indicate the best and most straightforward way to parallelise it? Apologies if this is going to the wrong forum. If it's to be implemented in a parallelizable fashion (not SSOR style, where each line uses updates from the previous line), it should be feasible to divide the outer loop into an appropriate number of blocks, or decompose the physical domain and perform ADI on individual blocks, then update and repeat. -- Tim Prince
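As a rough, generic illustration of the outer-loop decomposition suggested above (this is not code from the original poster; the grid sizes and the trivial per-line solver are made-up placeholders), each MPI rank below takes a contiguous block of k-planes and sweeps its lines independently:

/* Hypothetical sketch: divide the outer (k) loop of an ADI x-sweep across ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { NX = 64, NY = 64, NZ = 64 };

/* stand-in for a tridiagonal solve along x on one (j,k) line */
static void sweep_line(double *line)
{
    for (int i = 1; i < NX - 1; ++i)
        line[i] = 0.5 * (line[i - 1] + line[i + 1]);
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank owns a contiguous block of k-planes (the outer loop index) */
    int k0 = rank * NZ / size;
    int k1 = (rank + 1) * NZ / size;

    double *u = malloc((size_t)(k1 - k0) * NY * NX * sizeof(double));
    for (int n = 0; n < (k1 - k0) * NY * NX; ++n)
        u[n] = 1.0;

    /* x-sweep: every (j,k) line is independent, so no communication is needed here */
    for (int k = 0; k < k1 - k0; ++k)
        for (int j = 0; j < NY; ++j)
            sweep_line(&u[((size_t)k * NY + j) * NX]);

    /* a sweep in the decomposed (k) direction would first exchange boundary
       planes with neighbouring ranks, e.g. with MPI_Sendrecv, then update and repeat */

    printf("rank %d handled k-planes %d..%d\n", rank, k0, k1 - 1);
    free(u);
    MPI_Finalize();
    return 0;
}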
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 12:52 PM, Micah Sklut wrote: Hi Gustavo, Here is the output of: barells@ip-10-17-153-123:~> /opt/openmpi/intel/bin/mpif90 -showme gfortran -I/usr/lib64/mpi/gcc/openmpi/include -pthread -I/usr/lib64/mpi/gcc/openmpi/lib64 -L/usr/lib64/mpi/gcc/openmpi/lib64 -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl This points to gfortran. I do see what you are saying about the 1.4.2 and 1.4.4 components. I'm not sure why that is, but there seems to be some conflict between the existing openmpi, installed before the recently installed 1.4.4, and the attempt to install with ifort. This is one of the reasons for recommending complete removal (rpm -e if need be) of any MPI which is on a default path (and setting a clean path) before building a new one, as well as choosing a unique install path for the new one. -- Tim Prince
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 1:20 PM, Fernanda Oliveira wrote: Hi Micah, I do not know if it is exactly what you need, but I know that there are environment variables to use with Intel MPI. They are: I_MPI_CC, I_MPI_CXX, I_MPI_F77, I_MPI_F90. So, you can set these using 'export' for bash, for instance, or directly when you run. I use in my bashrc: export I_MPI_CC=icc export I_MPI_CXX=icpc export I_MPI_F77=ifort export I_MPI_F90=ifort Let me know if it helps. Fernanda Oliveira I didn't see any indication that Intel MPI was in play here. Of course, that's one of the first thoughts, as under Intel MPI, mpif90 uses gfortran, mpiifort uses ifort, mpicc uses gcc, mpiCC uses g++, mpiicc uses icc, and mpiicpc uses icpc, and all the Intel compilers use g++ to find headers and libraries. The advice to try 'which mpif90' would show whether you fell into this bunker. If you use Intel cluster checker, you will see noncompliance if anyone's MPI is on the default paths. You must set paths explicitly according to the MPI you want. Admittedly, that tool didn't gain a high level of adoption. -- Tim Prince
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 9:49 AM, Micah Sklut wrote: I have installed openmpi for gfortran, but am now attempting to install openmpi as ifort. I have run the following configuration: ./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort The install works successfully, but when I run /opt/openmpi/intel/bin/mpif90, it runs as gfortran. Oddly, when I am user root, the same mpif90 runs as ifort. Can someone please alleviate my confusion as to why mpif90 is not running as ifort? You might check your configure logs to be certain that ifort was found before gfortran at all stages (did you set paths according to sourcing the ifortvars or compilervars scripts which come with ifort?). 'which mpif90' should tell you whether you are executing the one from your installation. You may have another mpif90 coming first on your PATH. You won't be able to override your PATH and LD_LIBRARY_PATH correctly simply by specifying the absolute path to mpif90. -- Tim Prince
Re: [OMPI users] How to justify the use MPI codes on multicore systems/PCs?
On 12/11/2011 12:16 PM, Andreas Schäfer wrote: Hey, on an SMP box threaded codes CAN always be faster than their MPI equivalents. One reason why MPI sometimes turns out to be faster is that with MPI every process actually initializes its own data. Therefore it'll end up in the NUMA domain to which the core running that process belongs. A lot of threaded codes are not NUMA aware. So, for instance the initialization is done sequentially (because it may not take a lot of time), and Linux' first touch policy makes all memory pages belong to a single domain. In essence, those codes will use just a single memory controller (and its bandwidth). Many applications require significant additional RAM and message passing communication per MPI rank. Where those are not adverse issues, MPI is likely to out-perform pure OpenMP (Andreas just quoted some of the reasons), and OpenMP is likely to be favored only where it is an easier development model. The OpenMP library also should implement a first-touch policy, but it's very difficult to carry out fully in legacy applications. OpenMPI has had effective shared memory message passing from the beginning, as did its predecessor (LAM) and all current commercial MPI implementations I have seen, so you shouldn't have to beat on an issue which was dealt with 10 years ago. If you haven't been watching this mail list, you've missed some impressive reporting of new support features for effective pinning by CPU, cache, etc. When you get to hundreds of nodes, depending on your application and interconnect performance, you may need to consider "hybrid" (OpenMP as the threading model for MPI_THREAD_FUNNELED mode), if you are running a single application across the entire cluster. The biggest cluster in my neighborhood, which ranked #54 on the recent Top500, gave best performance in pure MPI mode for that ranking. It uses FDR infiniband, and ran 16 ranks per node, for 646 nodes, with DGEMM running in 4-wide vector parallel. Hybrid was tested as well, with each multiple-thread rank pinned to a single L3 cache. All 3 MPI implementations which were tested have full shared memory message passing and pinning to local cache within each node (OpenMPI and 2 commercial MPIs). -- Tim Prince
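For readers unfamiliar with the funneled hybrid model mentioned above, here is a minimal, generic sketch (not taken from any of the benchmarks discussed; the loop and sizes are invented): the process requests MPI_THREAD_FUNNELED, OpenMP threads share the compute loop, and only the thread that initialized MPI makes MPI calls.

/* Generic hybrid MPI + OpenMP sketch for MPI_THREAD_FUNNELED mode. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "funneled thread support not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    /* threaded compute inside the rank; no MPI calls in the parallel region */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; ++i)
        local += 1.0 / (1.0 + i);

    double global;
    /* MPI call made outside the parallel region, i.e. only by the thread that called MPI_Init_thread */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);
    MPI_Finalize();
    return 0;
}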
Re: [OMPI users] EXTERNAL: Re: Question about compilng with fPIC
On 9/21/2011 12:22 PM, Blosch, Edwin L wrote: Thanks Tim. I'm compiling source units and linking them into an executable. Or perhaps you are talking about how OpenMPI itself is built? Excuse my ignorance... The source code units are compiled like this: /usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad -xHost -falign-functions -fpconstant -O2 -I. -I/usr/mpi/intel/openmpi-1.4.3/include -c ../code/src/main/main.f90 The link step is like this: /usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad -xHost -falign-functions -fpconstant -static-intel -o ../bin/ -lstdc++ OpenMPI itself was configured like this: ./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge --without-lsf --without-psm --without-portals --without-gm --without-elan --without-mx --without-slurm --without-loadleveler --enable-mpirun-prefix-by-default --enable-contrib-no-build=vt --enable-mca-no-build=maffinity --disable-per-user-config-files --disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared CXX=/appserv/intel/Compiler/11.1/072/bin/intel64/icpc CC=/appserv/intel/Compiler/11.1/072/bin/intel64/icc 'CFLAGS= -O2' 'CXXFLAGS= -O2' F77=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FFLAGS=-D_GNU_SOURCE -traceback -O2' FC=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FCFLAGS=-D_GNU_SOURCE -traceback -O2' 'LDFLAGS= -static-intel' ldd output on the final executable gives: linux-vdso.so.1 => (0x7fffb77e7000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b2e2b652000) libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x2b2e2b95e000) libdl.so.2 => /lib64/libdl.so.2 (0x2b2e2bb6d000) libnsl.so.1 => /lib64/libnsl.so.1 (0x2b2e2bd72000) libutil.so.1 => /lib64/libutil.so.1 (0x2b2e2bf8a000) libm.so.6 => /lib64/libm.so.6 (0x2b2e2c18d000) libpthread.so.0 => /lib64/libpthread.so.0 (0x2b2e2c3e4000) libc.so.6 => /lib64/libc.so.6 (0x2b2e2c60) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b2e2c959000) /lib64/ld-linux-x86-64.so.2 (0x2b2e2b433000) Do you see anything that suggests I should have been compiling the application and/or OpenMPI with -fPIC? If you were building any OpenMPI shared libraries, those should use -fPIC. configure may have made the necessary additions. If your application had shared libraries, you would require -fPIC, but apparently you had none. The shared libraries you show presumably weren't involved in your MPI or application build, and you must have linked in static versions of your MPI libraries, where -fPIC wouldn't be required. -- Tim Prince
Re: [OMPI users] Question about compilng with fPIC
On 9/21/2011 11:44 AM, Blosch, Edwin L wrote: Follow-up to a mislabeled thread: "How could OpenMPI (or MVAPICH) affect floating-point results?" I have found a solution to my problem, but I would like to understand the underlying issue better. To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked with OpenMPI fails. The earliest symptom I could see was some strange difference in numerical values of quantities that should be unaffected by MPI calls. Tim's advice guided me to assume memory corruption. Eugene's advice guided me to explore the detailed differences in compilation. I observed that the MVAPICH mpif90 wrapper adds -fPIC. I tried adding -fPIC and -mcmodel=medium to the compilation of the OpenMPI-linked executable. Now it works fine. I haven't tried without -mcmodel=medium, but my guess is -fPIC did the trick. Does anyone know why compiling with -fPIC has helped? Does it suggest an application problem or an OpenMPI problem? To note: This is an Infiniband-based cluster. The application does pretty basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, isend, irecv, waitall. There is one task that uses iprobe with MPI_ANY_TAG, but this task is only involved in certain cases (including this one). Conversely, cases that do not call iprobe have not yet been observed to crash. I am deducing that this function is the problem. If you are making a .so, the included .o files should be built with -fPIC or similar. Ideally, the configure and build tools would enforce this. -- Tim Prince
Re: [OMPI users] Building with thread support on Windows?
On 9/21/2011 11:18 AM, Björn Regnström wrote: Hi, I am trying to build Open MPI 1.4.3 with thread support on Windows. A trivial test program runs if it calls MPI_Init or MPI_Init_thread(int *argc, char ***argv, int required, int *provide) with required=0 but hangs if required>0. ompi_info for my build reports that there is no thread support but MPI_Init_thread returns provide==required. The only change in the CMake configuration was to check OMPI_ENABLE_MPI_THREADS. Is there anything else that needs to be done with the configuration? I have built 1.4.3 with thread support on several linuxes and mac and it works fine there. Not all Windows compilers work well enough with all threading models that you could expect satisfactory results; in particular, the compilers and thread libraries you use on linux may not be adequate for Windows thread support. -- Tim Prince
Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?
On 9/20/2011 10:50 AM, Blosch, Edwin L wrote: It appears to be a side effect of linkage that is able to change a compute-only routine's answers. I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of corruption may be going on. Those intrinsics have direct instruction set translations which shouldn't vary from -O1 on up nor with linkage options nor be affected by MPI or insertion of WRITEs. -- Tim Prince
Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?
On 9/20/2011 7:25 AM, Reuti wrote: Hi, on 20.09.2011 at 00:41, Blosch, Edwin L wrote: I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages were built with the same installation of Intel 11.1, as well as the application program; identical flags passed to the compiler in each case. I’ve tracked down some differences in a compute-only routine where I’ve printed out the inputs to the routine (to 18 digits); the inputs are identical. The output numbers are different in the 16th place (perhaps a few in the 15th place). These differences only show up for optimized code, not for –O0. My assumption is that some optimized math intrinsic is being replaced dynamically, but I do not know how to confirm this. Anyone have guidance to offer? Or similar experience? yes, I face it often but always at a magnitude where it's not of any concern (and not related to any MPI). Due to the limited precision in computers, a simple reordering of operations (although being equivalent in a mathematical sense) can lead to different results. Removing the anomalies with -O0 could prove that. The other point I heard especially for the x86 instruction set is that the internal FPU still has 80 bits, while the representation in memory is only 64 bit. Hence when all can be done in the registers, the result can be different compared to the case when some interim results need to be stored to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it to stay always in 64 bit, and a similar one for Intel is -mp (now -fltconsistency) and -mp1. Diagnostics below indicate that ifort 11.1 64-bit is in use. The options aren't the same as Reuti's "now" version (a 32-bit compiler which hasn't been supported for 3 years or more?). With ifort 10.1 and more recent, you would set at least -assume protect_parens -prec-div -prec-sqrt if you are interested in numerical consistency. If you don't want auto-vectorization of sum reductions, you would use instead -fp-model source -ftz (ftz sets underflow mode back to abrupt, while "source" sets gradual). It may be possible to expose 80-bit x87 by setting the ancient -mp option, but such a course can't be recommended without additional cautions. The quoted comment from the OP seems to show a somewhat different question: Does OpenMPI implement any operations in a different way from MVAPICH? I would think it probable that the answer could be affirmative for operations such as allreduce, but this leads well outside my expertise with respect to specific MPI implementations. It isn't out of the question to suspect that such differences might be aggravated when using excessively aggressive ifort options such as -fast. libifport.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x2b6e7e081000) libifcoremt.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5 (0x2b6e7e1ba000) libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x2b6e7e45f000) libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x2b6e7e7f4000) libintlc.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x2b6e7ea0a000) -- Tim Prince
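To make the reordering point concrete, this small stand-alone program (unrelated to the poster's application or to MPI) sums the same array in two different orders; on typical hardware the two double-precision totals differ in the last bits, the same magnitude of difference being discussed here.

/* Demonstration that reassociating a sum changes the last bits of a double-precision result. */
#include <stdio.h>

int main(void)
{
    enum { N = 1000000 };
    static double a[N];
    for (int i = 0; i < N; ++i)
        a[i] = 1.0 / (1.0 + i);   /* values spanning several orders of magnitude */

    double fwd = 0.0, bwd = 0.0;
    for (int i = 0; i < N; ++i)
        fwd += a[i];              /* forward order */
    for (int i = N - 1; i >= 0; --i)
        bwd += a[i];              /* reverse order */

    printf("forward    = %.17g\n", fwd);
    printf("backward   = %.17g\n", bwd);
    printf("difference = %.3g\n", fwd - bwd);
    return 0;
}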
Re: [OMPI users] OpenMPI vs Intel Efficiency question
On 7/12/2011 11:06 PM, Mohan, Ashwin wrote: Tim, Thanks for your message. I was however not clear about your suggestions. I would appreciate it if you could clarify. You say," So, if you want a sane comparison but aren't willing to study the compiler manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc -prec-div -prec-sqrt -ansi-alias and at least (if your linux compiler is g++) mpiCC -O2 possibly with some of the other options I mentioned earlier." ###From your response above, I understand to use, for Intel, this syntax: "mpiicpc -prec-div -prec-sqrt -ansi-alias" and for OPENMPI use "mpiCC -O2". I am not certain about the other options you mention. ###Also, I presently use a hostfile while submitting my mpirun. Each node has four slots and my hostfile was "nodename slots=4". My compile command is mpiCC -o xxx.xpp. If you have as ancient a g++ as your indication of FC3 implies, it really isn't fair to compare it with a currently supported compiler. ###Do you suggest upgrading the current installation of g++? Would that help? How much it would help would depend greatly on your source code. It won't help much anyway if you don't choose appropriate options. Current g++ is nearly as good at auto-vectorization as icpc, unless you dive into the pragmas and cilk stuff provided with icpc. You really need to look at the gcc manual to understand those options; going into it in any more depth here would try the patience of the list. ###How do I ensure that all 4 slots are active when I submit an mpirun -np 4 command? When I do "top", I notice that all 4 slots are active. I noticed this when I did "top" with the Intel machine too, that is, it showed four slots active. Thank you..ashwin. I was having trouble inferring what platform you are running on; I guessed a single-core HyperThread, which doesn't seem to agree with your "4 slots" terminology. If you have 2 single-core hyperthread CPUs, it would be a very unusual application to find a gain for running 2 MPI processes per core, but if the sight of 4 processes running on your graph was your goal, I won't argue against it. You must be aware that most clusters running CPUs of the past have HT disabled in BIOS setup. -- Tim Prince
Re: [OMPI users] OpenMPI vs Intel Efficiency question
On 7/12/2011 7:45 PM, Mohan, Ashwin wrote: Hi, I noticed that the exact same code took 50% more time to run on OpenMPI than Intel. I use the following syntax to compile and run: Intel MPI Compiler: (Redhat Fedora Core release 3 (Heidelberg), Kernel version: Linux 2.6.9-1.667smp x86_64** mpiicpc -o .cpp -lmpi OpenMPI 1.4.3: (Centos 5.5 w/ python 2.4.3, Kernel version: Linux 2.6.18-194.el5 x86_64)** mpiCC .cpp -o **Other hardware specs** processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Xeon(TM) CPU 3.60GHz stepping: 4 cpu MHz : 3591.062 cache size : 1024 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 1 apicid : 0 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid xtpr bogomips: 7182.12 clflush size: 64 cache_alignment : 128 address sizes : 36 bits physical, 48 bits virtual power management: Can the issue of efficiency be deciphered from the above info? Do the compiler flags have an effect on the efficiency of the simulation? If so, what flags may be useful to check to be included for Open MPI? The default options for icpc are roughly equivalent to the quite aggressive choice g++ -fno-strict-aliasing -ffast-math -fno-cx-limited-range -O3 -funroll-loops --param max-unroll-times=2 while you apparently used default -O0 for your mpiCC (if it is g++), neither of which is a very good initial choice for performance analysis. So, if you want a sane comparison but aren't willing to study the compiler manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc -prec-div -prec-sqrt -ansi-alias and at least (if your linux compiler is g++) mpiCC -O2 possibly with some of the other options I mentioned earlier. If you have as ancient a g++ as your indication of FC3 implies, it really isn't fair to compare it with a currently supported compiler. Then, Intel MPI, by default, would avoid using HyperThreading, even though you have it enabled on your CPU, so, I suppose, if you are running on a single core, it will be rotating among your 4 MPI processes 1 at a time. The early Intel HyperThread CPUs typically took 15% longer to run MPI jobs when running 2 processes per core. Will including MPICH2 increase efficiency in running simulations using OpenMPI? You have to choose a single MPI. Having MPICH2 installed shouldn't affect performance of OpenMPI or Intel MPI, except to break your installation if you don't keep things sorted out. OpenMPI and Intel MPI normally perform very closely, if using equivalent settings, when working within the environments for which both are suited. -- Tim Prince
Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
On 5/10/2011 6:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote: Hi, I compile a parallel program with OpenMPI 1.4.1 (compiled with intel compilers 12 from the composerxe package). This program is linked to MUMPS library 4.9.2, compiled with the same compilers and linked with intel MKL. The OS is linux debian. No error in compiling or running the job, but the program freezes inside a call to the "zmumps" routine, when the slave processes call the MPI_COMM_DUP routine. The program is executed on 2 nodes of 12 cores each (Westmere processors) with the following command: mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh" --mca btl self,openib -x LD_LIBRARY_PATH ./prog We have 12 processes running on each node. We submit the job with the OAR batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are specific to this scheduler and are usually working well with openmpi). Via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP: (gdb) where #0 0x2b32c1533113 in poll () from /lib/libc.so.6 #1 0x00adf52c in poll_dispatch () #2 0x00adcea3 in opal_event_loop () #3 0x00ad69f9 in opal_progress () #4 0x00a34b4e in mca_pml_ob1_recv () #5 0x009b0768 in ompi_coll_tuned_allreduce_intra_recursivedoubling () #6 0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed () #7 0x0097e271 in ompi_comm_allreduce_intra () #8 0x0097dd06 in ompi_comm_nextcid () #9 0x0097be01 in ompi_comm_dup () #10 0x009a0785 in PMPI_Comm_dup () #11 0x0097931d in pmpi_comm_dup__ () #12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144 #13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:44 #14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048 the master waits further: (gdb) where #0 0x2b9dc9f3e113 in poll () from /lib/libc.so.6 #1 0x00adf52c in poll_dispatch () #2 0x00adcea3 in opal_event_loop () #3 0x00ad69f9 in opal_progress () #4 0x0098f294 in ompi_request_default_wait_all () #5 0x00a06e56 in ompi_coll_tuned_sendrecv_actual () #6 0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck () #7 0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed () #8 0x009a0b20 in PMPI_Barrier () #9 0x00978c93 in pmpi_barrier__ () #10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:62 #11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048 Remark: The same code compiled and ran well with the intel MPI library, from the same intel package, on the same nodes. Did you try compiling with equivalent options in each compiler? For example, (supposing you had gcc 4.6) gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7 would be equivalent (as closely as I know) to icc -fp-model source -msse4.2 -ansi-alias As you should be aware, default settings in icc are more closely equivalent to gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -fno-strict-aliasing The options I suggest as an upper limit are probably more aggressive than most people have used successfully with OpenMPI. As to run-time MPI options, Intel MPI has affinity with Westmere awareness turned on by default. I suppose testing without affinity settings, particularly when banging against all hyperthreads, is a more severe test of your application. Don't you get better results at 1 rank per core? -- Tim Prince
Re: [OMPI users] USE mpi
On 5/7/2011 2:35 PM, Dmitry N. Mikushin wrote: didn't find the icc compiler Jeff, on 1.4.3 I saw the same issue, even more generally: "make install" cannot find the compiler, if it is an alien compiler (i.e. not the default gcc) - same situation for intel or llvm, for example. The workaround is to specify full paths to compilers with CC=... FC=... in ./configure params. Could it be "make install" breaks some env paths? Most likely reason for not finding an installed icc is that the icc environment (source the compilervars script if you have a current version) wasn't set prior to running configure. Setting up the compiler in question in accordance with its own instructions is a more likely solution than the absolute path choice. OpenMPI configure, for good reason, doesn't search your system to see where a compiler might be installed. What if you had 2 versions of the same named compiler? -- Tim Prince
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 5/6/2011 10:22 AM, Tim Hutt wrote: On 6 May 2011 16:45, Tim Hutt<tdh...@gmail.com> wrote: On 6 May 2011 16:27, Tim Prince<tcpri...@live.com> wrote: If you want to use the MPI Fortran library, don't convert your Fortran to C. It's difficult to understand why you would consider f2c a "simplest way," but at least it should allow you to use ordinary C MPI function calls. Sorry, maybe I wasn't clear. Just to clarify, all of *my* code is written in C++ (because I don't actually know Fortran), but I want to use some function from PARPACK which is written in Fortran. Hmm I converted my C++ code to use the C OpenMPI interface instead, and now I get link errors (undefined references). I remembered I've been linking with -lmpi -lmpi_f77, so maybe I need to also link with -lmpi_cxx or -lmpi++ ... what exactly do each of these libraries contain? Also I have run into the problem that the communicators are of type "MPI_Comm" in C, and "integer" in Fortran... I am using MPI_COMM_WORLD in each case so I assume that will end up referring to the same thing... but maybe you really can't mix Fortran and C. Expert opinion would be very very welcome! If you use your OpenMPI mpicc wrapper to compile and link, the MPI libraries should be taken care of. Style usage in an f2c translation is debatable, but you have an #include "f2c.h" or "g2c.h" which translates the Fortran data types to legacy C equivalent. By legacy I mean that in the f2c era, the inclusion of C data types in Fortran via USE iso_c_binding had not been envisioned. One would think that you would use the MPI header data types on both the Fortran and the C side, even though you are using legacy interfaces. Slip-ups in MPI data types often lead to run-time errors. If you have an error-checking MPI library such as the Intel MPI one, you get a little better explanation at the failure point. -- Tim Prince
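As a generic illustration of the "ordinary C MPI function calls" being recommended in this thread (this is not the poster's code; the buffer sizes are invented), the C API takes MPI_COMM_WORLD and MPI_DOUBLE from mpi.h instead of hard-coding Fortran handle values, and MPI_Allgatherv is called directly:

/* Plain C MPI calls in place of the f2c-style mpi_*__ externs quoted in this thread. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank contributes (rank+1) doubles; gather everything on every rank */
    int mycount = rank + 1;
    double *mine = malloc(mycount * sizeof(double));
    for (int i = 0; i < mycount; ++i)
        mine[i] = rank + 0.1 * i;

    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    int total = 0;
    for (int r = 0; r < size; ++r) {
        counts[r] = r + 1;
        displs[r] = total;
        total += counts[r];
    }
    double *all = malloc(total * sizeof(double));

    MPI_Allgatherv(mine, mycount, MPI_DOUBLE,
                   all, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0)
        printf("gathered %d values in total\n", total);

    free(mine); free(counts); free(displs); free(all);
    MPI_Finalize();
    return 0;
}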
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 5/6/2011 7:58 AM, Tim Hutt wrote: Hi, I'm trying to use PARPACK in a C++ app I have written. This is an FORTRAN MPI routine used to calculate SVDs. The simplest way I found to do this is to use f2c to convert it to C, and then call the resulting functions from my C++ code. However PARPACK requires that I write some user-defined operations to be parallel using MPI. So far I have just been calling the FORTRAN versions of the MPI functions from C, because I wasn't sure whether you can mix the APIs. I.e. I've been doing this: -8<- extern "C" { int mpi_init__(integer *); int mpi_comm_rank__(integer *, integer *, integer *); int mpi_comm_size__(integer *, integer *, integer *); int mpi_finalize__(integer *); int mpi_allgatherv__(doublereal *, integer *, integer *, doublereal *, integer *, integer *, integer *, integer *); // OpenMPI version. const integer MPI_DOUBLE_PRECISION = 17; } bool MPI__Init() { integer ierr = 0; mpi_init__(); return ierr == 0; } 8< It works so far, but is getting quite tedious and seems like the wrong way to do it. Also I don't know if it's related but when I use allgatherv it gives me a segfault: [panic:20659] *** Process received signal *** [panic:20659] Signal: Segmentation fault (11) [panic:20659] Signal code: Address not mapped (1) [panic:20659] Failing at address: 0x7f4effe8 [panic:20659] [ 0] /lib/libc.so.6(+0x33af0) [0x7f4f8fd62af0] [panic:20659] [ 1] /usr/lib/libstdc++.so.6(_ZNSolsEi+0x3) [0x7f4f905ec0c3] [panic:20659] [ 2] ./TDLSM() [0x510322] [panic:20659] [ 3] ./TDLSM() [0x50ec8d] [panic:20659] [ 4] ./TDLSM() [0x404ee7] [panic:20659] [ 5] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f4f8fd4dc4d] [panic:20659] [ 6] ./TDLSM() [0x404c19] [panic:20659] *** End of error message *** So my question is: Can I intermix the C and FORTRAN APIs within one program? Oh and also I think the cluster I will eventually run this on (cx1.hpc.ic.ac.uk, if anyone is from Imperial) doesn't use OpenMP, so what about other MPI implementations? If you want to use the MPI Fortran library, don't convert your Fortran to C. It's difficult to understand why you would consider f2c a "simplest way," but at least it should allow you to use ordinary C MPI function calls. The MPI Fortran library must be built against the same Fortran run-time libraries which you use for your own Fortran code. The header files for the Fortran MPI calls probably don't work in C. It would be a big struggle to get them to work with f2c, since f2c doesn't have much ability to deal with headers other than its own. There's no reason you can't make both C and Fortran MPI calls in the same application. If you mean mixing a send from one language with a receive in another, I think most would avoid that. Whether someone uses OpenMP has little to do with choice of MPI implementation. Some of us still may be cursing the choice of OpenMPI for the name of an MPI implementation. -- Tim Prince
Re: [OMPI users] Problem compiling OpenMPI on Ubuntu 11.04
On 04/19/2011 01:24 PM, Sergiy Bubin wrote: /usr/include/c++/4.5/iomanip(64): error: expected an expression { return { __mask }; } ^ /usr/include/c++/4.5/iomanip(94): error: expected an expression { return { __mask }; } ^ /usr/include/c++/4.5/iomanip(125): error: expected an expression { return { __base }; } ^ /usr/include/c++/4.5/iomanip(193): error: expected an expression { return { __n }; } ^ /usr/include/c++/4.5/iomanip(223): error: expected an expression { return { __n }; } ^ /usr/include/c++/4.5/iomanip(163): error: expected an expression { return { __c }; } ^ If you're using icpc, this seeming incompatibility between icpc and g++ 4.5 has been discussed on the icpc forum http://software.intel.com/en-us/forums/showthread.php?t=78677=%28iomanip%29 where you should see that you must take care to set the option -std=c++0x when using the current <iomanip> under icpc, as it is treated as a c++0x feature. You might try adding the option to the CXXFLAGS or whatever they are called in the openmpi build (or to the icpc.cfg in your icpc installation). -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/30/2011 10:08 AM, Eugene Loh wrote: Michele Marena wrote: I've launched my app with mpiP both when the two processes are on different nodes and when the two processes are on the same node. Process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the case where processes 1 and 2 are on different nodes (runs in 162s). @--- MPI Time (seconds) --- Task AppTime MPITime MPI% 0 162 162 99.99 1 162 30.2 18.66 2 162 14.7 9.04 * 486 207 42.56 The case when processes 1 and 2 are on the same node (runs in 260s). @--- MPI Time (seconds) --- Task AppTime MPITime MPI% 0 260 260 99.99 1 260 39.7 15.29 2 260 26.4 10.17 * 779 326 41.82 I think there's a contention problem on the memory bus. Right. Process 0 spends all its time in MPI, presumably waiting on workers. The workers spend about the same amount of time on MPI regardless of whether they're placed together or not. The big difference is that the workers are much slower in non-MPI tasks when they're located on the same node. The issue has little to do with MPI. The workers are hogging local resources and work faster when placed on different nodes. However, the message size is 4096 * sizeof(double). Maybe I am wrong on this point. Is the message size too huge for shared memory? No. That's not very large at all. Not even large enough to expect the non-temporal storage issue about cache eviction to arise. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/28/2011 3:29 AM, Michele Marena wrote: Each node has two processors (no dual-core). which seems to imply that the 2 processors share memory space and a single memory bus, and the question is not about what I originally guessed. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/27/2011 2:26 AM, Michele Marena wrote: Hi, My application performs well without shared memory utilization, but with shared memory I get worse performance than without it. Am I making a mistake? Am I failing to pay attention to something? I know OpenMPI uses the /tmp directory to allocate shared memory and it is in the local filesystem. I guess you mean shared memory message passing. Among relevant parameters may be the message size where your implementation switches from cached copy to non-temporal (if you are on a platform where that terminology is used). If built with Intel compilers, for example, the copy may be performed by intel_fast_memcpy, with a default setting which uses non-temporal when the message exceeds some preset size, e.g. 50% of the smallest L2 cache for that architecture. A quick search for past posts seems to indicate that OpenMPI doesn't itself invoke non-temporal, but there appear to be several useful articles not connected with OpenMPI. In case guesses aren't sufficient, it's often necessary to profile (gprof, oprofile, VTune, ...) to pin this down. If shared memory message passing slows your application down, the question is whether this is due to excessive eviction of data from cache; not a simple question, as most recent CPUs have 3 levels of cache, and your application may require more or less data which was in use prior to the message receipt, and may use immediately only a small piece of a large message. -- Tim Prince
Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3
On 3/21/2011 5:21 AM, ya...@adina.com wrote: I am trying to compile our codes with open mpi 1.4.3, by intel compilers 8.1. (1) For the open mpi 1.4.3 installation on a linux beowulf cluster, I use: ./configure --prefix=/home/yiguang/dmp-setup/openmpi-1.4.3 CC=icc CXX=icpc F77=ifort FC=ifort --enable-static LDFLAGS="-i-static -static-libcxa" --with-wrapper-ldflags="-i-static -static-libcxa" 2>&1 | tee config.log and make all install 2>&1 | tee install.log The issue is that I am trying to build open mpi 1.4.3 with intel compiler libraries statically linked to it, so that when we run mpirun/orterun, it does not need to dynamically load any intel libraries. But what I got is that mpirun always asks for some intel library (e.g. libsvml.so) if I do not put the intel library path on the library search path ($LD_LIBRARY_PATH). I checked the open mpi user archive; it seems only some kind user mentioned to use "-i-static" (in my case) or "-static-intel" in ldflags. This is what I did, but it seems it is not working, and I did not get any confirmation from the user archive whether or not this works for anyone else. Could anyone help me on this? Thanks! If you are to use such an ancient compiler (apparently a 32-bit one), you must read the docs which come with it, rather than relying on comments about a more recent version. libsvml isn't included automatically at link time by that 32-bit compiler, unless you specify an SSE option, such as -xW. It's likely that no one has verified OpenMPI with a compiler of that vintage. We never used the 32-bit compiler for MPI, and we encountered run-time library bugs for the ifort x86_64 which weren't fixed until later versions. -- Tim Prince
Re: [OMPI users] Open MPI access the same file in parallel ?
On 3/9/2011 11:05 PM, Jack Bryan wrote: Thanks. I am using the GNU mpic++ compiler. Can it automatically support accessing a file by many parallel processes? It should follow the gcc manual, e.g. http://www.gnu.org/s/libc/manual/html_node/Opening-Streams.html I think you want *opentype to evaluate to 'r' (readonly). -- Tim Prince
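As a small generic illustration of the read-only case pointed to in that manual page (not specific to Open MPI; the file name is made up), every rank can open the same file with mode "r" and read it independently, whereas concurrent writes to one file would need coordination or MPI-IO:

/* Each MPI rank opens the same input file read-only and counts its lines. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    FILE *fp = fopen("input.dat", "r");   /* hypothetical shared input file */
    if (fp == NULL) {
        fprintf(stderr, "rank %d: could not open input.dat\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    long lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            ++lines;
    fclose(fp);

    printf("rank %d read %ld lines\n", rank, lines);
    MPI_Finalize();
    return 0;
}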
Re: [OMPI users] What's wrong with this code?
On 2/23/2011 8:27 AM, Prentice Bisbal wrote: Jeff Squyres wrote: On Feb 23, 2011, at 9:48 AM, Tim Prince wrote: I agree with your logic, but the problem is where the code containing the error is coming from - it's coming from a header file that's part of Open MPI, which makes me think this is a compiler error, since I'm sure there are plenty of people using the same header file in their code. Are you certain that they all find it necessary to re-define identifiers from that header file, rather than picking parameter names which don't conflict? Without seeing the code, it sounds like Tim might be right: someone is trying to re-define the MPI_STATUS_SIZE parameter that is being defined by OMPI's mpif-config.h header file. Regardless of include file/initialization ordering (i.e., regardless of whether mpif-config.h is the first or Nth entity to try to set this parameter), user code should never set this parameter value. Or any symbol that begins with MPI_, for that matter. The entire "MPI_" namespace is reserved for MPI. I understand that, and I checked the code to make sure the programmer didn't do anything stupid like that. The entire code is only a few hundred lines in two different files. In the entire program, there is only 1 include statement: include 'mpif.h' and MPI_STATUS_SIZE appears only once: integer ierr,istatus(MPI_STATUS_SIZE) I have limited knowledge of Fortran programming, but based on this, I don't see how MPI_STATUS_SIZE could be getting overwritten. Earlier, you showed a preceding PARAMETER declaration setting a new value for that name, which would be required to make use of it in this context. Apparently, you intend to support only compilers which violate the Fortran standard by supporting a separate name space for PARAMETER identifiers, so that you can violate the MPI standard by using MPI_ identifiers in a manner which I believe is called shadowing in C. -- Tim Prince
Re: [OMPI users] What's wrong with this code?
On 2/23/2011 6:41 AM, Prentice Bisbal wrote: Tim Prince wrote: On 2/22/2011 1:41 PM, Prentice Bisbal wrote: One of the researchers I support is writing some Fortran code that uses Open MPI. The code is being compiled with the Intel Fortran compiler. This one line of code: integer ierr,istatus(MPI_STATUS_SIZE) leads to these errors: $ mpif90 -o simplex simplexmain579m.for simplexsubs579 /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88): error #6406: Conflicting attributes or multiple declaration of name. [MPI_STATUS_SIZE] parameter (MPI_STATUS_SIZE=5) -^ simplexmain579m.for(147): error #6591: An automatic object is invalid in a main program. [ISTATUS] integer ierr,istatus(MPI_STATUS_SIZE) -^ simplexmain579m.for(147): error #6219: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association [MPI_STATUS_SIZE] integer ierr,istatus(MPI_STATUS_SIZE) -^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6756: A COMMON block data object must not be an automatic object. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) --^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6591: An automatic object is invalid in a main program. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) Any idea how to fix this? Is this a bug in the Intel compiler, or the code? I can't see the code from here. The first failure to recognize the PARAMETER definition apparently gives rise to the others. According to the message, you already used the name MPI_STATUS_SIZE in mpif-config.h and now you are trying to give it another usage (not case sensitive) in the same scope. If so, it seems good that the compiler catches it. I agree with your logic, but the problem is where the code containing the error is coming from - it's coming from a header file that's part of Open MPI, which makes me think this is a compiler error, since I'm sure there are plenty of people using the same header file in their code. Are you certain that they all find it necessary to re-define identifiers from that header file, rather than picking parameter names which don't conflict? -- Tim Prince
Re: [OMPI users] What's wrong with this code?
On 2/22/2011 1:41 PM, Prentice Bisbal wrote: One of the researchers I support is writing some Fortran code that uses Open MPI. The code is being compiled with the Intel Fortran compiler. This one line of code: integer ierr,istatus(MPI_STATUS_SIZE) leads to these errors: $ mpif90 -o simplex simplexmain579m.for simplexsubs579 /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88): error #6406: Conflicting attributes or multiple declaration of name. [MPI_STATUS_SIZE] parameter (MPI_STATUS_SIZE=5) -^ simplexmain579m.for(147): error #6591: An automatic object is invalid in a main program. [ISTATUS] integer ierr,istatus(MPI_STATUS_SIZE) -^ simplexmain579m.for(147): error #6219: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association [MPI_STATUS_SIZE] integer ierr,istatus(MPI_STATUS_SIZE) -^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6756: A COMMON block data object must not be an automatic object. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) --^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6591: An automatic object is invalid in a main program. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) Any idea how to fix this? Is this a bug in the Intel compiler, or the code? I can't see the code from here. The first failure to recognize the PARAMETER definition apparently gives rise to the others. According to the message, you already used the name MPI_STATUS_SIZE in mpif-config.h and now you are trying to give it another usage (not case sensitive) in the same scope. If so, it seems good that the compiler catches it. -- Tim Prince
Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance
On 1/7/2011 6:49 AM, Jeff Squyres wrote: My understanding is that hyperthreading can only be activated/deactivated at boot time -- once the core resources are allocated to hyperthreads, they can't be changed while running. Whether disabling the hyperthreads or simply telling Linux not to schedule on them makes a difference performance-wise remains to be seen. I've never had the time to do a little benchmarking to quantify the difference. If someone could rustle up a few cycles (get it?) to test out what the real-world performance difference is between disabling hyperthreading in the BIOS vs. telling Linux to ignore the hyperthreads, that would be awesome. I'd love to see such results. My personal guess is that the difference is in the noise. But that's a guess. Applications which depend on availability of full size instruction lookaside buffer would be candidates for better performance with hyperthreads completely disabled. Many HPC applications don't stress ITLB, but some do. Most of the important resources are allocated dynamically between threads, but the ITLB is an exception. We reported results of an investigation on Intel Nehalem 4-core hyperthreading where geometric mean performance of standard benchmarks for certain commercial applications was 2% better with hyperthreading disabled at boot time, compared with best 1 rank per core scheduling with hyperthreading enabled. Needless to say, the report wasn't popular with marketing. I haven't seen an equivalent investigation for the 6-core CPUs, where various strange performance effects have been noted, so, as Jeff said, the hyperthreading effect could be "in the noise." -- Tim Prince
Re: [OMPI users] Call to MPI_Test has large time-jitter
On 12/17/2010 6:43 PM, Sashi Balasingam wrote: Hi, I recently started on an MPI-based, 'real-time', pipelined-processing application, and the application fails due to large time-jitter in sending and receiving messages. Here is related info - 1) Platform: a) Intel Box: Two Hex-core, Intel Xeon, 2.668 GHz (...total of 12 cores), b) OS: SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l) c) MPI Rev: (OpenRTE) 1.4, (...Installed OFED package) d) HCA: InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0) 2) Application detail a) Launching 7 processes, for pipelined processing, where each process waits for a message (sizes vary between 1 KBytes to 26 KBytes), then processes the data, and outputs a message (sizes vary between 1 KBytes to 26 KBytes) to the next process. b) MPI transport functions used: "MPI_Isend", MPI_Irecv, MPI_Test. i) For receiving messages, I first make an MPI_Irecv call, followed by a busy-loop on MPI_Test, waiting for the message (see the sketch after this message). ii) For sending a message, there is a busy-loop on MPI_Test to ensure the prior buffer was sent, then use MPI_Isend. c) When the job starts, all these 7 processes are put in high priority mode (SCHED_FIFO policy, with priority setting of 99). The job entails an input data packet stream (and a series of MPI messages), continually at a 40 microsecond rate, for a few minutes. 3) The Problem: Most calls to MPI_Test (...which is non-blocking) take a few microseconds, but around 10% of the job it has large jitter, varying from 1 to 100-odd milliseconds. This causes some of the application input queues to fill up and cause a failure. Any suggestions to look at on the MPI settings or OS config/issues will be much appreciated. I didn't see anything there about your -mca affinity settings. Even if the defaults don't choose optimum mapping, it's way better than allowing them to float as you would with multiple independent jobs running. -- Tim Prince
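Here is a bare-bones version of the receive pattern described in 2(b), posting MPI_Irecv and then polling with MPI_Test, written as a generic two-rank example rather than the poster's seven-process pipeline; the tag and buffer size are made up.

/* Generic MPI_Irecv + MPI_Test polling loop (two ranks, invented tag and size). */
#include <mpi.h>
#include <stdio.h>

#define NBYTES 4096
#define TAG    17

int main(int argc, char **argv)
{
    int rank, size;
    char buf[NBYTES];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        for (int i = 0; i < NBYTES; ++i)
            buf[i] = (char)i;
        MPI_Request sreq;
        MPI_Isend(buf, NBYTES, MPI_CHAR, 1, TAG, MPI_COMM_WORLD, &sreq);
        MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Request rreq;
        int done = 0;
        MPI_Irecv(buf, NBYTES, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &rreq);
        while (!done) {
            /* busy-poll; returns immediately whether or not the message has arrived */
            MPI_Test(&rreq, &done, MPI_STATUS_IGNORE);
            /* other pipeline work could be interleaved here instead of spinning */
        }
        printf("rank 1 received %d bytes\n", NBYTES);
    }

    MPI_Finalize();
    return 0;
}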
Re: [OMPI users] Mac Ifort and gfortran together
On 12/15/2010 8:22 PM, Jeff Squyres wrote: Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, and the MPI Forum meeting last week... On Nov 29, 2010, at 2:12 PM, David Robertson wrote: I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use both Intel Ifort 11.1 and gfortran 4.3 on the same machine and switch between them to test and debug code. I had runtime problems when I compiled openmpi in my usual way of no shared libraries so I switched to shared and it runs now. What problems did you have? OMPI should work fine when compiled statically. However, in order for it to work with ifort I ended up needing to add the location of my intel compiled Open MPI libraries (/opt/intelsoft/openmpi/lib) to my DYLD_LIBRARY_PATH environment variable to to get codes to compile and/or run with ifort. Is this what Intel recommends for anything compiled with ifort on OS X, or is this unique to OMPI-compiled MPI applications? The problem is that adding /opt/intelsoft/openmpi/lib to DYLD_LIBRARY_PATH broke my Open MPI for gfortran. Now when I try to compile with mpif90 for gfortran it thinks it's actually trying to compile with ifort still. As soon as I take the above path out of DYLD_LIBRARY_PATH everything works fine. Also, when I run ompi_info everything looks right except prefix. It says /opt/intelsoft/openmpi rather than /opt/gfortransoft/openmpi like it should. It should be noted that having /opt/intelsoft/openmpi in LD_LIBRARY_PATH does not produce the same effect. I'm not quite clear on your setup, but it *sounds* like you're somehow mixing up 2 different installations of OMPI -- one in /opt/intelsoft and the other in /opt/gfortransoft. Can you verify that you're using the "right" mpif77 (and friends) when you intend to, and so on? Well, yes, he has to use the MPI Fortran libraries compiled by ifort with his ifort application build, and the ones compiled by gfortran with a gfortran application build. There's nothing "strange" about it; the PATH for mpif90 and DYLD_LIBRARY_PATH for the Fortran library have to be set correctly for each case. If linking statically with the MPI Fortran library, you still must choose the one built with the compatible Fortran. gfortran and ifort can share C run-time libraries but not the Fortran ones. It's the same as on linux (and, likely, Windows). -- Tim Prince
Re: [OMPI users] meaning of MPI_THREAD_*
On 12/6/2010 3:16 AM, Hicham Mouline wrote: Hello, 1. MPI_THREAD_SINGLE: Only one thread will execute. Does this really mean the process cannot have any other threads at all, even if they don't deal with MPI at all? I'm curious as to how this case affects the openmpi implementation? Essentially, what is the difference between MPI_THREAD_SINGLE and MPI_THREAD_FUNNELED? 2. In my case, I'm interested in MPI_THREAD_SERIALIZED. However, if it's available, I can use MPI_THREAD_FUNNELED. What cmake flags do I need to enable to allow this mode? 3. Assume I assign only 1 thread in my program to deal with MPI. What is the difference between int MPI::Init_thread(MPI_THREAD_SINGLE) int MPI::Init_thread(MPI_THREAD_FUNNELED) int MPI::Init() Your question is too broad; perhaps you didn't intend it that way. Are you trying to do something which may work only with a specific version of openmpi, or are you willing to adhere to portable practice? I tend to believe what it says at http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm including: A call to MPI_INIT has the same effect as a call to MPI_INIT_THREAD with a required = MPI_THREAD_SINGLE You would likely use one of those if all your MPI calls are from a single thread, and you don't perform any threading inside MPI. MPI implementations vary on the extent to which a higher level of threading than what is declared can be used successfully (there's no guarantee of bad results if you exceed what was set by MPI_INIT). There shouldn't be any bad effect from setting a higher level of thread support which you never use. I would think your question about cmake flags would apply only once you chose a compiler. I have never seen anyone try mixing auto-parallelization with MPI; that would require MPI_THREAD_MULTIPLE but still appears unpredictable. MPI_THREAD_FUNNELED is often used with OpenMP parallelization inside MPI. -- Tim Prince
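To illustrate the initialization calls compared in that question, here is a minimal generic fragment using the C bindings rather than the MPI:: C++ interface (it is not tied to any particular Open MPI build): request MPI_THREAD_FUNNELED and check what the library actually granted, since the standard allows the provided level to be lower than the requested one.

/* Requesting a threading level and checking what was actually provided. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int required = MPI_THREAD_FUNNELED;   /* only the main thread will make MPI calls */
    int provided;

    MPI_Init_thread(&argc, &argv, required, &provided);

    /* the levels are ordered: SINGLE < FUNNELED < SERIALIZED < MULTIPLE */
    if (provided < required)
        fprintf(stderr, "warning: requested level %d, library provides %d\n",
                required, provided);

    /* plain MPI_Init(&argc, &argv) behaves like requesting MPI_THREAD_SINGLE */

    MPI_Finalize();
    return 0;
}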
Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits
On 11/29/2010 3:03 PM, Gus Correa wrote: Jeff Squyres wrote: 1- ./configure FC=ifort F77=ifort CC=icc CXX=icpc 2-make all 3 sudo make install all. Steps 1 and 2 run normally, but when I use the make install command an error appears that I cannot solve. You say only step 3 above fails. You could try "sudo -E make install". I take it that sudo -E should copy over the environment variable settings. I haven't been able to find any documentation of this option, and I don't currently have an Ubuntu installation to check it. Not being aware of such an option, I used to do: sudo source .. compilervars.sh make install -- Tim Prince
Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits
On 11/29/2010 11:31 AM, Gus Correa wrote: Hi Mauricio, check if you have icc (in the Intel compiler bin directory/subdirectories). Check also if it is in your PATH environment variable. "which icc" will tell. If not, add it to PATH. Actually, the right way to do it is to run the Intel scripts to set the whole compiler environment, not only PATH. The scripts should be called something like iccvars.csh or iccvars.sh for C/C++ and ifortvars.csh or ifortvars.sh for Fortran, and are also in the Intel bin directory. You can source these scripts in your .cshrc/.bashrc file, using the correct shell (.sh if you use [ba]sh, .csh if you use [t]csh). This is in the Intel compiler documentation, take a look. For the icc version mentioned, there is a compilervars.[c]sh which takes care of both C++ and Fortran (if present), as do either of the iccvars or ifortvars, when the compilers are installed in the same directory. Also, you can compile OpenMPI with gcc, g++ and gfortran, if you want. If they are not yet installed in your Ubuntu, you can get them with apt-get, or whatever Ubuntu uses to get packages. icc ought to work interchangeably with gcc, provided the same g++ version is always on PATH. icc doesn't work without the g++. Thus, it is entirely reasonable to build openmpi with gcc and use either gcc or icc to build the application. gfortran and ifort, however, involve incompatible run-time libraries, and the openmpi fortran libraries won't be interchangeable. You must take care not to mix 32- and 64-bit compilers/libraries. Normally you would build everything 64-bit, both openmpi and the application. Ubuntu doesn't follow the standard scheme for location of 32-bit vs. 64-bit compilers and libraries, but the Intel compiler version you mentioned should resolve this automatically. -- Tim Prince
Re: [OMPI users] link problem on 64bit platform
On 11/1/2010 5:24 AM, Jeff Squyres wrote: On Nov 1, 2010, at 5:20 AM, jody wrote: jody@aim-squid_0 ~/progs $ mpiCC -g -o HelloMPI HelloMPI.cpp /usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /opt/openmpi-1.4.2/lib/libmpi_cxx.so when searching for -lmpi_cxx This is the key message -- it found libmpi_cxx.so, but the linker deemed it incompatible, so it skipped it. Typically, it means that the cited library is a 32-bit one, to which the 64-bit ld will react in this way. You could have verified this by file /opt/openmpi-1.4.2/lib/* By normal linux conventions a directory named /lib/ as opposed to /lib64/ would contain only 32-bit libraries. If gentoo doesn't conform with those conventions, maybe you should do your learning on a distro which does. -- Tim Prince
Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran
On 10/6/2010 12:09 AM, Götz Waschk wrote: libtool: link: mpif90 -shared .libs/H5f90global.o .libs/H5fortran_types.o .libs/H5_ff.o .libs/H5Aff.o .libs/H5Dff.o .libs/H5Eff.o .libs/H5Fff.o .libs/H5Gff.o .libs/H5Iff.o .libs/H5Lff.o .libs/H5Off.o .libs/H5Pff.o .libs/H5Rff.o .libs/H5Sff.o .libs/H5Tff.o .libs/H5Zff.o .libs/H5_DBLE_InterfaceInclude.o .libs/H5f90kit.o .libs/H5_f.o .libs/H5Af.o .libs/H5Df.o .libs/H5Ef.o .libs/H5Ff.o .libs/H5Gf.o .libs/H5If.o .libs/H5Lf.o .libs/H5Of.o .libs/H5Pf.o .libs/H5Rf.o .libs/H5Sf.o .libs/H5Tf.o .libs/H5Zf.o .libs/H5FDmpiof.o .libs/HDF5mpio.o .libs/H5FDmpioff.o-lmpi -lsz -lz -lm -m64 -mtune=generic -rpath=/usr/lib64/openmpi/1.4-icc/lib -soname libhdf5_fortran.so.6 -o .libs/libhdf5_fortran.so.6.0.4 ifort: command line warning #10156: ignoring option '-r'; no argument required ifort: command line warning #10156: ignoring option '-s'; no argument required ld: libhdf5_fortran.so.6: No such file: No such file or directory Do -Wl,-rpath and -Wl,-soname= work any better? -- Tim Prince
Re: [OMPI users] Memory affinity
On 9/27/2010 2:50 PM, David Singleton wrote: On 09/28/2010 06:52 AM, Tim Prince wrote: On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can I set processor affinity without memory affinity? This is my question. 2010/9/27 Tim Prince<n...@aol.com> On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: If OpenMPI is numa-compiled, is memory affinity enabled by default? Because I didn't find a standalone memory affinity (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. Memory affinity is implied by processor affinity. Your system libraries are set up so as to cause any memory allocated to be made local to the processor, if possible. That's one of the primary benefits of processor affinity. Not being an expert in openmpi, I assume, in the absence of further easily accessible documentation, there's no useful explicit way to disable maffinity while using paffinity on platforms other than the specified legacy platforms. Memory allocation policy really needs to be independent of processor binding policy. The default memory policy (memory affinity) of "attempt to allocate to the NUMA node of the cpu that made the allocation request but fallback as needed" is flawed in a number of situations. This is true even when MPI jobs are given dedicated access to processors. A common one is where the local NUMA node is full of pagecache pages (from the checkpoint of the last job to complete). For those sites that support suspend/resume based scheduling, NUMA nodes will generally contain pages from suspended jobs. Ideally, the new (suspending) job should suffer a little bit of paging overhead (pushing out the suspended job) to get ideal memory placement for the next 6 or whatever hours of execution. An mbind (MPOL_BIND) policy of binding to the one local NUMA node will not work in the case of one process requiring more memory than that local NUMA node. One scenario is a master-slave where you might want: master (rank 0) bound to processor 0 but not memory bound; slave (rank i) bound to processor i and memory bound to the local memory of processor i. They really are independent requirements. Cheers, David ___ Interesting; I agree with those of your points on which I have enough experience to have an opinion. However, the original question was not whether it would be desirable to have independent memory affinity, but whether it is possible currently within openmpi to avoid memory placements being influenced by processor affinity. I have seen the case you mention, where performance of a long job suffers because the state of memory from a previous job results in an abnormal number of allocations falling over to other NUMA nodes, but I don't know the practical solution. -- Tim Prince
Re: [OMPI users] Memory affinity
On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can I set processor affinity without memory affinity? This is my question. 2010/9/27 Tim Prince<n...@aol.com> On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: If OpenMPI is numa-compiled, is memory affinity enabled by default? Because I didn't find a standalone memory affinity (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. Memory affinity is implied by processor affinity. Your system libraries are set up so as to cause any memory allocated to be made local to the processor, if possible. That's one of the primary benefits of processor affinity. Not being an expert in openmpi, I assume, in the absence of further easily accessible documentation, there's no useful explicit way to disable maffinity while using paffinity on platforms other than the specified legacy platforms. -- Tim Prince
Re: [OMPI users] Memory affinity
On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: If OpenMPI is numa-compiled, is memory affinity enabled by default? Because I didn't find a standalone memory affinity (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. If you mean something other than this, explanation is needed as part of your question. taskset or numactl might be relevant, if you require more detailed control. -- Tim Prince
Re: [OMPI users] send and receive buffer the same on root
On 9/16/2010 9:58 AM, David Zhang wrote: It's compiler specific, I think. I've done this with OpenMPI no problem, however on another cluster with ifort I've gotten error messages about not using MPI_IN_PLACE. So I think if it compiles, it should work fine. On Thu, Sep 16, 2010 at 10:01 AM, Tom Rosmond <rosm...@reachone.com> wrote: I am working with a Fortran 90 code with many MPI calls like this: call mpi_gatherv(x,nsize(rank+1), mpi_real,x,nsize,nstep,mpi_real,root,mpi_comm_world,mstat) Compiler can't affect what happens here (unless maybe you use x again somewhere). Maybe you mean MPI library? Intel MPI probably checks this at run time and issues an error. I've dealt with run-time errors (which surfaced along with an ifort upgrade) which caused silent failure (incorrect numerics) on openmpi but a fatal diagnostic from Intel MPI run-time, due to multiple uses of the same buffer. Moral: even if it works for you now with openmpi, you could be setting up for unexpected failure in the future. -- Tim Prince
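For reference, a sketch of how a gather of this shape can be written so the root does not pass the same buffer as both send and receive argument; MPI_IN_PLACE is the form the Intel MPI diagnostic is pushing toward. Variable names follow the quoted call; ierr and rdum are illustrative additions.

  ! At the root, MPI_IN_PLACE means the root's own contribution is assumed to be
  ! already sitting at its displacement (nstep(root+1)) inside the receive buffer x.
  real :: rdum(1)      ! dummy receive buffer; ignored on non-root ranks
  if (rank == root) then
     call mpi_gatherv(MPI_IN_PLACE, 0, mpi_real, &
                      x, nsize, nstep, mpi_real, root, mpi_comm_world, ierr)
  else
     call mpi_gatherv(x, nsize(rank+1), mpi_real, &
                      rdum, nsize, nstep, mpi_real, root, mpi_comm_world, ierr)
  end if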
Re: [OMPI users] OpenMPI Run-Time "Freedom" Question
On 8/12/2010 6:04 PM, Michael E. Thomadakis wrote: On 08/12/10 18:59, Tim Prince wrote: On 8/12/2010 3:27 PM, Ralph Castain wrote: Ick - talk about confusing! I suppose there must be -some- rational reason why someone would want to do this, but I can't imagine what it would be I'm no expert on compiler vs lib confusion, but some of my own experience would say that this is a bad idea regardless of whether or not OMPI is involved. Compiler version interoperability is usually questionable, depending upon how far apart the rev levels are. Only answer I can offer is that you would have to try it. It will undoubtedly be a case-by-case basis: some combinations might work, others might fail. On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote: Hello OpenMPI, we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster using Intel compilers V 11.1.059 and 11.1.072 respectively, and one user has the following request: Can we build OpenMPI version say O.1 against Intel compilers version say I.1 but then build an application with OpenMPI O.1 BUT then use a DIFFERENT Intel compiler version say I.2 to build and run this MPI application? I suggested to him to 1) simply try to build and run the application with O.1 but use Intel compilers version I.X whatever this X is and see if it has any issues. OR 2) If the above does not work, I would build OpenMPI O.1 against Intel version I.X so he can use THIS combination for his hypothetical application. He insists that I build OpenMPI O.1 with some version of Intel compilers I.Y but then at run time he would like to use *different* Intel run time libs at will I.Z <> I.X. Can you provide me with a suggestion for a sane solution to this? :-) Best regards Michael Guessing at what is meant here, if you build MPI with a given version of Intel compilers, it ought to work when the application is built with a similar or more recent Intel compiler, or when the run-time LD_LIBRARY_PATH refers to a similar or newer library (within reason). There are similar constraints on glibc version. "Within reason" works over a more restricted range when C++ is involved. Note that the Intel linux compilers link to the gcc and glibc libraries as well as those which come with the compiler, and the MPI could be built with a combination of gcc and ifort to work with icc or gcc and ifort. gfortran and ifort libraries, however, are incompatible, except that libgomp calls can be supported by libiomp5. The "rational" use I can see is that an application programmer would likely wish to test a range of compilers without rebuilding MPI. Intel documentation says there is forward compatibility testing of libraries, at least to the extent that a build made with 10.1 would work with 11.1 libraries. The most recent Intel library compatibility break was between MKL 9 and 10. Dear Tim, I offered to provide myself the combination of OMPI + Intel compilers so that the application can use it in a stable fashion. When I inquired about this application so I can look into this I was told that "there is NO application yet (!) that fails but just in case it fails ..." I was asked to hack into the OMPI building process to let OMPI use one run-time but then the MPI application using this OMPI ... use another! Thanks for the information on this. We indeed use Intel Compiler set 11.1.XXX + OMPI 1.4.1 and 1.4.2. 
The basic motive in this hypothetical situation is to build the MPI application ONCE and then swap run-time libs as newer compilers come out. I am certain that even if one can get away with it with nearby run-time versions there is no guarantee of stability ad infinitum. I end up having to spend more time on technically "awkward" requests than on the reasonable ones. Reminds me of when I was a teacher: I had to spend more time with all the people trying to avoid doing the work than with the good students... hmmm :-) According to my understanding, your application (or MPI) built with an Intel 11.1 compiler should continue working with future Intel 11.1 and 12.x libraries. I don't expect Intel to test or support this compatibility beyond that. You will likely want to upgrade your OpenMPI earlier than the time when Intel compiler changes require a new MPI build. If the interest is in getting performance benefits of future hardware simply by installing new dynamic libraries without rebuilding an application, Intel MKL is the most likely favorable scenario. The MKL with optimizations for AVX is already in beta test, and should work as a direct replacement for the MKL in current releases. -- Tim Prince
Re: [OMPI users] OpenMPI Run-Time "Freedom" Question
On 8/12/2010 3:27 PM, Ralph Castain wrote: Ick - talk about confusing! I suppose there must be -some- rational reason why someone would want to do this, but I can't imagine what it would be I'm no expert on compiler vs lib confusion, but some of my own experience would say that this is a bad idea regardless of whether or not OMPI is involved. Compiler version interoperability is usually questionable, depending upon how far apart the rev levels are. Only answer I can offer is that you would have to try it. It will undoubtedly be a case-by-case basis: some combinations might work, others might fail. On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote: Hello OpenMPI, we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster using Intel compilers V 11.1.059 and 11.1.072 respectively, and one user has the following request: Can we build OpenMPI version say O.1 against Intel compilers version say I.1 but then build an application with OpenMPI O.1 BUT then use a DIFFERENT Intel compiler version say I.2 to build and run this MPI application? I suggested to him to 1) simply try to build and run the application with O.1 but use Intel compilers version I.X whatever this X is and see if it has any issues. OR 2) If the above does not work, I would build OpenMPI O.1 against Intel version I.X so he can use THIS combination for his hypothetical application. He insists that I build OpenMPI O.1 with some version of Intel compilers I.Y but then at run time he would like to use *different* Intel run time libs at will I.Z <> I.X. Can you provide me with a suggestion for a sane solution to this? :-) Best regards Michael Guessing at what is meant here, if you build MPI with a given version of Intel compilers, it ought to work when the application is built with a similar or more recent Intel compiler, or when the run-time LD_LIBRARY_PATH refers to a similar or newer library (within reason). There are similar constraints on glibc version. "Within reason" works over a more restricted range when C++ is involved. Note that the Intel linux compilers link to the gcc and glibc libraries as well as those which come with the compiler, and the MPI could be built with a combination of gcc and ifort to work with icc or gcc and ifort. gfortran and ifort libraries, however, are incompatible, except that libgomp calls can be supported by libiomp5. The "rational" use I can see is that an application programmer would likely wish to test a range of compilers without rebuilding MPI. Intel documentation says there is forward compatibility testing of libraries, at least to the extent that a build made with 10.1 would work with 11.1 libraries. The most recent Intel library compatibility break was between MKL 9 and 10. -- Tim Prince
Re: [OMPI users] Help on the big picture..
On 7/22/2010 4:11 PM, Gus Correa wrote: Hi Cristobal Cristobal Navarro wrote: Yes, I was aware of the big difference, hehe. Now that OpenMP and OpenMPI are being discussed, I've always wondered if it's a good idea to model a solution in the following way, using both OpenMP and OpenMPI. Suppose you have n nodes, each node has a quad-core CPU (so you have n*4 processors). Launch n processes according to the n nodes available. Set a resource manager like SGE to fill the n*4 slots using round robin. On each process, make use of the other cores available on the node with OpenMP. If this is possible, then one could make use of the shared memory model locally at each node, avoiding unnecessary I/O through the network. What do you think? Before asking what we think about this, please check the many references posted on this subject over the last decade. Then refine your question to what you are interested in hearing about; evidently you have no interest in much of this topic. Yes, it is possible, and many of the atmosphere/oceans/climate codes that we run are written with this capability. In other areas of science and engineering this is probably the case too. However, this is not necessarily better/faster/simpler than dedicating all the cores to MPI processes. In my view, this is due to: 1) OpenMP has a different scope than MPI, and to some extent is limited by more stringent requirements than MPI; 2) Most modern MPI implementations (and OpenMPI is an example) use shared memory mechanisms to communicate between processes that reside in a single physical node/computer; The shared memory communication of several MPI implementations does greatly improve efficiency of message passing among ranks assigned to the same node. However, these ranks also communicate with ranks on other nodes, so there is a large potential advantage for hybrid MPI/OpenMP as the number of cores in use increases. If you aren't interested in running on more than 8 nodes or so, perhaps you won't care about this. 3) Writing hybrid code with MPI and OpenMP requires more effort, and much care so as not to let the two forms of parallelism step on each other's toes. The MPI standard specifies the use of MPI_init_thread to indicate which combination of MPI and threading you intend to use, and to inquire whether that model is supported by the active MPI. In the case where there is only 1 MPI process per node (possibly using several cores via OpenMP threading) there is no requirement for special affinity support. If there is more than 1 FUNNELED rank per multiple CPU node, it becomes important to maintain cache locality for each rank. OpenMP operates mostly through compiler directives/pragmas interspersed in the code. For instance, you can parallelize inner loops in no time, granted that there are no data dependencies across the commands within the loop. All it takes is to write one or two directive/pragma lines. More than loop parallelization can be done with OpenMP, of course, although not as much as can be done with MPI. Still, with OpenMP, you are restricted to working in a shared memory environment. By contrast, MPI requires more effort to program, but it takes advantage of shared memory and networked environments (and perhaps extended grids too). snipped tons of stuff rather than attempt to reconcile top postings -- Tim Prince
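A bare-bones Fortran skeleton of the hybrid layout being discussed (one MPI rank per node, OpenMP threads filling that node's cores); this is only an illustrative sketch, with the actual computation and communication left as comments.

  program hybrid_skeleton
    use mpi
    implicit none
    integer :: provided, rank, nranks, ierr

    ! FUNNELED is enough when only the master thread talks to MPI.
    call MPI_INIT_THREAD(MPI_THREAD_FUNNELED, provided, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nranks, ierr)

  !$omp parallel
    ! ... each thread works on its share of this rank's subdomain ...
  !$omp end parallel

    ! ... MPI communication here, outside the parallel region (master thread only) ...

    call MPI_FINALIZE(ierr)
  end program hybrid_skeleton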
Re: [OMPI users] is loop unrolling safe for MPI logic?
On 7/18/2010 9:09 AM, Anton Shterenlikht wrote: On Sat, Jul 17, 2010 at 09:14:11AM -0700, Eugene Loh wrote: Jeff Squyres wrote: On Jul 17, 2010, at 4:22 AM, Anton Shterenlikht wrote: Is loop vectorisation/unrolling safe for MPI logic? I presume it is, but are there situations where loop vectorisation could e.g. violate the order of execution of MPI calls? I *assume* that the intel compiler will not unroll loops that contain MPI function calls. That's obviously an assumption, but I would think that unless you put some pragmas in there that tell the compiler that it's safe to unroll, the compiler will be somewhat conservative about what it automatically unrolls. More generally, a Fortran compiler that optimizes aggressively could "break" MPI code. http://www.mpi-forum.org/docs/mpi-20-html/node236.htm#Node241 That said, you may not need to worry about this in your particular case. This is a very important point, many thanks Eugene. A Fortran MPI programmer definitely needs to pay attention to this. MPI-2.2 provides a slightly updated version of this guide: http://www.mpi-forum.org/docs/mpi22-report/node343.htm#Node348 many thanks anton From the point of view of the compiler developers, auto-vectorization and unrolling are distinct questions. An MPI or other non-inlined function call would not be subject to vectorization. While auto-vectorization or unrolling may expose latent bugs, MPI is not particularly likely to make them worse. You have made some misleading statements about vectorization along the way, but these aren't likely to relate to MPI problems. Upon my return, I will be working on a case which was developed and tested successfully under ifort 10.1 and other compilers, which is failing under current ifort versions. Current Intel MPI throws a run time error indicating that the receive buffer has been lost; the openmpi failure is more obscure. I will have to change the code to use distinct tags for each MPI send/receive pair in order to track it down. I'm not counting on that magically making the bug go away. ifort is not particularly aggressive about unrolling loops which contain MPI calls, but I agree that must be considered. -- Tim Prince
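The hazard that section of the standard describes can be sketched in a few lines of Fortran (illustrative fragment only; declarations of n, src and tag are omitted, and touch_buf is a hypothetical empty external routine, the MPI-2-era workaround for keeping the compiler from holding buf in registers across the wait):

  real :: buf(n)
  integer :: req, status(MPI_STATUS_SIZE), ierr

  call MPI_IRECV(buf, n, MPI_REAL, src, tag, MPI_COMM_WORLD, req, ierr)
  ! ... unrelated computation; an aggressive compiler may keep parts of buf
  !     in registers here, since it cannot see that MPI_WAIT updates it ...
  call MPI_WAIT(req, status, ierr)
  call touch_buf(buf)   ! external no-op taking buf as argument; forces a reload
  ! ... now it is safe to read the received data from buf ...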
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
On 5/9/2010 8:45 PM, Terry Frankcombe wrote: I don't know what Jeff meant by that, but we haven't seen a feasible way of disabling HT without rebooting and using the BIOS options. According to this page: http://dag.wieers.com/blog/is-hyper-threading-enabled-on-a-linux-system in RHEL5/CentOS-5 it's easy to switch it on and off on the fly. ___ That's the same as Jeff explained. It requires root privilege, and affects all users. -- Tim Prince
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
On 5/6/2010 10:30 PM, John Hearns wrote: On 7 May 2010 03:17, Jeff Squyres<jsquy...@cisco.com> wrote: Indeed. I have seen some people have HT enabled in the bios just so that they can have the software option of turning them off via linux -- then you can run with HT and without it and see what it does to your specific codes. I may have missed this on the thread, but how do you do that? The Nehalem systems I have came delivered with HT enabled in the BIOS - I know it is not a real pain to reboot and configure, but it would be a lot easier to leave it on and switch off in software - also if you wanted to do back-to-back testing of performance with/without HT. ___ I don't know what Jeff meant by that, but we haven't seen a feasible way of disabling HT without rebooting and using the BIOS options. It is feasible to place 1 MPI process or thread per core. With careful affinity, performance when using 1 logical per core normally is practically the same as with HT disabled. -- Tim Prince
Re: [OMPI users] Fortran support on Windows Open-MPI
On 5/6/2010 9:07 PM, Trent Creekmore wrote: Compaq Visual Fortran for Windows was out, but HP acquired Compaq. HP, later deciding they did not want it, along with the Alpha processor technology, sold them to Intel. So now it's Intel Visual Fortran Compiler for Windows. In addition, if you don't want that package, instead they do sell a plug-in for Microsoft Visual Studio. There is also an HPC/Parallel environment for Visual Studio, but none of these are cheap. I don't see why you can't include Open MPI libraries in that environment. Trent -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Damien Sent: Thursday, May 06, 2010 10:53 PM To: us...@open-mpi.org Subject: [OMPI users] Fortran support on Windows Open-MPI Hi all, Can anyone tell me what the plans are for Fortran 90 support on Windows, with say the Intel compilers? I need to get MUMPS built and running using Open-MPI, with Visual Studio and Intel 11.1. I know Fortran isn't part of the regular CMake build for Windows. If someone's working on this I'm happy to test or help out. Damien ___ I'm not certain whether the top-post is intended as a reply to the original post, but I feel I must protest efforts to add confusion. Looking at the instructions for building on Windows, it appears that several routes have been taken with reported success, not including commercial Fortran. It seems it should not be a major task to include gfortran in the cygwin build. HP never transferred ownership of Compaq Fortran, not that it's relevant to the discussion. The most popular open source MPI for commercial Windows Fortran has been Argonne MPICH2, which offers a pre-built version compatible with Intel Fortran. Intel also offers MPI, derived originally from Argonne MPICH2, for both Windows and linux. I can't imagine OpenMPI libraries being added to the Microsoft HPC environment; maybe that's not exactly what the top poster meant. -- Tim Prince
Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS
On 4/26/2010 2:31 AM, Asad Ali wrote: On Mon, Apr 26, 2010 at 8:01 PM, Ashley Pittman <ash...@pittman.co.uk> wrote: On 25 Apr 2010, at 22:27, Asad Ali wrote: > Yes I use different machines such as > > machine 1 uses AMD Opterons. (Fedora) > > machines 2 and 3 use Intel Xeons. (CentOS) > > machine 4 uses slightly older Intel Xeons. (Debian) > > Only machine 1 gives correct results. The CentOS and Debian results are the same, but are wrong and different from those of machine 1. Have you verified they are actually wrong, or are they just different? It's actually perfectly possible for the same program to get different results from run to run even on the same hardware and the same OS. All floating point operations by the MPI library are expected to be deterministic, but changing the process layout or MPI settings can affect this, and of course anything the application does can introduce differences as well. Ashley. The code is the same with the same input/output and the same constants etc. From run to run the results can only be different if you either use different input/output or use different random number seeds. Here in my case the random number seeds are the same as well. This means that this code must give (and it does) the same results no matter how many times you run it. I didn't tamper with mpi-settings for any run. I have verified that only the Fedora results are correct, because I know what is in my data and how my model should behave, and I get a nearly perfect convergence on the Fedora OS. Even my dual core laptop with Ubuntu 9.10 also gives correct results. The other OSs give the same results as Fedora for a few hundred iterations, but then an unusual thing happens and the results start going wrong. If you're really interested in solving your "problem," you'll have to consider important details such as which compiler was used, which options (e.g. 387 vs. sse), run-time setting of x87 or SSE control registers, 32- vs. 64-bit compilation. SSE2 is the default for 64-bit compilation, but compilers vary on defaults for 32-bit. If your program depends on x87 extra precision of doubles, or efficient mixing of double and long double, 387 code may be a better choice, but limits your efficiency. -- Tim Prince
Re: [OMPI users] OpenMPI multithreaded performance
On 4/7/2010 1:20 AM, Piero Lanucara wrote: Dear OpenMPI team, how much performance should we expect using the MPI multithread capability (MPI_init_thread with MPI_THREAD_MULTIPLE)? It seems that no performance gain appears in some simple tests, such as multiple MPI channels activated, overlapping communication and computation, and so on. Maybe I don't understand your question. Are you saying that none of the references found by search terms such as "hybrid mpi openmp" are useful for you? They cover so many topics, you would have to be much more specific about which topics you want in more detail. -- Tim Prince
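For comparison, the usual single-threaded way to overlap communication and computation is with nonblocking calls rather than MPI_THREAD_MULTIPLE. A rough Fortran fragment (declarations omitted; compute_interior and compute_boundary are hypothetical routines standing in for the application's work):

  ! Post the halo exchange, compute on interior points, then wait and finish edges.
  call MPI_IRECV(halo_in,  n, MPI_REAL, left,  0, comm, req(1), ierr)
  call MPI_ISEND(halo_out, n, MPI_REAL, right, 0, comm, req(2), ierr)

  call compute_interior(u)            ! work that does not touch the halo buffers

  call MPI_WAITALL(2, req, MPI_STATUSES_IGNORE, ierr)
  call compute_boundary(u, halo_in)   ! work that needed the received halo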
Re: [OMPI users] OpenMPI/NAG Fortran: Missing libf52.so.1
On 3/16/2010 11:22 PM, Vedran Coralic wrote: Now, I think I know what the problem is. Basically, the NAG Fortran compiler and its libraries are only available on the master node so that the remaining nodes cannot access/find the required files. From my understanding, the only way to fix this would be to copy the NAG Fortran compiler to all of the nodes in the cluster. Don't NAG provide static copies of their libraries? Yes, if you link the dynamic libraries, you must make them visible on each node, with the path set in LD_LIBRARY_PATH. On such a small cluster (or with a fast shared file system), a usual way is to put them in a directory mounted across all nodes. Since you talk about a "work-around," you can copy the library folder to your own file system for each node, to check that you've got the hang of it. The LD_LIBRARY_PATH setting can be done in your user settings so it doesn't affect anyone else. -- Tim Prince
Re: [OMPI users] mpirun only works when -np <4
Gus Correa wrote: Hi Matthew, 5) Are you setting processor affinity on mpiexec? mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ... Good point. This option optimizes processor affinity on the assumption that no other jobs are running. If you ran 2 MPI jobs with this option, they would attempt to use the same logical processors, rather than spreading the work effectively. I have doubts whether mpi_paffinity_alone could be relied upon with HyperThreading enabled; it would work OK if it understood how to avoid multiple processes on the same core. If you don't find an option inside openmpi to specify which logicals your jobs should use, you could do it by mpiexec -np 4 taskset... taking care to use a different core for each process (also different between jobs running together). You would have to check on your machine whether the taskset options would be such as -c 0,2,4,6 for separate cores on one package and -c 8,10,12,14 for the other, or some other scheme. /proc/cpuinfo would give valuable clues, even more so /usr/sbin/irqbalance -debug (or wherever it lives on your system). Without an affinity setting, you could also run into problems when running out of individual cores and forcing some pairs of processes to run (quite slowly) on single cores, while others run full speed on other cores.
Re: [OMPI users] MPI Processes and Auto Vectorization
amjad ali wrote: Hi, thanks T.Prince, Your saying: "I'll just mention that we are well into the era of 3 levels of programming parallelization: vectorization, threaded parallel (e.g. OpenMP), and process parallel (e.g. MPI)." is a really great new learning for me. Now I can perceive better. Can you please explain a bit about: "This application gains significant benefit from cache blocking, so vectorization has more opportunity to gain than for applications which have less memory locality." So now should I conclude from your reply that if we have a single-core processor in a PC, even then we can get the benefit of auto-vectorization? And we do not need free cores to get the benefit of auto-vectorization? Thank you very much. Yes, we were using auto-vectorization from before the beginnings of MPI back in the days of single core CPUs; in fact, it would often show a greater gain than it did on later multi-core CPUs. The reason for greater effectiveness of auto-vectorization with cache blocking and possibly with single core CPUs would be less saturation of the memory bus.
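Cache blocking, in a nutshell, restructures a loop nest into tiles small enough to stay in cache, so the vectorized inner loop is fed from cache rather than main memory. A rough Fortran sketch of a blocked matrix multiply (arrays a, b, c and size n assumed declared elsewhere; bs is a tile size you would tune for the target cache):

  integer, parameter :: bs = 64
  integer :: i, j, k, jj, kk
  do jj = 1, n, bs
     do kk = 1, n, bs
        do j = jj, min(jj + bs - 1, n)
           do k = kk, min(kk + bs - 1, n)
              do i = 1, n                       ! unit-stride inner loop vectorizes
                 c(i, j) = c(i, j) + a(i, k) * b(k, j)
              end do
           end do
        end do
     end do
  end do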
Re: [OMPI users] MPI Processes and Auto Vectorization
amjad ali wrote: Hi, Suppose we run a parallel MPI code with 64 processes on a cluster, say of 16 nodes. The cluster nodes have multicore CPUs, say 4 cores on each node. Now all 64 cores on the cluster are running a process. The program is SPMD, meaning all processes have the same workload. Now, if we had done auto-vectorization while compiling the code (for example with Intel compilers), will there be any benefit (efficiency/scalability improvement) of having code with the auto-vectorization? Or will we get the same performance as without auto-vectorization in this example case? That is, if we do not have free CPU cores in a PC or cluster (all cores are running MPI processes), is auto-vectorization still beneficial? Or is it beneficial only if we have some free CPU cores locally? How can we really get a performance improvement with auto-vectorization? Auto-vectorization should give similar performance benefit under MPI as it does in a single process. That's about all that can be said when you say nothing about the nature of your application. This assumes that your MPI domain decomposition, which may not be highly vectorizable, doesn't take up too large a fraction of elapsed time. By the same token, auto-vectorization techniques aren't specific to MPI applications, so an in-depth treatment isn't topical here. I'll just mention that we are well into the era of 3 levels of programming parallelization: vectorization, threaded parallel (e.g. OpenMP), and process parallel (e.g. MPI). For an application which I work on, 8 nodes with auto-vectorization give about the performance of 12 nodes without, so compilers without auto-vectorization capability for such applications fell by the wayside a decade ago. This application gains significant benefit from cache blocking, so vectorization has more opportunity to gain than for applications which have less memory locality. I have not seen an application which was effectively vectorized which also gained from HyperThreading, but the gain for vectorization should be significantly greater than could be gained from HyperThreading. It's also common that vectorization gains more on lower clock speed/cheaper CPU models (of the same architecture), enabling lower cost of purchase or power consumption, but that's true of all forms of parallelization. Some applications can be vectorized effectively by any of the popular auto-vectorizing compilers, including recent gnu compilers, while others show much more gain with certain compilers, such as Intel, PGI, or Open64.
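To make the first point concrete: vectorization happens inside each MPI process, on whatever loops that process executes, so it needs no spare cores. A trivial illustrative example (routine name is made up) of the kind of loop an auto-vectorizing compiler turns into packed SSE instructions within each rank:

  subroutine axpy(n, a, x, y)
    implicit none
    integer, intent(in) :: n
    real, intent(in)    :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
       y(i) = y(i) + a * x(i)   ! unit stride, no dependences: auto-vectorizable
    end do
  end subroutine axpy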