Re: [OMPI devel] Fwd: Open MPI 1.8: link problem when Fortran+C+Platform LSF
Jeff, the output of "mpicc --showme" is attached below. > Do you really need to add "-lbat -llsf" to the command line to make it work? As both 1.6.5 and 1.8.3 versions are build for work with Platform LSF, yes, we need libbat and liblsf. The 1.6.5 version links this library explicitly in the link line. The 1.8.3 does not. ### 1.6.5: icc -I/opt/MPI/openmpi-1.6.5/linux/intel/include/openmpi/opal/mca/hwloc/hwloc132/hwloc/include -I/opt/MPI/openmpi-1.6.5/linux/intel/include -I/opt/MPI/openmpi-1.6.5/linux/intel/include/openmpi -fexceptions -pthread -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -L/opt/MPI/openmpi-1.6.5/linux/intel/lib -lmpi -losmcomp -lrdmacm -libverbs -lrt -lnsl -lutil -lpsm_infinipath -lbat -llsf -ldl -lm -lnuma -lrt -lnsl -lutil ### 1.8.3: icc -I/opt/MPI/openmpi-1.8.3/linux/intel/include/openmpi/opal/mca/hwloc/hwloc172/hwloc/include -I/opt/MPI/openmpi-1.8.3/linux/intel/include/openmpi/opal/mca/event/libevent2021/libevent -I/opt/MPI/openmpi-1.8.3/linux/intel/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/opt/MPI/openmpi-1.8.3/linux/intel/include -I/opt/MPI/openmpi-1.8.3/linux/intel/include/openmpi -fexceptions -pthread -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi On 10/18/14 01:56, Jeff Squyres (jsquyres) wrote: I think the LSF part of this may be a red herring. Do you really need to add "-lbat -llsf" to the command line to make it work? The error message *sounds* like y.tab.o was compiled differently than others...? It's hard to know without seeing the output of mpicc --showme. On Oct 17, 2014, at 7:51 AM, Ralph Castain <r...@open-mpi.org> wrote: Forwarding this for Paul until his email address gets updated on the User list: Begin forwarded message: Date: October 17, 2014 at 6:35:31 AM PDT From: Paul Kapinos <kapi...@itc.rwth-aachen.de> To: Open MPI Users <us...@open-mpi.org> Cc: "Kapinos, Paul" <kapi...@itc.rwth-aachen.de>, <fri...@cats.rwth-aachen.de> Subject: Open MPI 1.8: link problem when Fortran+C+Platform LSF Dear Open MPI developer, we have both Open MPI 1.6(.5) and 1.8(.3) in our cluster, configured to be used with Platform LSF. One of our users run into an issue when trying to link his code (combination of lex/C and Fortran) with v.1.8, whereby with OpenMPI/1.6er the code can be linked OK. $ make mpif90 -c main.f90 yacc -d example4.y mpicc -c y.tab.c mpicc -c mymain.c lex example4.l mpicc -c lex.yy.c mpif90 -o example main.o y.tab.o mymain.o lex.yy.o ld: y.tab.o(.text+0xd9): unresolvable R_X86_64_PC32 relocation against symbol `yylval' ld: y.tab.o(.text+0x16f): unresolvable R_X86_64_PC32 relocation against symbol `yyval' ... looking into "mpif90 --show-me" let us see that the link line and possibly the philosophy behind it has been changed, there is also a note on it: # Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we # intentionally only link in the MPI libraries (ORTE, OPAL, etc. are # pulled in implicitly) because we intend MPI applications to only use # the MPI API. Well, by now we know two workarounds: a) add "-lbat -llsf" to the link line b) add " -Wl,--as-needed" to the link line What would be better? Maybe one of this should be added to linker_flags=..." in the .../share/openmpi/mpif90-wrapper-data.txt file? As of the note above, (b) would be better? Best Paul Kapinos P.S. 
$ mpif90 --show-me 1.6.5 ifort -nofor-main -I/opt/MPI/openmpi-1.6.5/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.6.5/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -L/opt/MPI/openmpi-1.6.5/linux/intel/lib -lmpi_f90 -lmpi_f77 -lmpi -losmcomp -lrdmacm -libverbs -lrt -lnsl -lutil -lpsm_infinipath -lbat -llsf -ldl -lm -lnuma -lrt -lnsl -lutil 1.8.3 ifort -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.8.3/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi P.S.2 $ man ld --as-needed --no-as-needed This option affects ELF DT_NEEDED tags for dynamic libraries mentioned on the command line after the --as-needed option. Normally the linker will add a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not. --as-needed causes a DT_NEEDED tag to only be emitted for a library that satisfies an undefined symbol reference from a regula
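For completeness, the place Paul refers to for workaround (b) is the wrapper data file shipped with the installation; a minimal sketch of how it could be applied there (the path matches the installation above, the surrounding field names follow the usual Open MPI wrapper-data layout and are shown only as an illustration):

$ cat /opt/MPI/openmpi-1.8.3/linux/intel/share/openmpi/mpif90-wrapper-data.txt
...
compiler=ifort
# workaround (b): let the linker drop DT_NEEDED entries that are not actually used
linker_flags=-Wl,--as-needed
libs=-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
...

This only changes what the mpif90 wrapper emits for future links; it does not touch already-built applications.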
[OMPI devel] ROMIO+Lustre problems in OpenMPI 1.8.3
Dear Open MPI and ROMIO developers,

We use Open MPI v1.6.x and 1.8.x on our cluster. We have a Lustre file system and wish to use MPI_IO, so the Open MPI builds are configured with this flag:
> --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'

In our newest installation, openmpi/1.8.3, we found that MPI_IO is *broken*. A short search for the root of the problem brought the following to light:
- the ROMIO component 'MCA io: romio' isn't there at all in the affected version, because
- the ROMIO configure has *failed* (cf. logs (a,b,c)), because
- lustre_user.h was found but could not be compiled.

On our system, there are two lustre_user.h files available:
$ locate lustre_user.h
/usr/include/linux/lustre_user.h
/usr/include/lustre/lustre_user.h
As I'm not very familiar with Lustre, I just attach both of them.

pk224850@cluster:~[509]$ uname -a
Linux cluster.rz.RWTH-Aachen.DE 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
pk224850@cluster:~[510]$ cat /etc/issue
Scientific Linux release 6.5 (Carbon)

Note that openmpi/1.8.1 seems to be fully OK (MPI_IO works) in our environment.

Best
Paul Kapinos

P.S. Is there a configure flag which will enforce ROMIO, i.e. make configure fail when ROMIO is not available? That would make such hidden errors visible at installation time.

a) Log in Open MPI's config.log:
--
configure:226781: OMPI configuring in ompi/mca/io/romio/romio
configure:226866: running /bin/sh './configure' --with-file-system=testfs+ufs+nfs+lustre FROM_OMPI=yes CC="icc -std=c99" CFLAGS="-DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions -Qoption,cpp,--extended_float_types -pthread" CPPFLAGS=" -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/hwloc/hwloc172/hwloc/include -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent/include" FFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 " LDFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -fexceptions " --enable-shared --disable-static --with-file-system=testfs+ufs+nfs+lustre --prefix=/opt/MPI/openmpi-1.8.3/linux/intel --disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking
configure:226876: /bin/sh './configure' *failed* for ompi/mca/io/romio/romio
configure:226911: WARNING: ROMIO distribution did not configure successfully
configure:227425: checking if MCA component io:romio can compile
configure:227427: result: no
--

b) dump of Open MPI's 'configure' output to the console:
--
checking lustre/lustre_user.h usability... no
checking lustre/lustre_user.h presence... yes
configure: WARNING: lustre/lustre_user.h: present but cannot be compiled
configure: WARNING: lustre/lustre_user.h: check for missing prerequisite headers?
configure: WARNING: lustre/lustre_user.h: see the Autoconf documentation
configure: WARNING: lustre/lustre_user.h: section "Present But Cannot Be Compiled"
configure: WARNING: lustre/lustre_user.h: proceeding with the compiler's result
configure: WARNING: ## ##
configure: WARNING: ## Report this to disc...@mpich.org ##
configure: WARNING: ## ##
checking for lustre/lustre_user.h...
no configure: error: LUSTRE support requested but cannot find lustre/lustre_user.h header file configure: /bin/sh './configure' *failed* for ompi/mca/io/romio/romio configure: WARNING: ROMIO distribution did not configure successfully checking if MCA component io:romio can compile... no -- c) ompi/mca/io/romio/romio's config.log: -- configure:20962: checking lustre/lustre_user.h usability configure:20962: icc -std=c99 -c -DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions -Qoption,cpp,--extended_float_types -pthread -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/hwloc/hwloc172/hwloc/include -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent/include conftest.c >&5 /usr/include/sys/quota.h(221): error: identifier "caddr_t" is undefined caddr_t __addr) __THROW; ^ compilation aborted for conftest.c
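Pending a configure flag that hard-fails when ROMIO is missing, the absence of the component can at least be caught right after installation by querying ompi_info; a sketch (output abbreviated, version string will differ with the release):

$ ompi_info | grep "MCA io"
                 MCA io: romio (MCA v2.0, API v2.0, Component v1.8.3)

If the romio line is missing from this list, the ROMIO configure failed and the MPI-IO support described above is not built into the installation.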
Re: [OMPI devel] ROMIO+Lustre problems in OpenMPI 1.8.3
(2nd try, as the first log package was too big)

Hello Howard,
The version 1.8.1 installed on Jun 27 this year runs fine; ROMIO is OK. Trying to re-run the same install script, I found out that the 1.8.1 version of Open MPI now also *cannot* build ROMIO support. Wow. That means the regression is not (only) in Open MPI's ROMIO, but depends on our Linux/kernel/Lustre. At very first look: we have a new kernel and a new /usr/include/sys/quota.h, and we probably updated from SL6.4 to SL6.5.
- Which information about our Linux/system do you need?
- Any interest/need in getting a guest login for an in-depth look?

Best
Paul Kapinos

Attached: some logs from the installation at 27.05 and from today's try, and quota.h (changed at 29.09). Note that the kernel also changed (and maybe the Scientific Linux version, from 6.4 to 6.5?).

pk224850@cluster:~[502]$ ls -la /usr/include/sys/quota.h
-rw-r--r-- 1 root root 7903 Aug 29 21:11 /usr/include/sys/quota.h
pk224850@cluster:~[503]$ uname -a
Linux cluster.rz.RWTH-Aachen.DE 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
pk224850@cluster:~[504]$ cat /etc/issue
Scientific Linux release 6.5 (Carbon) Kernel \r on an \m

On 10/29/14 19:06, Howard Pritchard wrote:
Hi Paul, Thanks for the forward. I've opened an issue #255 <https://github.com/open-mpi/ompi/issues/255> to track the ROMIO config regression. Just to make sure, older releases of the 1.8 branch still configure and build properly with your current lustre setup? Thanks, Howard

2014-10-28 5:00 GMT-06:00 Paul Kapinos <kapi...@itc.rwth-aachen.de <mailto:kapi...@itc.rwth-aachen.de>>:
Dear Open MPI and ROMIO developer, [...] (full original report quoted; see the first message of this thread)
Re: [OMPI devel] still supporting pgi?
Jeff,
PGI compiler(s) are available on our cluster:
$ module avail pgi
There are a lot of older versions, too:
$ module load DEPRECATED
$ module avail pgi

best
Paul

P.S. In our standard environment, the Intel compiler and Open MPI are active, so:
$ module unload openmpi intel
$ module load pgi

P.S.2 We also have Sun/Oracle Studio:
$ module avail studio

On 12/11/14 19:45, Jeff Squyres (jsquyres) wrote:
Ok. FWIW: I test with gcc and the intel compiler suite. I do not have a PGI license to test with.

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
Re: [OMPI devel] Open MPI 1.8: link problem when Fortran+C+Platform LSF
Jeff and All,

Belated Merry Christmas and a Happy New Year! And now we can get back to business :)

I did some additional tests with the versions installed on our cluster (1.8.3 and 1.8.4) and I *believe* I can confirm that the problem is not rooted in LSF support, and yessir, adding "-lbat -llsf" is not a solution but a weird "workaround" which is not a real, true workaround (*), as you wrote.

Back to the error description:

a) the problem only arises in the 1.8.x series when configured with these flags:
> --disable-dlopen --disable-mca-dso
We have added these flags since early 2012 in order to minimise NFS activity at start-up time. In the 1.6.x versions we *probably* do not see the error due to (*) - yessir, 'libbat.so' and 'liblsf.so' contain all the missing symbols, and as these two libs are linked in by default prior to 1.8.x, there is the mess you described below (symbols from libbat.so and liblsf.so are *probably* used instead of the symbols in the code).

b) the problem vanishes when
> --as-needed
is passed to the linker:
$ mpif90 -o example main.o y.tab.o mymain.o lex.yy.o -Wl,--as-needed

c) yes, it seems to be a general linkage issue: the behaviour of the Intel compiler is the same as that of the GCC and Studio compilers.

Some logs: version 1.8.3, configured with "--disable-dlopen --disable-mca-dso"

$ mpif90 -o example main.o y.tab.o mymain.o lex.yy.o -showme
ifort -o example main.o y.tab.o mymain.o lex.yy.o -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.8.3/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
(===> error)

(===> add -Wl,--as-needed)
$ mpif90 -o example main.o y.tab.o mymain.o lex.yy.o -Wl,--as-needed -showme
ifort -o example main.o y.tab.o mymain.o lex.yy.o -Wl,--as-needed -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.8.3/linux/intel/lib -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
(===> OK)

(===> try to remove the LSF linking stuff entirely:)
ifort -o example main.o y.tab.o mymain.o lex.yy.o -Wl,--as-needed -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions -I/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,-rpath -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
(===> OK)
(the same line as above, but without -Wl,--as-needed ===> error)

Now the fun fact: omitting all the Open MPI parts,
> ifort -o example main.o y.tab.o mymain.o lex.yy.o
leads to successful linking (the compiled app is not an MPI app).

Recap:
1) the error is related to configuring with '--disable-dlopen --disable-mca-dso'
2) the error vanishes when '-Wl,--as-needed' is added to the link line
3) the error is not specific to any compiler or version
4) the error is not related to LSF, but linking those libs in just hides it due to some symbol mix-up

Well, I'm not really sure whether (2) is the true workaround, or whether it just triggers a deeper library search and binds to LSF libs linked in somewhere down the chain. Could someone with more experience in library linking, and especially in Open MPI, take a look at this?
(sorry for pushing this, but all this smells to me like a general linking problem rooted somewhere in Open MPI and '--disable-dlopen'; see the "fun fact" above)

best
Paul Kapinos

P.S. Never used "-fPIC" here

On 12/01/14 20:48, Jeff Squyres (jsquyres) wrote:
Paul --
Sorry for the delay -- SC and the US Thanksgiving holiday last week got in the way of responding to this properly. I talked with Dave Goodell about this issue a bunch today. Going back to the original email in this thread (http://www.open-mpi.org/community/lists/devel/2014/10/16064.php), it seems like this is the original problem:

$ make
mpif90 -c main.f90
yacc -d example4.y
mpicc -c y.tab.c
mpicc -c mymain.c
lex example4.l
mpicc -c lex.yy.c
mpif90 -o example main.o y.tab.o mymain.o lex.yy.o
ld: y.tab.o(.text+0xd9): unresolvable R_X86_64_PC32 relocation against symbol `yylval'
ld: y.tab.o(.text+0x16f): unresolvable R_X86_64_PC32 relocation against symbol `yyval'

- You later confirmed that adding -fPIC to the compile/link lines makes everything work without adding -lbat -llsf. Dave and I are sorta convinced (i.e., we could still be wrong, but we *think* this is right) that adding -lbat and -llsf to the link line is the Wrong solution. The issue seems to be that a correct/matching yylval
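A quick way to see the symbol mixup that Jeff and Dave suspect is to compare where yylval/yyval are defined; a diagnostic sketch, using the LSF library path from the link lines above (the exact output depends on the LSF version):

$ nm y.tab.o | grep -w -e yylval -e yyval
$ nm -D /opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib/liblsf.so | grep -i yy
$ nm -D /opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib/libbat.so | grep -i yy

If liblsf.so/libbat.so also export yy* symbols (as this thread suggests they do), pulling them into the link can mask or collide with the application's own definitions coming from y.tab.o.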
[OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x
Dear Open MPI developer,

We're puzzled by a reproducible performance (bandwidth) penalty observed when comparing measurements over InfiniBand between two nodes: Open MPI 1.10.0 compiled with *GCC 5.2* instead of GCC 4.8 or the Intel compiler.

Take a look at the attached picture of two measurements of the NetPIPE http://bitspjoule.org/netpipe/ benchmark, done with one MPI rank per node, communicating via QDR InfiniBand (y axis: Mbps, x axis: sample number). Up to sample 64 (8195 bytes message size) the achieved performance is virtually the same; from sample 65 (12285 bytes, *less* than 12k) onwards, the version compiled using GCC 5.2 suffers from a 20%+ penalty in bandwidth.

The result is reproducible and independent of the nodes and even of the Linux distribution (both Scientific Linux 6 and CentOS 7 show the same results). Both the C and Fortran benchmarks show the very same behaviour, so it is *not* an f08 issue. The achieved bandwidth is definitely in the IB range (gigabytes per second); the communication runs via InfiniBand in all cases (no fallback to IP). The compile line is the same; the output of ompi_info --all and --params is the very same (cf. attachments), apart from the added Fortran 2008 support in the GCC 5.2 version. We know about the existence of the 'eager_limit' parameter, which is *not* changed and is 12288 in both versions (the first distinguishing sample is just below this value). Again, for us the *only* difference is the use of the other (newer) GCC release.

Any idea about this 20%+ bandwidth loss?

Best
Paul Kapinos

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

MCA btl: parameter "btl_openib_verbose" (current value: "false", data source: default, level: 9 dev/all, type: bool) Output some verbose OpenIB BTL information (0 = no output, nonzero = output) Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA btl: parameter "btl_openib_warn_no_device_params_found" (current value: "true", data source: default, level: 9 dev/all, type: bool, synonyms: btl_openib_warn_no_hca_params_found) Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA btl: parameter "btl_openib_warn_no_hca_params_found" (current value: "true", data source: default, level: 9 dev/all, type: bool, deprecated, synonym of: btl_openib_warn_no_device_params_found) Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA btl: parameter "btl_openib_warn_default_gid_prefix" (current value: "true", data source: default, level: 9 dev/all, type: bool) Warn when there is more than one active ports and at least one of them connected to the network with only default GID prefix configured (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA btl: parameter "btl_openib_warn_nonexistent_if" (current value: "true", data source: default, level: 9 dev/all, type: bool) Warn if non-existent devices and/or ports are specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA btl: parameter "btl_openib_abort_not_enough_reg_mem" (current value: "false", data source: default, level: 9 dev/all, type:
bool) If there is not enough registered memory available on the system for Open MPI to function properly, Open MPI will issue a warning. If this MCA parameter is set to true, then Open MPI will also abort all MPI jobs (0 = warn, but do not abort; any other value = warn and abort) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_poll_cq_batch" (current value: "256", data source: default, level: 9 dev/all, type: unsigned) Retrieve up to poll_cq_batch completions from CQ MCA btl: parameter "btl_openib_device_param_files" (current value: "/opt/MPI/openmpi-1.10.0/linux/gcc/share/openmpi/mca-btl-openib-device-params.ini", data source: default, level: 9 dev/all, type: string, synonyms: btl_openib_hc
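For reference, the eager-limit value mentioned above can be checked and overridden per run, without rebuilding anything; a sketch (host names illustrative, NPmpi is the NetPIPE MPI binary):

$ ompi_info --param btl openib --level 9 | grep eager_limit
$ mpiexec --mca btl_openib_eager_limit 65536 -np 2 -H node1,node2 ./NPmpi

Shifting the eager/rendezvous switchover this way is only a diagnostic knob for narrowing the regression down; it is not meant as a fix.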
Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x
On 10/14/15 19:35, Jeff Squyres (jsquyres) wrote: On Oct 14, 2015, at 12:48 PM, Nathan Hjelm <hje...@lanl.gov> wrote: I think this is from a known issue. Try applying this and run again: https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch The good news is that if this fixes your problem, the fix is already included in the upcoming v1.10.1 release. Indeed, that was it. Fixed! Many thanks for support! Best Paul -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cryptographic Signature
[OMPI devel] ROMIO+Lustre problems 2.0
Dear Open MPI and ROMIO developers,

In short: with current Open MPI versions, ROMIO cannot be configured using old versions of the Intel compiler.

A while ago we reported trouble compiling Open MPI >=1.8.3 with ROMIO support, cf. (1), which was reported to be fixed (2). However, we've found out that the current version 1.10.1 still fails in our environment to configure the ROMIO subsystem *whenever using Intel compiler 11.1*, whereas the very same configure command succeeds when using Intel compiler 14.0. A log snippet for compiling the 'conftest.c' program can be seen in (4).

A simple comparison of the configure logs leads to this observation:
- intel/11.1 uses -std=c99
- intel/14.0 uses -std=gnu99
However, trying to compile the failing conftest.c after merely switching to the newer Intel compiler and -std=gnu99 also failed (whereas the corresponding conftest.c produced while building the whole software with the new compiler worked fine), which leads us to the assumption that maybe there is something wrong with the overall configure procedure when using older versions of the Intel compiler, or maybe ROMIO does not support old versions of the Intel compilers at all.

Could someone have a look at this? The whole build and install dirs can be found at (3) (89 MB!).

Best wishes
Paul Kapinos

1) https://www.open-mpi.org/community/lists/devel/2014/10/16106.php
2) https://www.open-mpi.org/community/lists/devel/2014/10/16109.php https://github.com/hppritcha/ompi/commit/53fd425a6a0843a5de0a8c544901fbf01246ed31
3) https://rwth-aachen.sciebo.de/index.php/s/Ii6G4gULNZjC8CL
4) (OpenMPI's config.log)
. It was created by Open MPI configure 1.10.1, which was generated by GNU Autoconf 2.69. Invocation command line was
$ ./configure --with-verbs --with-lsf --with-devel-headers --enable-contrib-no-build=vt --enable-heterogeneous --enable-cxx-exceptions --enable-orterun-prefix-by-default --with-io-romio-flags=--with-file-system=testfs+ufs+nfs+lustre --enable-mpi-ext CFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 CXXFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 FFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 FCFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 LDFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 --prefix=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080 --mandir=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080/man --bindir=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080/bin --libdir=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080/lib --includedir=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080/include --datarootdir=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080/share
.
(ROMIO's config.log)
. generated by GNU Autoconf 2.69.
Invocation command line was $ ./configure --with-file-system=testfs+ufs+nfs+lustre FROM_OMPI=yes CC=icc -std=c99 CFLAGS=-DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions -Qoption,cpp,--extended_float_types -pthread CPPFLAGS= -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/hwloc/hwloc191/hwloc/include -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/event/libevent2021/libevent -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/event/libevent2021/libevent/include FFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 LDFLAGS=-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -fexceptions --enable-shared --disable-static --with-file-system=testfs+ufs+nfs+lustre --prefix=/opt/MPI/openmpi-1.10.1test/linux/intel_11.1.080 --disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking .. .. .. configure:20968: checking lustre/lustre_user.h usability configure:20968: icc -std=c99 -c -DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions -Qoption,cpp,--extended_float_types -pthread -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/hwloc/hwloc191/hwloc/include -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/event/libevent2021/libevent -I/w0/tmp/pk224850/linuxc2_6000/openmpi-1.10.1test_linux64_intel_11.1.080/opal/mca/event/libevent2021/libevent/include conftest.c >&5 /usr/include/sys/quota.h(221): error: identifier "caddr_t" is undefined caddr_t __addr) __THROW; ^ compilation aborted for conftest.c (code 2) configure:20968: $? = 2 configure: failed program was: .. configure:20968: result: no (conftest.c and /usr/include/sys/quota.h attached) -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 /* confdefs.h */ #define PACKAGE_NAME "ROMIO" #define PACKAGE_TARNAME "romio" #define PACKAGE_VERSION "Ope
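The failing header check can be reproduced outside of configure with a tiny probe; a sketch (the probe file is hypothetical, error text abbreviated from the config.log above):

/* probe.c - stand-in for ROMIO's conftest.c header-usability check */
#include <lustre/lustre_user.h>
int main(void) { return 0; }

$ icc -std=c99 -c probe.c
/usr/include/sys/quota.h(221): error: identifier "caddr_t" is undefined

The root cause is that caddr_t is a legacy BSD typedef which glibc typically only exposes outside strict-standard modes, so the outcome of this header test depends on the -std flag that configure picks for the compiler.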
Re: [OMPI devel] ROMIO+Lustre problems 2.0
Hello all,

JFYI and for log purposes: *In short: the 'caddr_t' issue is known and is addressed in new(er) ROMIO releases.* Below is the (off-list) answer (snippet) from Rob Latham.

On 12/08/15 13:16, Paul Kapinos wrote:
In short: with current Open MPI versions, ROMIO cannot be configured using old versions of the Intel compiler.

> . caddr_t -- indirectly brought in via
> quota.h -- has been a giant headache. MPICH has a "strict" mode which
> helps with portability, but if quota.h is then less portable than ROMIO,
> well, then we have problems.
>
> Here's some more information:
> https://press3.mcs.anl.gov/romio/2015/02/26/lustre-preadpwrite-and-caddr_t/
>
> I've tried having ROMIO's configure look for caddr_t and define it if not
> set: I don't remember the exact problem but compilers with strict settings
> would still have problems compiling quota.h

A short look into the above link tells us:
> If you found this page because you are facing a similar problem, please try
> the latest MPICH.
and a look at one of the Git patches behind that link shows that the ROMIO bundled in openmpi/1.10.1 seems to be quite old (definitely older than the patches, at least).

We don't know whether there is any interest in supporting older Intel compilers, but we do not want to keep silent about the fact that openmpi/1.10.1 currently cannot be configured (and thus built) using the intel/11.1 compiler with ROMIO+Lustre support.

Best wishes
Paul

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
[OMPI devel] SHMEM, "mpp/shmem.fh", CMake and infinite loops
Dear OpenMPI developer,

we have some trouble when using Open MPI and CMake on codes using 'SHMEM'. Cf. 'man shmem_swap':
> Fortran:
> INCLUDE "mpp/shmem.fh"
Yes, here is one such header file:
> openmpi-1.X.Y/oshmem/include/mpp/shmem.fh
... present since version 1.7 at least. The significant content is this line:
> include 'shmem.fh'
whereby Open MPI means to include not the same file itself (= infinite loop!) but, I believe, this one:
> openmpi-1.X.Y/oshmem/include/shmem.fh
(The above paths are in the source code distribution; in the installation the files are located here:
include/shmem.fh
include/mpp/shmem.fh)

This works. Unless you start using CMake. Because CMake is 'intelligent' and tries to add the search paths recursively, (I believe,) gloriously enabling the infinite loop of including the 'shmem.fh' file from the 'shmem.fh' file.

Steps to reproduce:
$ mkdir build; cd build; cmake ..
$ make
The second command needs some minute(s), getting stuck at the 'Scanning dependencies of target mpihelloworld' step. If you connect with 'strace -p ' to the 'cmake' process you will see lines like those below, again and again. So I think CMake just includes the 'shmem.fh' file from itself until the stack is full / a limit is reached / the moon shines, and thus hangs for a while (seconds/minutes) in the 'Scanning dependencies...' state.

*Well, maybe having a file include itself is not that good?* If the file 'include/mpp/shmem.fh' included not 'shmem.fh' but 'somethingelse.fh' located in 'include/...', this infinite loop would be impossible at all...

And by the way: is there a way to limit the maximum include depth in CMake for header files? That would work around this one 'infinite include loop' issue...

Have a nice day,
Paul Kapinos

..
access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", R_OK) = 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", O_RDONLY) = 5271
fstat(5271, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08457d2000
read(5271, "!\n! Copyright (c) 2013 Me"..., 32768) = 205
read(5271, "", 32768) = 0
access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", R_OK) = 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", O_RDONLY) = 5272
fstat(5272, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08457ca000
read(5272, "!\n! Copyright (c) 2013 Me"..., 32768) = 205
read(5272, "", 32768) = 0
..

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

cmake_minimum_required(VERSION 2.8)
project(mpihelloworld)
enable_language(Fortran)
find_package(MPI REQUIRED)
#include_directories(${MPI_INCLUDE_PATH})
include_directories(${MPI_Fortran_INCLUDE_PATH})
add_executable(mpihelloworld mpihelloworld.F90)
#link_directories(${MPI_LIBRARIES})
#target_link_libraries(mpihelloworld ${MPI_LIBRARIES})
link_directories(${MPI_Fortran_LIBRARIES})
target_link_libraries(mpihelloworld ${MPI_Fortran_LIBRARIES})

! Paul Kapinos 22.09.2009 -
! RZ RWTH Aachen, www.rz.rwth-aachen.de
!
! MPI-Hello-World
!
PROGRAM PK_MPI_Test
!
USE MPI IMPLICIT NONE include "mpif.h" #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif ! INTEGER :: my_MPI_Rank, laenge, ierr CHARACTER*(MPI_MAX_PROCESSOR_NAME) my_Host CALL MPI_INIT (ierr) CALL MPI_COMM_RANK( MPI_COMM_WORLD, my_MPI_Rank, ierr ) CALL MPI_GET_PROCESSOR_NAME(my_Host, laenge, ierr) WRITE (*,*) "Prozessor ", my_MPI_Rank, "on Host: ", my_Host(1:laenge) CALL Sleep(1) CALL MPI_FINALIZE(ierr) CONTAINS SUBROUTINE foo1 #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif END SUBROUTINE foo1 SUBROUTINE foo2 #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif END SUBROUTINE foo2 SUBROUTINE foo3 #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif END SUBROUTINE foo3 SUBROUTINE foo4 #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif END SUBROUTINE foo4 SUBROUTINE foo5 #if defined(SHMEM) INCLUDE "mpp/shmem.fh" #endif END SUBROUTINE foo5 END PROGRAM PK_MPI_Test smime.p7s Description: S/MIME Cryptographic Signature
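To make the loop explicit: the installed wrapper header's only payload is a relative include, and once CMake's dependency scanner has include/mpp on its search path, that include ends up resolving back to the wrapper itself; a sketch of the layout and of the symlink workaround that comes up later in this thread (installation prefix taken from the strace above, purely illustrative):

# include/shmem.fh        <- the real OSHMEM Fortran header
# include/mpp/shmem.fh    <- wrapper containing only:  include 'shmem.fh'
#    with include/mpp in the scan path, that line finds include/mpp/shmem.fh again

$ cd /opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include
$ ln -sf ../shmem.fh mpp/shmem.fh     # wrapper now points directly at the real header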
Re: [OMPI devel] SHMEM, "mpp/shmem.fh", CMake and infinite loops
Hi Gilles,

On 07/13/16 01:10, Gilles Gouaillardet wrote:
> Paul, The two header files in include/mpp simply include the file with the same name in the upper directory.

Yessir! (and CMake does not care about the upper directory and builds an infinite loop)

> A simple workaround is to replace these two files in include/mpp with symbolic links to files with the same name in the upper directory. Would you mind giving this a try ?

It works very well, at least for the one test case provided. So yes, patching every installation of Open MPI could be a workaround. However, we would really love to avoid the need to patch every Open MPI installation. Maybe Open MPI's developers could think about how to minimise the probability of such loops? A symlink is one alternative; another would be renaming one of the headers. We fully trust Open MPI's developers' expertise in this :-)

Have a nice day,
Paul Kapinos

pk224850@linuxc2:/opt/MPI/openmpi-1.8.1/linux/intel/include[519]$ ls -la mpp/shmem.fh
lrwxrwxrwx 1 pk224850 pk224850 11 Jul 13 13:20 mpp/shmem.fh -> ../shmem.fh

> Cheers, Gilles
> On Wednesday, July 13, 2016, Paul Kapinos <kapi...@itc.rwth-aachen.de <mailto:kapi...@itc.rwth-aachen.de>> wrote:
> Dear OpenMPI developer, [...] (full original report quoted; see the first message of this thread)

-- Dipl.-Inform
[OMPI devel] Still bothered / cannot run an application
(cross-post to 'users' and 'devel' mailing lists)

Dear Open MPI developer,

a long time ago I reported an error in Open MPI: http://www.open-mpi.org/community/lists/users/2012/02/18565.php

Well, in 1.6 the behaviour has changed: the test case doesn't hang forever and block an InfiniBand interface, but seems to run through, and now this error message is printed:
--
The OpenFabrics (openib) BTL failed to register memory in the driver. Please check /var/log/messages or dmesg for driver specific failure reason. The failure occured here:
Local host:
Device: mlx4_0
Function: openib_reg_mr()
Errno says: Cannot allocate memory
You may need to consult with your system administrator to get this problem fixed.
--

Looking into the FAQ http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages delivers no hint about what is wrong. The locked memory is unlimited:
--
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
#- memlock - max locked-in-memory address space (KB)
* hard memlock unlimited
* soft memlock unlimited
--

Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S.: The same test with Intel MPI cannot run using DAPL, but runs very fine over 'ofa' (= native verbs, as Open MPI uses it). So I believe the problem is rooted in the communication pattern of the program; it sends very LARGE messages to a lot of/all other processes. (The program performs a matrix transposition of a distributed matrix.)

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
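Besides the memlock limit in limits.conf, the mlx4 driver itself caps how much memory can be registered via its MTT module parameters - which is where the later messages in this digest end up; a quick check on a compute node might look like this (paths are the usual mlx4_core sysfs locations, the values shown are the defaults discussed later):

$ ulimit -l
unlimited
$ cat /sys/module/mlx4_core/parameters/log_num_mtt
20
$ cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
3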
Re: [OMPI devel] MPI_THREAD_MULTIPLE testing on trunk
Christopher,

I cannot reproduce your problem on my freshly installed 1.6.1rc2. I've used the attached program, which is essentially your test case with a bit of modification in order to make it compilable.

But what I see is that there seems to be a small timeout somewhere in the initialization stage: if you start processes on nodes in another IB island without explicitly defining which interface has to be used for startup communication, it hangs for some 20 seconds. (I think Open MPI tries to communicate over unconnected Ethernet interfaces and runs into a timeout.) Thus we use this:
-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0

Nevertheless, I cannot reproduce your initial issue with 1.6.1rc2 in our environment.

Best
Paul Kapinos

$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 -np 4 -H linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca oob_tcp_if_include 0.06s user 0.09s system 9% cpu 1.608 total

$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H a.out 0.04s user 0.10s system 0% cpu 23.600 total

On 08/03/12 09:29, Christopher Yeoh wrote:
I've narrowed it down to a very simple test case (you don't need to explicitly spawn any threads). Just need a program like: If it's run with "--mpi-preconnect_mpi 1" then it hangs in MPI_Init_thread. If not, then it hangs in MPI_Barrier. Get a backtrace that looks like this (with the former):

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(int argc, char **argv)
{
    char hostname[MPI_MAX_PROCESSOR_NAME];
    int rank, size, provided, laenge;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(hostname, &laenge);
    if (provided != MPI_THREAD_MULTIPLE) {
        MPI_Finalize();
        errx(1, "MPI_Init_thread expected, MPI_THREAD_MULTIPLE (%d), " "got %d \n", MPI_THREAD_MULTIPLE, provided);
    }
    printf("%s(%d) of %d provided=(%d)\n", hostname, rank, size, provided);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}
Re: [OMPI devel] 1.6.1rc1 posted
Jeff, All,

Testing our well-known example of the registered-memory problem (see http://www.open-mpi.org/community/lists/users/2012/02/18565.php) on a freshly installed 1.6.1rc2, we found out that the "fall back to send/receive semantics" still does not always work. However, the behaviour has changed:
1.5.3 and older: MPI processes hang and block the IB interface(s) forever
1.6.1rc2: a) MPI processes run through (if the chunk size is less than 8GB), with or without a warning; or
          b) MPI processes die (if the chunk size is more than 8GB)

Note that the same program which dies in (b) runs fine over IPoIB (-mca btl ^openib). However, the performance is very bad in this case... some 1100 sec. instead of about a minute.

Reproducing: compile the attached file and let it run on nodes with >=24GB with
log_num_mtt : 20
log_mtts_per_seg: 3
(=32GB, our default values; see the registerable-memory arithmetic sketched after the attached program):
$ mpiexec a.out 108000 108001

Well, we know about the need to raise the value of one of these parameters, but I wanted to let you know that your workaround for the problem is still not 100% perfect, but only 99%.

Best,
Paul Kapinos

P.S.: A note about the informative warning:
--
WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory.
Registerable memory: 32768 MiB
Total memory: 98293 MiB
--
On a node with 24 GB this warning did not come up, although the maximum size of registerable memory (32GB) is only about 1.5x the RAM, while http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem recommends at least 2x the RAM size. Shouldn't this warning come out in all cases when registerable memory < 2x RAM?

On 07/28/12 04:20, Jeff Squyres wrote:
- A bunch of changes to eliminate hangs on OpenFabrics-based networks. Users with Mellanox hardware are ***STRONGLY ENCOURAGED*** to check their registered memory kernel module settings to ensure that the OS will allow registering more than 8GB of memory. See this FAQ item for details: http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
- Fall back to send/receive semantics if registered memory is unavailable for RDMA.

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

// #include "mpi.h" nach oben, sonst Fehlermeldung ///opt/intel/impi/4.0.3.008/intel64/include/mpicxx.h(93): catastrophic error: #error directive: "SEEK_SET is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h" // #error "SEEK_SET is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h" #include "mpi.h" #include #include #include #include #include using namespace std; void startMPI( int , int , int & argc, char ** argv ) { MPI_Init(, ); MPI_Comm_size(MPI_COMM_WORLD, ); MPI_Comm_rank(MPI_COMM_WORLD, ); } void cleanUp( double * vector) { delete [] vector; vector = 0; } // wollen wir MPI_Standard-Konform bleibene //void initVector( double * vector, unsigned long , double val) void initVector( double * vector, int , double val) { for( int i=0; i<length; ++i ) { vector[i] = val; } } /* ** Ausführung nur vom Master!
** Test der Eingabe: ** Initialisierung der Vektorlänge mit dem eingelesenen Wert */ // wollen wir MPI_Standard-Konform bleibene //void input( int & argc, char ** argv, unsigned long ) void input( int & argc, char ** argv, int , int ) { length = 1000; block = 1000; if( argc > 1) { length = atol(argv[1]); } if( argc > 2) { block = atol(argv[2]); } } /* ** Testausgabe (optional) */ // wollen wir MPI_Standard-Konform bleibene //void printVector( double * vector, unsigned long , int , int ) void printVector( double * vector, int , int , int ) { printf("process %i:\t[ ", proc); for(long i=0;i<length;++i) { if( i%count==0 && i>0) { printf("\n\t\t "); } printf(" %5.1f", vector[i] ); } printf(" ] \n"); } // wollen wir MPI_Standard-Konform bleibene //void checkResult( double , unsigned long , const double , int , const double ) void checkResult( double , int , const double , int , const double ) { double targetVal = bufferLength * testVal * procCount; double diff = (targetVal >= checkSum)? (targetVal - checkSum):(targetVal - checkSum)*(-1); if(diff < epsilon) { printf("# Test ok! #\n"); } else { printf("difference occured: %lf \n", diff); } printf("[SOLL | IST] =
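For reference, the 'log_num_mtt : 20, log_mtts_per_seg : 3' defaults quoted above translate into the 32GB limit as follows (formula as commonly given for the mlx4 driver, assuming 4 KiB pages):

max_registerable_memory = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
                        = 2^20 * 2^3 * 4 KiB = 2^35 bytes = 32 GiB

To reach twice the RAM of a 24GB node, log_num_mtt therefore has to go to 21 or higher, which matches the 21/3, 22/3 and 24/3 settings tried later in this thread.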
Re: [OMPI devel] MPI_THREAD_MULTIPLE testing on trunk
Hi Christopher,

I cannot reproduce your problem on my freshly installed 1.6.1rc2. I've used the attached program, which is essentially your test case with a bit of modification in order to make it compilable.

The 1.6 tree works fine for me, it's only trunk (I haven't tried 1.7). Any chance you could try a recent trunk snapshot to see if it works for you? Would be handy for me to know if it's something specific to the setup I'm testing or not.

I've tried to compile the latest snapshot version, 1.9a1r26955, and ran into this: http://www.open-mpi.org/community/lists/devel/2012/08/11377.php so currently I cannot check it again. Remind me again when the link issue is fixed!

Best,
Paul

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
Re: [OMPI devel] 1.6.1rc1 posted
Hi Jeff, Hi All,

On 08/07/12 18:51, Jeff Squyres wrote:
So I'm not 100% clear on what you mean here: when you set the OFED params to allow registration of more memory than you have physically, does the problem go away?

We are talking about machines with 24GB RAM (S) and 96GB RAM (L). The default values of the Mellanox/OFED parameters are 20/3 => 32GB registerable memory (RM) on both S and L. This is more than the memory of S, but less than 2x the memory of S, and less than the memory of L.

If the OFED parameters are raised to give at least RM=64GB (20/3 => 21/3, 22/3, 24/3) there are no errors; I've just tested it with 8GB and 15.5GB of data (starting usually 1x ppn).

If the OFED parameters are _not_ changed (=32GB RM) there is _no_ warning on S nodes; on L nodes this warns the user:
--
WARNING: It appears that your OpenFabrics subsystem is configured to only ...
Registerable memory: 32768 MiB
Total memory: 98293 MiB
--
.. hardly surprising - the warning comes if and only if (RM < memory).

If the OFED parameters are _not_ changed (=32GB RM) and I try to send at least 8GB _in one chunk_, then the 'queue pair' error comes out (see S_log.txt and my last mail). More exactly, at least one process seems to die in MPI_Finalize (all output of the program is correct). The same error also comes out on L nodes, surrounded by the above warning (L_log.txt).

From your log messages, the warning messages were from machines with nearly 100GB RAM but only 32GB register-able. But only one of those was the same as one that showed QP creation failures. So I'm not clear which was which. Regardless: can you pump the MTT params up to allow registering all of physical memory on those machines, and see if you get any failures?

As you can see, on a node with 24GB memory and 32GB RM there can be a failure without any warning from the Open MPI side :-(

To be clear: we're trying to determine if we should spend more effort on making OMPI work properly in low-registered-memory-availability scenarios, or whether we should just emphasize "go reset your MTT parameters to allow registering all of physical memory."

After experiencing failures when only ~1.5x of the physical memory may be registered, I would follow Mellanox in "go reset your MTT parameters to allow registering _twice_ the physical memory". So:
- if the OFED parameters are raised, everything is OK
- there is a [rare] combination which your great workaround does not catch
- allowing 2x the memory to be registered could be a good idea.
Does this make sense?

Best,
Paul Kapinos

P.S. The example program used is of course a synthetic thing, but it is strongly inspired by the Serpent software. (However, Serpent usually uses chunks, whereas the actual error arises if all 8GB are sent in one piece.)

P.S.2 When everything works, the performance seems to become worse as the chunk size grows to HUGE values - sending all 15.5GB in one piece is more than twice as slow as sending 200MB pieces. See chunked_send.txt (the first parameter is the number of doubles of data, the 2nd is the number of doubles per chunk).

P.S.3 All experiments above were done with 1.6.1rc2.

P.S.4 I'm also performing some Linpack runs with 6x nodes, and my very first impression is that increasing log_num_mtt to huge values is a bad idea (performance loss of some 5%). But let me finish that first...

-- Dipl.-Inform.
Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 $ $MPIEXEC -n 6 -m 1 -H linuxbmc0191,linuxbmc0037,linuxbmc0105,linuxbmc0219,linuxbmc0221,linuxbmc0246 a.out 208000 208001 MPI_Reduce in einem! Elapsed time: 281.862924 $ $MPIEXEC -n 6 -m 1 -H linuxbmc0191,linuxbmc0037,linuxbmc0105,linuxbmc0219,linuxbmc0221,linuxbmc0246 a.out 208000 20800 Elapsed time: 137.281245 $ $MPIEXEC -n 6 -m 1 -H linuxbmc0191,linuxbmc0037,linuxbmc0105,linuxbmc0219,linuxbmc0221,linuxbmc0246 a.out 208000 2080 Elapsed time: 124.584747 $ $MPIEXEC -n 6 -m 1 -H linuxbmc0191,linuxbmc0037,linuxbmc0105,linuxbmc0219,linuxbmc0221,linuxbmc0246 a.out 208000 208 Elapsed time: 124.167813 process 1 starts test process 3 starts test process 5 starts test process 0 starts test Epsilon = 0.10 process 2 starts test process 4 starts test ### process 4 yields partial sum: 10800.4 size, block: 108000 108001 MPI_Reduce in einem! ### process 2 yields partial sum: 10800.4 size, block: 108000 108001 MPI_Reduce in einem! ### process 3 yields partial sum: 10800.4 ### process 0 yields partial sum: 10800.4 ### process 5 yields partial sum: 10800.4 size, block: 108000 108001 MPI_Reduce in einem!
[OMPI devel] Another case of 'secret' disabling of InfiniBand
Dear Open MPI developer,

in this post: http://www.open-mpi.org/community/lists/users/2012/10/20416.php I already reported a case where Open MPI silently (without any word of caution!) changed the transport from InfiniBand to IPoIB, thus losing performance.

Another case of 'secret' disabling of InfiniBand is the use of the 'multiple' threading level (assuming threading support is enabled by --enable-mpi-thread-multiple). Please have a look at ompi/mca/btl/openib/btl_openib_component.c (v1.6.1, ll. 2504-2508). In these lines a message about disabling the InfiniBand transport is composed, but it normally does not come out because it seems to be intended as debug info only.

The problem is not the fallback itself but the muted way it is done. The user hardly has a way to find out that the application is creeping over TCP, unless the performance loss is noticed and analysed. Well, we believe that disabling a high-performance network _without any word of caution_ is a bad thing, because it may lead to a huge waste of resources (an actual problem may not be noticed for years - the program seems to work!). We will probably forbid any fallback to work around such scenarios in the future.

Maybe a bit more verbosity at this place is a good idea?

Best, Paul Kapinos

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
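One way to turn such a silent fallback into a loud failure is to restrict the BTL list so that TCP is simply not available as an alternative; a sketch (host names illustrative):

$ mpiexec --mca btl openib,sm,self -np 2 -H node1,node2 ./a.out

With TCP excluded from the BTL list, a run that cannot actually use the openib BTL should abort with a "processes unable to reach each other" error instead of quietly creeping over TCP.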
Re: [OMPI devel] MTT parameters for really big nodes?
Yevgeny, Jeff, I've tried 26/2 on a node with 2TB RAM - the IB cards are not reachable with this setup. 26/3 not yet tested (it's a bit work for our admins to 'repair' a node in case it is not reachable over the IB interface). By now we've a couple of nodes with up to 2TB RAM running with 23/5 setup; this seem to be the sonic barrier. Best, Paul On 11/04/12 13:29, Yevgeny Kliteynik wrote: Hi Jeff, On 11/4/2012 1:11 PM, Jeff Squyres wrote: Yevgeny - Could Mellanox update the FAQ item about this? Large-memory nodes are becoming more common. Sure. But I'd like to hear Paul's input on this first. Did it work with log_num_mtt=26? I don't have that kind of machines to test this. -- YK On Nov 3, 2012, at 6:33 PM, Yevgeny Kliteynik wrote: Hi Paul, On 10/31/2012 10:22 PM, Paul Kapinos wrote: Hello Yevgeny, hello all, Yevgeny, first of all thanks for explaining what the MTT parameters do and why there are two of them! I mean this post: http://www.open-mpi.org/community/lists/devel/2012/08/11417.php Well, the official recommendation is "twice the RAM amount". And here we are: we have 2 nodes with 2 TB (that with a 'tera') RAM and a couple of nodes with 1TB, each with 4x Mellanox IB adapters. Thus we should have raised the MTT parameters in order to make up to 4 TB memory registrable. You don't really *have* to be able to register twice the available RAM. It's just heuristics. It depends on the application that you're running and fragmentation that it creates in the MTT. However: I've tried to raise the MTT parameters in multiple combinations, but the maximum amount of registrable memory I was able to get was one TB (23 / 5). All tries to get more (24/5, 23/6 for 2 TB) lead to not responding InfiniBand HCAs. Is there any another limits in the kernel have to be adjusted in order to be able to register that a bunch of memory? Unfortunately, current driver has a limitation in this area so 1TB (23/5 values) is probably the top what the driver can do. IIRC, log_num_mtt can reach 26, so perhaps you can try 26/2 (same 1TB), and then, if it works, try 26/3 (fingers crossed), which will bring you to 2 TB, but I'm not sure it will work. This has already been fixed, and the fix was accepted to the upstream Linux kernel, so it will be included in the next OFED/MLNX_OFED versions. -- YK Best, Paul Kapinos ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cryptographic Signature
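Applying the same mlx4 arithmetic as earlier in this digest (2^log_num_mtt * 2^log_mtts_per_seg * 4 KiB pages, values per the thread above):

registerable(23/5) = 2^23 * 2^5 * 4 KiB = 2^40 bytes = 1 TiB
registerable(26/2) = 2^26 * 2^2 * 4 KiB = 2^40 bytes = 1 TiB
registerable(24/5) = registerable(23/6) = registerable(26/3) = 2^41 bytes = 2 TiB

So the working 23/5 setting caps out at 1 TiB, and covering a 2 TB node would need e.g. 24/5, 23/6 or 26/3 - the first two of which are exactly the combinations reported above as leaving the InfiniBand HCAs unreachable with the driver of that time.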
Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class
Got the same error on all builds (4 compilers, with and without threading support, 64 and 32 bit) on our systems, effectively prohibiting the building of the 1.7 release. Any idea how to work around this? Need more logs? Best

On 08/06/12 19:41, Gutierrez, Samuel K wrote: Looks like the type is defined twice - once in ompi/mca/common/ofacm/common_ofacm_xoob.h and another time in ./ompi/mca/btl/openib/btl_openib_xrc.h. Thanks, Sam

On Aug 6, 2012, at 11:23 AM, Jeff Squyres wrote: I don't have XRC support in my kernels, so it wouldn't show up for me. Did someone have 2 instances of the ib_address_t class?

On Aug 6, 2012, at 1:17 PM, Gutierrez, Samuel K wrote: Hi, Anyone else seeing this? Creating mpi/man/man3/OpenMPI.3 man page... CCLD libmpi.la mca/btl/openib/.libs/libmca_btl_openib.a(btl_openib_xrc.o):(.data.rel+0x0): multiple definition of `ib_address_t_class' mca/common/ofacm/.libs/libmca_common_ofacm_noinst.a(common_ofacm_xoob.o):(.data.rel+0x0): first defined here collect2: ld returned 1 exit status Thanks, Sam

-- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
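For readers wondering how a 'multiple definition' like the one above comes about with static archives: the pattern below (purely illustrative file and symbol names, not the actual Open MPI sources) reproduces the same class of linker error whenever a header *defines* rather than merely declares an object and is included from two translation units that end up in one binary.

/* demo.h - hypothetical header included by two components */
int demo_class_instance = 0;    /* a definition: every includer gets its own copy */

/* a.c */
#include "demo.h"

/* b.c */
#include "demo.h"

/* Linking a.o and b.o statically into one binary then fails with a
 * "multiple definition of `demo_class_instance'" error, just like the
 * ib_address_t_class case quoted above. The usual cure: keep only an
 * 'extern' declaration in the header and put the single definition into
 * exactly one .c file. */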
Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class
Hello, On 04/05/13 03:16, Paul Hargrove wrote: I found that not only did I need XRC, but I also needed to configure with --enable-static to reproduce the problem. I suspect that if Mellanox added --enable-static to an existing MTT configuration this would not have remained unfixed for so long.

Well, AFAIK we do not use --enable-static in our builds, and in the config.log --disable-static is seen multiple times. Nevertheless we run into the error > mca/btl/openib/.libs/libmca_btl_openib.a(btl_openib_xrc.o):(.data.rel+0x0): > multiple definition of `ib_address_t_class'

The configure line we're using is something like this: ./configure --with-verbs --with-lsf --with-devel-headers --enable-contrib-no-build=vt --enable-heterogeneous --enable-cxx-exceptions --enable-orterun-prefix-by-default --disable-dlopen --disable-mca-dso --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' --enable-mpi-ext .. (adding paths, compiler-specific optimisation things and -m32 or -m64)

A config.log file is attached FYI. Best Paul -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 config.log.gz Description: GNU Zip compressed data
[OMPI devel] two questions about 1.7.1
Hello All,

I. Using the new Open MPI 1.7.1 we see some messages on the console: > example mpiext init > example mpiext fini ... on each call to MPI_INIT, MPI_FINALIZE, at least in Fortran programs. It seems somebody forgot to disable some 'printf' debug output? =)

II. In the 1.7.x series, the 'carto' framework has been deleted: http://www.open-mpi.org/community/lists/announce/2013/04/0053.php > - Removed maffinity, paffinity, and carto frameworks (and associated > MCA params). Is there some replacement for this? Or would Open MPI detect the NUMA structure of the nodes automatically?

Background: Currently we're using the 'carto' framework on our kinda special 'Bull BCS' nodes. Each such node consists of 4 boards, each with its own IB card, but together they build a shared-memory system. Clearly, communication should go over the nearest IB interface - for this we use 'carto' now.

best Paul Kapinos -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

PROGRAM PK_MPI_Test_2
  USE MPI
  IMPLICIT NONE
  !include "mpif.h"
  INTEGER :: ierr
  CALL MPI_INIT (ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM PK_MPI_Test_2
! ?
! $ ./a.out
! example mpiext init
! example mpiext fini
Re: [OMPI devel] max. message size
Mr. Mohammad Assadsolimani,

1. your question belongs on the 'OMPI users' mailing list; this one is for developers.

2. In MPI, the count parameters are defined to be integers. Integers are usually 32 bits long (unless you use ILP64 instead of LP64, which is still unusual). So I'd bet your program sends the messages as byte sequences and thus cannot send more than 2^31 bytes (= 2 GB) in one call.

Possible solutions (see the sketch below): - send the data in chunks - use ILP64 (no idea whether Open MPI supports it) - use constructed data types instead of "bytes" (using doubles instead of sending the same data as a byte sequence leads to an 8x increase of the maximum message size). See also http://montecarlo.vtt.fi/mtg/2012_Madrid/Hans_Hammer2.pdf esp. pp. 6 - 7.

Best Paul

On 07/08/13 15:08, mohammad assadsolimani wrote: Dear all, I do my PhD in physics and use a program which uses openmpi for a sophisticated calculation. But there is a problem with the "max. message size". It is limited to ~2GB. I do not know what it is called exactly?! Is there any possibility to extend this limit? I am very grateful for all of your help and thank you in advance. Mohammad -- Webmail: http://mail.livenet.ch Discover faith: http://www.jesus.ch Christian web portal: http://www.livenet.ch ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
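A small C sketch of workarounds (a) "send data in chunks" and (b) "use constructed/larger data types" from the list above. The helper names are made up and the matching receive side is omitted:

/* large_send.c - sketches only, not a complete program. */
#include <mpi.h>
#include <stddef.h>

/* (a) split a large byte buffer into pieces whose count still fits an int */
static void send_in_chunks(const char *buf, size_t total_bytes,
                           int dest, int tag, MPI_Comm comm)
{
    const size_t chunk = (size_t)1 << 30;        /* 1 GiB per MPI_Send */
    size_t off = 0;
    while (off < total_bytes) {
        size_t n = total_bytes - off;
        if (n > chunk)
            n = chunk;
        MPI_Send(buf + off, (int)n, MPI_BYTE, dest, tag, comm);
        off += n;
    }
}

/* (b) describe the same data as fewer, larger elements: sending doubles
 * instead of bytes already raises the per-call limit by a factor of 8
 * (ndoubles must itself still fit into an int) */
static void send_as_doubles(const double *buf, size_t ndoubles,
                            int dest, int tag, MPI_Comm comm)
{
    MPI_Send(buf, (int)ndoubles, MPI_DOUBLE, dest, tag, comm);
}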
Re: [OMPI devel] two questions about 1.7.1
Hi Jeff, On 06/19/13 15:26, Jeff Squyres (jsquyres) wrote: ... II. In the 1.7.x series, the 'carto' framework has been deleted: http://www.open-mpi.org/community/lists/announce/2013/04/0053.php - Removed maffinity, paffinity, and carto frameworks (and associated MCA params). Is there some replacement for this? Or, would Open MPI detect the NUMA structure of nodes automatically? Yes. OMPI uses hwloc internally now to figure this stuff out. Background: Currently we're using the 'carto' framework on our kinda special 'Bull BCS' nodes. Each such node consist of 4 boards with own IB card but build a shared memory system. Clearly, communicating should go over the nearest IB interface - for this we use 'carto' now. It should do this automatically in the 1.7 series. Hmm; I see there isn't any verbose output about which devices it picks, though. :-( Try this patch, and run with --mca btl_base_verbose 100 and see if you see appropriate devices being mapped to appropriate processes: Index: mca/btl/openib/btl_openib_component.c === --- mca/btl/openib/btl_openib_component.c (revision 28652) +++ mca/btl/openib/btl_openib_component.c (working copy) @@ -2712,6 +2712,8 @@ mca_btl_openib_component.ib_num_btls < mca_btl_openib_component.ib_max_btls); i++) { if (distance != dev_sorted[i].distance) { +BTL_VERBOSE(("openib: skipping device %s; it's too far away", + ibv_get_device_name(dev_sorted[i].ib_dev))); break; } Well, I've tried this path on actual 1.7.3 (where the code is moved some 12 lines - beginning with 2700). !! - no output "skipping device"! Also when starting main processes and -bind-to-socket used. What I see is >[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1 >[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device >[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1 >[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device .. one message block per process. Is seems that processes see both IB cards in the special nodes(*) but none were disabled, or at least the verbosity path did not worked. Well, is there any progress on this frontline? Or, can I activate more verbosity / what did I do wrong with the path? (see attached file) Best! Paul Kapinos *) the nodes used for testing are also Bull BCS nodes but vonsisting of just two boards instead of 4 -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. * Copyright (c) 2004-2013 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved. * Copyright (c) 2006-2013 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2006-2007 Voltaire All rights reserved. * Copyright (c) 2009-2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. 
* Copyright (c) 2012 Oak Ridge National Laboratory. All rights reserved * Copyright (c) 2013 Intel, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow * * $HEADER$ */ #include "ompi_config.h" #include #include #include #ifdef HAVE_UNISTD_H #include #endif #include #include #include #include #if BTL_OPENIB_MALLOC_HOOKS_ENABLED /* * The include of malloc.h below breaks abstractions in OMPI (by * directly including a header file from another component), but has * been ruled "ok" because the openib component is only supported on * Linux. * * The malloc hooks in newer glibc were deprecated, including stock * malloc.h causes compilation warnings. Instead, we use the internal * linux component malloc.h which does not cause these warnings. * Internally, OMPI uses the built-in ptmalloc from the linux memory * component anyway. */ #include "opal/mca/memory/linux/malloc.h" #endif #include &qu
Re: [OMPI devel] two questions about 1.7.1
On 12/03/13 23:27, Jeff Squyres (jsquyres) wrote: On Nov 22, 2013, at 1:19 PM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote: Well, I've tried this path on actual 1.7.3 (where the code is moved some 12 lines - beginning with 2700). !! - no output "skipping device"! Also when starting main processes and -bind-to-socket used. What I see is [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1 [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1 [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device That's actually ok -- that's from the usnic BTL, not the openib BTL. The usnic BTL is the Cisco UD verbs component, and it only works with Cisco UCS servers and VICs; it will not work with generic IB cards. Hence, these messages are telling you that the usnic BTL is disqualifying itself because the ibv devices it found are not Cisco UCS VICs. Argh - what a shame not to see "btl:usnic" :-| Look for the openib messages, not the usnic messages. Well, as said there were *no messages* form the patch you provided in http://www.open-mpi.org/community/lists/devel/2013/06/12472.php I've attached of a run with single process per node on nodes with 2 NICs, maybe you can see what goes wrong.. Best Paul -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: registering btl components [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component self [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component self register function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component sm -- WARNING: A user-supplied value attempted to override the default-only MCA variable named "btl_sm_use_knem". The user-supplied value was ignored. 
-- [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component sm register function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component openib [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component openib register function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component usnic [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component usnic register function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: opening btl components [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component self [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component self open function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component sm [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component sm open function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component openib [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component openib open function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component usnic [cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component usnic open function successful [cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component self [cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component self returned success [cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component sm [cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component sm returned success [cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component openib [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: registering btl components [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component self [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component self register function successful [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component sm [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component sm register function successful [cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component openib [cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use on mlx4_1:1 [cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm IP address not found on port [c
Re: [OMPI devel] two questions about 1.7.1
On 12/04/13 14:53, Jeff Squyres (jsquyres) wrote: On Dec 4, 2013, at 4:31 AM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote: Argh - what a shame not to see "btl:usnic" :-| What a shame you don't have Cisco hardware to use the usnic BTL! :-p Well, this is far above my decision level :o) Look for the openib messages, not the usnic messages. Well, as said there were *no messages* form the patch you provided in http://www.open-mpi.org/community/lists/devel/2013/06/12472.php Ah, I see. I've attached of a run with single process per node on nodes with 2 NICs, maybe you can see what goes wrong.. What I'm guessing is happening here is that hwloc was built without PCI device detection, and therefore you're not getting the benefit of the near/far detection. I don't think we currently export whether hwloc was built with PCI device detection support or not, so look for the section in your configure output labeled: --- MCA component hwloc:hwloc152 (m4 configuration macro, priority 75) Send the output of that section here. There should be tests for PCI libraries in there; that should tell us whether you have PCI detection support enabled. The whole configure output attached, to prevent bad copying, as far as output of 'ompi_info --all'. As far as I see "it should be there": > checking whether to enable hwloc PCI device support... yes (default) -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 log_01_conf.txt.gz Description: GNU Zip compressed data openmpi-1.7.3js_ompi_info--all.txt.gz Description: GNU Zip compressed data smime.p7s Description: S/MIME Cryptographic Signature
[OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed compared to previous versions
Dear Open MPI developer, I. we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a change to previous versions: - when calling "mpiexec", it returns "1" and exits silently. The behaviour is reproducible; well not that easy reproducible. We have multiple InfiniBand islands in our cluster. All nodes are passwordless reachable from each other in somehow way; some via IPoIB, for some routing you also have to use ethernet cards and IB/TCP gateways. One island (b) is configured to use the IB card as the main TCP interface. In this island, the variable OMPI_MCA_oob_tcp_if_include is set to "ib0" (*) Another island (h) is configured in convenient way: IB cards also are here and may be used for IPoIB in the island, but the "main interface" used for DNS and Hostname binds is eth0. When calling 'mpiexec' from (b) to start a process on (h), and OpenMPI version is 1.7.4, and OMPI_MCA_oob_tcp_if_include is set to "ib0", mpiexec just exits with return value "1" and no error/warning. When OMPI_MCA_oob_tcp_if_include is unset it works pretty fine. All previously versions of Open MPI (1.6.x, 1.7.3) ) did not have this behaviour; so this is aligned to v1.7.4 only. See log below. You ask why to hell starting MPI processes on other IB island? Because our front-end nodes are in the island (b) but we sometimes need to start something also on island (h), which has been worced perfectly until 1.7.4. (*) This is another Spaghetti Western long story. In short, we set OMPI_MCA_oob_tcp_if_include to 'ib0' in the subcluster where the IB card is configured to be the main network interface, in order to stop Open MPI trying to connect via (possibly unconfigured) ethernet cards - which lead to endless waiting, sometimes. Cf. http://www.open-mpi.org/community/lists/users/2011/11/17824.php -- pk224850@cluster:~[523]$ module switch $_LAST_MPI openmpi/1.7.3 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.3 for intel compiler [ OK ] pk224850@cluster:~[524]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? linuxscc004.rz.RWTH-Aachen.DE 0 pk224850@cluster:~[525]$ module switch $_LAST_MPI openmpi/1.7.4 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.4 for intel compiler [ OK ] pk224850@cluster:~[526]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? 1 pk224850@cluster:~[527]$ -- II. During some experiments with envvars and v1.7.4, got the below messages. -- Sorry! You were supposed to get help about: no-included-found But I couldn't open the help file: /opt/MPI/openmpi-1.7.4/linux/intel/share/openmpi/help-oob-tcp.txt: No such file or directory. Sorry! -- [linuxc2.rz.RWTH-Aachen.DE:13942] [[63331,0],0] ORTE_ERROR_LOG: Not available in file ess_hnp_module.c at line 314 -- Reproducing: $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -H linuxscc004 -np 1 hostname *frome one node with no 'ib0' card*, also without infiniband. Yessir this is a bad idea, and the 1.7.3 has said more understanding "you do wrong thing": -- None of the networks specified to be included for out-of-band communications could be found: Value given: ib0 Please revise the specification and try again. -- No idea, why the file share/openmpi/help-oob-tcp.txt has not been installed in 1.7.4, as we compile this version in pretty the same way as previous versions.. Best, Paul Kapinos -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cryptographic Signature
Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed compared to previous versions
As said, the change in behaviour is new in 1.7.4 - all previous versions has been worked. Moreover, setting "-mca oob_tcp_if_include ib0" is a workaround for older versions of Open MPI for some 60-seconds timeout when starting the same command (which is still sucessfull); or for infinite waiting in same cases. Attached are logs of the commands: $ export | grep OMPI | tee export_OMPI-linuxbmc0008.txt $ $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -mca oob_base_verbose 100 -H linuxscc004 -np 1 hostname 2>&1 | tee oob_base_verbose-linuxbmc0008-173.txt (and -174 for appropriate versions 1.7.3 and 1.7.4) $ ifconfig 2>&1 | tee ifconfig-linuxbmc0008.txt (and -linuxscc004 for the two nodes; linuxscc004 is in (h) fabric and 'mpiexec' was called from node linuxbmc0008 which is in the (b) fabric where the 'ib0' is configured to be the main interface) and the OMPI environment on linuxbmc0008. Maybe you can see something from this. Best Paul On 02/11/14 20:29, Ralph Castain wrote: I've added better error messages in the trunk, scheduled to move over to 1.7.5. I don't see anything in the code that would explain why we don't pickup and use ib0 if it is present and specified in if_include - we should be doing it. For now, can you run this with "-mca oob_base_verbose 100" on your cmd line and send me the output? Might help debug the behavior. Thanks Ralph On Feb 11, 2014, at 1:22 AM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote: Dear Open MPI developer, I. we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a change to previous versions: - when calling "mpiexec", it returns "1" and exits silently. The behaviour is reproducible; well not that easy reproducible. We have multiple InfiniBand islands in our cluster. All nodes are passwordless reachable from each other in somehow way; some via IPoIB, for some routing you also have to use ethernet cards and IB/TCP gateways. One island (b) is configured to use the IB card as the main TCP interface. In this island, the variable OMPI_MCA_oob_tcp_if_include is set to "ib0" (*) Another island (h) is configured in convenient way: IB cards also are here and may be used for IPoIB in the island, but the "main interface" used for DNS and Hostname binds is eth0. When calling 'mpiexec' from (b) to start a process on (h), and OpenMPI version is 1.7.4, and OMPI_MCA_oob_tcp_if_include is set to "ib0", mpiexec just exits with return value "1" and no error/warning. When OMPI_MCA_oob_tcp_if_include is unset it works pretty fine. All previously versions of Open MPI (1.6.x, 1.7.3) ) did not have this behaviour; so this is aligned to v1.7.4 only. See log below. You ask why to hell starting MPI processes on other IB island? Because our front-end nodes are in the island (b) but we sometimes need to start something also on island (h), which has been worced perfectly until 1.7.4. (*) This is another Spaghetti Western long story. In short, we set OMPI_MCA_oob_tcp_if_include to 'ib0' in the subcluster where the IB card is configured to be the main network interface, in order to stop Open MPI trying to connect via (possibly unconfigured) ethernet cards - which lead to endless waiting, sometimes. Cf. http://www.open-mpi.org/community/lists/users/2011/11/17824.php -- pk224850@cluster:~[523]$ module switch $_LAST_MPI openmpi/1.7.3 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.3 for intel compiler [ OK ] pk224850@cluster:~[524]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? 
linuxscc004.rz.RWTH-Aachen.DE 0 pk224850@cluster:~[525]$ module switch $_LAST_MPI openmpi/1.7.4 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.4 for intel compiler [ OK ] pk224850@cluster:~[526]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? 1 pk224850@cluster:~[527]$ -- II. During some experiments with envvars and v1.7.4, got the below messages. -- Sorry! You were supposed to get help about: no-included-found But I couldn't open the help file: /opt/MPI/openmpi-1.7.4/linux/intel/share/openmpi/help-oob-tcp.txt: No such file or directory. Sorry! -- [linuxc2.rz.RWTH-Aachen.DE:13942] [[63331,0],0] ORTE_ERROR_LOG: Not available in file ess_hnp_module.c at line 314 -- Reproducing: $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -H linuxscc004 -np 1 hostname *frome one node with no 'ib0' card*, also without i
Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed compared to previous versions
Attached the output from openmpi/1.7.5a1r30708 $ $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -mca oob_base_verbose 100 -H linuxscc004 -np 1 hostname 2>&1 | tee oob_base_verbose-linuxbmc0008-175a1r29587.txt Well, some 5 lines added. (The ib0 on linuxscc004 is not reachable from linuxbmc0008 - this lead to TCP shutdown? cf. line 36-37) On 02/13/14 01:28, Ralph Castain wrote: Could you please give the nightly 1.7.5 tarball a try using the same cmd line options and send me the output? I see the problem, but am trying to understand how it happens. I've added a bunch of diagnostic statements that should help me track it down. Thanks Ralph On Feb 12, 2014, at 1:26 AM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote: As said, the change in behaviour is new in 1.7.4 - all previous versions has been worked. Moreover, setting "-mca oob_tcp_if_include ib0" is a workaround for older versions of Open MPI for some 60-seconds timeout when starting the same command (which is still sucessfull); or for infinite waiting in same cases. Attached are logs of the commands: $ export | grep OMPI | tee export_OMPI-linuxbmc0008.txt $ $MPI_BINDIR/mpiexec -mca oob_tcp_if_include ib0 -mca oob_base_verbose 100 -H linuxscc004 -np 1 hostname 2>&1 | tee oob_base_verbose-linuxbmc0008-173.txt (and -174 for appropriate versions 1.7.3 and 1.7.4) $ ifconfig 2>&1 | tee ifconfig-linuxbmc0008.txt (and -linuxscc004 for the two nodes; linuxscc004 is in (h) fabric and 'mpiexec' was called from node linuxbmc0008 which is in the (b) fabric where the 'ib0' is configured to be the main interface) and the OMPI environment on linuxbmc0008. Maybe you can see something from this. Best Paul On 02/11/14 20:29, Ralph Castain wrote: I've added better error messages in the trunk, scheduled to move over to 1.7.5. I don't see anything in the code that would explain why we don't pickup and use ib0 if it is present and specified in if_include - we should be doing it. For now, can you run this with "-mca oob_base_verbose 100" on your cmd line and send me the output? Might help debug the behavior. Thanks Ralph On Feb 11, 2014, at 1:22 AM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote: Dear Open MPI developer, I. we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a change to previous versions: - when calling "mpiexec", it returns "1" and exits silently. The behaviour is reproducible; well not that easy reproducible. We have multiple InfiniBand islands in our cluster. All nodes are passwordless reachable from each other in somehow way; some via IPoIB, for some routing you also have to use ethernet cards and IB/TCP gateways. One island (b) is configured to use the IB card as the main TCP interface. In this island, the variable OMPI_MCA_oob_tcp_if_include is set to "ib0" (*) Another island (h) is configured in convenient way: IB cards also are here and may be used for IPoIB in the island, but the "main interface" used for DNS and Hostname binds is eth0. When calling 'mpiexec' from (b) to start a process on (h), and OpenMPI version is 1.7.4, and OMPI_MCA_oob_tcp_if_include is set to "ib0", mpiexec just exits with return value "1" and no error/warning. When OMPI_MCA_oob_tcp_if_include is unset it works pretty fine. All previously versions of Open MPI (1.6.x, 1.7.3) ) did not have this behaviour; so this is aligned to v1.7.4 only. See log below. You ask why to hell starting MPI processes on other IB island? 
Because our front-end nodes are in the island (b) but we sometimes need to start something also on island (h), which has been worced perfectly until 1.7.4. (*) This is another Spaghetti Western long story. In short, we set OMPI_MCA_oob_tcp_if_include to 'ib0' in the subcluster where the IB card is configured to be the main network interface, in order to stop Open MPI trying to connect via (possibly unconfigured) ethernet cards - which lead to endless waiting, sometimes. Cf. http://www.open-mpi.org/community/lists/users/2011/11/17824.php -- pk224850@cluster:~[523]$ module switch $_LAST_MPI openmpi/1.7.3 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.3 for intel compiler [ OK ] pk224850@cluster:~[524]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? linuxscc004.rz.RWTH-Aachen.DE 0 pk224850@cluster:~[525]$ module switch $_LAST_MPI openmpi/1.7.4 Unloading openmpi 1.7.3 [ OK ] Loading openmpi 1.7.4 for intel compiler [ OK ] pk224850@cluster:~[526]$ $MPI_BINDIR/mpiexec -H linuxscc004 -np 1 hostname ; echo $? 1 pk224850@cluster:~[527]$ -- II. During some experiments with envvars and v1.7.4, got
[OMPI devel] Open MPI's 'mpiexec' trashes output of a program being aborted?
Dear Open MPI developers, please take a look at the attached 'program'. In this program, we try to catch signals sent from outside and "handle" them. For different signals, different output has to be produced.

When you start this file directly, or using 'mpiexec' from Intel MPI, and then abort it with Ctrl-C, the output "SIGINT received" is written to the file and to StdOut. When you start this file using Open MPI's 'mpiexec', the output is written to the file, but *not* to StdOut - 'mpiexec' seems to swallow it. Is that behaviour intentional? (it is quite uncomfortable, huh)

Best Paul Kapinos P.S. Tested versions: 1.6.5, 1.7.4 -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915

#!/usr/bin/perl
use Sys::Hostname;
open (MYFILE, '>>testoutput.txt');
$| = 1;
print "running on ", hostname, "\n";
print MYFILE "running on ", hostname, "\n";
$SIG{INT}  = sub { print "SIGINT received\n";  print MYFILE "SIGINT received\n";  exit 0 };
$SIG{TERM} = sub { print "SIGTERM received\n"; print MYFILE "SIGTERM received\n"; exit 0 };
sleep 10;
close (MYFILE);
[OMPI devel] openmpi2.0.0 - bad performance - ALIGNMENT
Dear Open MPI developers,

there is already a thread about the 'sm BTL performance of openmpi-2.0.0': https://www.open-mpi.org/community/lists/devel/2016/07/19288.php and we also see a 30% bandwidth loss, on communication *via InfiniBand*. And we also have a clue: the IB buffers seem not to be aligned in 2.0.0 - in contrast to the previous series (since at least 1.8.x). That means: - if we use a simple wrapper wrapping 'malloc' to a 32-bit-aligned variant, we get the full bandwidth using the same compiled binary (a sketch of such a wrapper is below); and - there is nothing to grep in 'ompi_info -all | grep memalign' in 2.0.0, while in 1.10.3 there are 'btl_openib_memalign' and 'btl_openib_memalign_threshold' parameters. => It seems the whole 'IB buffer alignment' part vanished in 2.0.0?

Could we get the aligned IB buffers back in the 2.x series, please? It's about 30% of performance.

Best Paul

P.S. btl_openib_get_alignment and btl_openib_put_alignment are '0' by default - setting them high did not change the behaviour...

-- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 ___ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
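A minimal sketch of the kind of malloc wrapper meant above. The 64-byte alignment is an assumption (the report speaks of a '32-bit-aligned variant'; in practice cache-line or page alignment is what one would try), and how the wrapper gets hooked into the application (direct calls, LD_PRELOAD interposition, ...) is left out here:

/* aligned_malloc.c - sketch only. */
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

void *aligned_malloc(size_t size)
{
    void *p = NULL;
    /* posix_memalign returns 0 on success; the alignment must be a power of
     * two and a multiple of sizeof(void *) */
    if (posix_memalign(&p, 64, size) != 0)
        return NULL;
    return p;
}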
[OMPI devel] --with-devel-headers and internal 'hwloc' header on 'mpicc --showme:compile'
Hi Jeff, Hi All, did you notice the last update here: https://github.com/open-mpi/hwloc/issues/229 In short: the configure option --with-devel-headers leads to the internal 'hwloc' headers being on the 'mpicc --showme:compile' line, > which are absolutely not meant for stuff outside the Open MPI source tree). > If this is happening, then either Open MPI was installed improperly or it is > a real bug, and we should figure out how that is happening.

The question is: is this a known and/or intended behaviour? Or is an installation with --with-devel-headers to be considered improperly installed?

Best, Paul Kapinos -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 ___ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
[OMPI devel] QE, mpif.h and the Intel compiler
Dear Open MPI developer, QE is an MPI program, cf. http://qe-forge.org/gf/download/frsrelease/224/1044/qe-6.0.tar.gz In FFTXlib a questionable source code part is contained which cannot be compiled using version 17 of the Intel compilers and Open MPI; I've condensed it (see attachment). Note that - using Intel MPI all tested compilers (Intel/GCC/Oracle Studio/PGI) can compile stick_base_TEST.f90 with no errors/warnings issued - using Open MPI the same is true for GCC/Oracle Studio/PGI; - using Intel compiler up to version 16 and Open MPI, *warnings* like (1) are issued (compilation succeds); - using Intel compiler /17, *errors* like (2) are issued (compilation fails). - when 'PRIVATE' statement in line #3 removed, also Intel compiler compile the snippet with no error. Well the questions is - is stick_base_TEST.f90 a correct MPI code? - if YES why does the Intel/17 + OpenMPI combination do not like this? If NO why to hell none of other compiler+MPI combinations complain about this? :o) Have a nice day, Paul Kapinos P.S. Did you noticed also this one? https://www.mail-archive.com/users@lists.open-mpi.org//msg30320.html - 1 -- /opt/MPI/openmpi-1.10.4/linux/intel_16.0.2.181/include/mpif-sizeof.h(2254): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [PMPI_SIZEOF_REAL64_R15] SUBROUTINE PMPI_Sizeof_real64_r15(x, size, ierror) -^ - 2 -- /opt/MPI/openmpi-1.10.4/linux/intel_17.0.0.064/include/mpif-sizeof.h(220): error #5286: Ambiguous generic interface MPI_SIZEOF: previously declared specific procedure MPI_SIZEOF_COMPLEX32_R13 is not distinguishable from this declaration. [MPI_SIZEOF_COMPLEX32_R13] SUBROUTINE MPI_Sizeof_complex32_r13(x, size, ierror) -^ -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 MODULE stick_base IMPLICIT NONE PRIVATE INCLUDE 'mpif.h' PUBLIC :: sticks_map_set CONTAINS SUBROUTINE sticks_map_set() INCLUDE 'mpif.h' END SUBROUTINE sticks_map_set END MODULE stick_base smime.p7s Description: S/MIME Cryptographic Signature ___ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
[OMPI devel] #warning "Including liblustreapi.h is deprecated. Include lustreapi.h directly."
Dear Open MPI developer, trying to compile Open MPI 2.0.1 using PGI compilers (19.9 and 16.7) we ran into an error the root of one is not known yet, but we see also this warning: #warning "Including liblustreapi.h is deprecated. Include lustreapi.h directly." in the /usr/include/lustre/liblustreapi.h file, included from 'openmpi-2.0.1/ompi/mca/fs/lustre/fs_lustre.c' file (line 46, doh). well, it is about you on change or keep the way the Lustre headers being included in Open MPI. Just my $2%. Have a nice day, Paul Kapinos pk224850@lnm001:/w0/tmp/pk224850/OpenMPI/2.0.1/pgi_16.9./openmpi-2.0.1/ompi/mca/fs/lustre[527]$ make CC fs_lustre.lo pgcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/w0/tmp/pk224850/OpenMPI/2.0.1/pgi_16.9./openmpi-2.0.1/opal/mca/hwloc/hwloc1112/hwloc/include -I/w0/tmp/pk224850/OpenMPI/2.0.1/pgi_16.9./openmpi-2.0.1/opal/mca/event/libevent2022/libevent -I/w0/tmp/pk224850/OpenMPI/2.0.1/pgi_16.9./openmpi-2.0.1/opal/mca/event/libevent2022/libevent/include -DNDEBUG -O0 -c fs_lustre.c -MD -fpic -DPIC -o .libs/fs_lustre.o PGC-W-0267-#warning -- "Including liblustreapi.h is deprecated. Include lustreapi.h directly." (/usr/include/lustre/liblustreapi.h: 41) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 58) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 58) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 61) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 61) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 65) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 65) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 68) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 68) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 72) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 72) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 75) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 75) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 91) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 91) PGC-W-0114-More than one type specified (/usr/include/libcfs/posix/posix-types.h: 94) PGC-W-0143-Useless typedef declaration (no declarators present) (/usr/include/libcfs/posix/posix-types.h: 94) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 157) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 157) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 158) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 158) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 159) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 159) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 
160) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 160) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 161) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 161) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 162) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 162) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 163) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 163) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 164) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 164) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 211) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 211) PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 212) PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 212) PGC-W-0095-Type cast required for this conversion (fs_lustre.c: 93) PGC/x86-64 Linux 16.9-0: compilation completed with severe errors make: *** [fs_lustre.lo] Error 1 -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cry
[OMPI devel] weird error message (you'll be puzzled!)
Dear Open MPI developer, please take a look at the attached 'hello MPI world' file. We know that it contain an error (you should never put '1476395012' into MPI_Init_thread() call! It was a typo, initially...) BUT, see what happens if you compile it: $ mpif90 -g mpihelloworld.f90 $ ./a.out 1476395012 3 *** The MPI_Init_thread() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. [cluster-hpc.rz.RWTH-Aachen.DE:25739] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed For me, reading this: > MPI_Init_thread() function was called before MPI_INIT was invoked. > This is disallowed by the MPI standard ...produced some cognitive dissonance, as the MPI's calls to MPI_Init_thread and MPI_Init are well-known to be *mutually exclusive*. Well maybe with 'MPI_Init_thread() function' something Open MPI- internal is meant instead of MPI's MPI_Init_thread, but the error message stays strongly unbelievable ( 2 + 2 = 6 !) Maybe you can text a better error message? :o) Have a nice day, Paul Kapinos P.S. Tested versions: 1.10.6 and 2.0.1, with support for MPI_THREAD_MULTIPLE > MPI_Init_thread(3) Open MPI MPI_Init_thread(3) > NAME >MPI_Init_thread - Initializes the MPI execution environment > .. > DESCRIPTION >This routine, or MPI_Init, must be called before any other MPI routine >(apart from MPI_Initialized) is called. MPI can be initialized at most >once; subsequent calls to MPI_Init or MPI_Init_thread are erroneous. > >MPI_Init_thread, as compared to MPI_Init, has a provision to request a >certain level of thread support in required: > MPI_Init(3)Open MPIMPI_Init(3) > NAME >MPI_Init - Initializes the MPI execution environment > . > DESCRIPTION >This routine, or MPI_Init_thread, must be called before any other MPI >routine (apart from MPI_Initialized) is called. MPI can be initialized >at most once; subsequent calls to MPI_Init or MPI_Init_thread are erro- >neous. -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 ! Paul Kapinos 22.09.2009 - ! RZ RWTH Aachen, www.rz.rwth-aachen.de ! ! MPI-Hello-World ! PROGRAM PK_MPI_Test USE MPI IMPLICIT NONE !include "mpif.h" ! INTEGER :: my_MPI_Rank, laenge, ierr INTEGER :: requ, provid, required ! INTEGER :: PROVIDED, REQUIRED CHARACTER*(MPI_MAX_PROCESSOR_NAME) my_Host ! !WRITE (*,*) "Jetz penn ich mal 30" !CALL Sleep(30) !WRITE (*,*) "Starten" !CALL MPI_INIT (ierr) required = MPI_THREAD_MULTIPLE requ = 1476395012 WRITE (*,*) requ, required CALL MPI_Init_thread (requ, provid, ierr) WRITE (*,*) "MPI_Init_thread (", requ, provid, ierr, ")" ! REQUIRED = MPI_THREAD_MULTIPLE !MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE ist evil ! CALL MPI_INIT_THREAD(REQUIRED, PROVIDED, ierr) ! WRITE(*,*) "Threading levels: ", MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE ! WRITE(*,*) "Fordere multithreading an:", MPI_THREAD_MULTIPLE, REQUIRED, PROVIDED ! !WRITE (*,*) "Nach MPI_INIT" !CALL Sleep(30) CALL MPI_COMM_RANK( MPI_COMM_WORLD, my_MPI_Rank, ierr ) !WRITE (*,*) "Nach MPI_COMM_RANK" CALL MPI_GET_PROCESSOR_NAME(my_Host, laenge, ierr) WRITE (*,*) "Prozessor ", my_MPI_Rank, "on Host: ", my_Host(1:laenge) ! sleeping or spinnig - the same behaviour !CALL Sleep(2) !DO WHILE (.TRUE.) 
!ENDDO CALL Sleep(1) !IF (my_MPI_Rank == 1) STOP CALL MPI_FINALIZE(ierr) ! !WRITE (*,*) "Daswars" ! END PROGRAM PK_MPI_Test smime.p7s Description: S/MIME Cryptographic Signature ___ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
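For reference, a minimal C sketch of a correct call (the Fortran test above passes a garbage value as 'required'; the point is only that the value must be one of the predefined threading levels, not an arbitrary integer such as 1476395012, which is what provoked the confusing abort reported above):

/* init_thread.c - sketch only, not Open MPI code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* 'required' must be MPI_THREAD_SINGLE, _FUNNELED, _SERIALIZED or _MULTIPLE */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("warning: asked for MPI_THREAD_MULTIPLE, got level %d\n", provided);

    MPI_Finalize();
    return 0;
}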
Re: [OMPI devel] mpi_yield_when_idle=1 and still 100%CPU
Good news everyone! PR #4331 works, I was able to apply it on 3.0.0-release (two hunks succeeded with a offset, and the line 65 in openmpi-3.0.0/ompi/runtime/ompi_mpi_params.c has to be changed > < bool ompi_mpi_yield_when_idle = true; > > bool ompi_mpi_yield_when_idle = false; prior pathing to make the Hunk #2 working). The patched version then (when set '-mca mpi_poll_when_idle true') change load level of waiting processes from 100% load to some 2.5% load, or with other words: the processes sleep well :-) This is true for both trivial example with MPI_Barrier and for full-featured release of ParaView (5.4.1), and thus is a solution of our issue. It would be great if next release of Open MPI will contain this feature, even if it will take years until these version of OpenMPI will be commonly-adopted software (JFYI: CentOS 7.3 has openmpi-1.10.3 as default release; I'm fighting with some ISV to let they update their Sw to 1.10.x NOW; we know about one who just managed to go from 1.6.x to 1.8.x a half year ago...) Thank you very much! Paul Kapinos On 10/12/2017 09:31 AM, Gilles Gouaillardet wrote: > Paul, > > > i made PR #4331 https://github.com/open-mpi/ompi/pull/4431 in order to > implement > this. > > in order to enable passive wait, you simply need to > > mpirun --mca mpi_poll_when_idle true ... > > > fwiw, when you use mpi_yield_when_idle, Open MPI does (highly oversimplified) > > for (...) sched_yield(); > > > as you already noted, top show 100% cpu usage (a closer look shows the usage > is > in the kernel and not user space). > > that being said, since the process is only yielding, the other running > processes > will get most of their time slices, > > and hence the system remains pretty responsive. > > > Can you please give this PR a try ? > > the patch can be manually downloaded at > https://github.com/open-mpi/ompi/pull/4431.patch > > > Cheers, > > > Gilles > > > On 10/12/2017 12:37 AM, Paul Kapinos wrote: >> Dear Jeff, >> Dear All, >> >> we know about *mpi_yield_when_idle* parameter [1]. We read [2]. You're right, >>> if an MPI application is waiting a long time for messages, >>> perhaps its message passing algorithm should be re-designed >> ... but we cannot spur the ParaView/VTK developer to rewrite their software >> famous for busy-wait on any user mouse move with N x 100% CPU load [3]. >> >> It turned out that >> a) (at least some) spin time is on MPI_Barrier call (waitin' user >> interaction) >> b) for Intel MPI and MPICH we found a way to disable this busy wait [4] >> >> c) But, for both 'pvserver' and minimal example (attached), we were not able >> to >> stop the busy waiting with Open MPI: setting *mpi_yield_when_idle* parameter >> to >> '1' just seem to move the spin activity from userland to kernel, with >> staying at >> 100%, cf. attached screenshots and [5]. The behaviour is the same for 1.10.4 >> and >> 2.0.2. >> >> Well, The Question: is there a way/a chance to effectively disable the busy >> wait >> using Open MPI? 
>> >> Best, >> >> Paul Kapinos >> >> [1] http://www.open-mpi.de/faq/?category=running#force-aggressive-degraded >> [2] >> http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress >> [3] >> https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage >> >> [4] >> https://public.kitware.com/pipermail/paraview-developers/2017-October/005587.html >> [5] >> https://serverfault.com/questions/180711/what-exactly-do-the-colors-in-htop-status-bars-mean >> >> >> >> >> >> >> ___ >> devel mailing list >> devel@lists.open-mpi.org >> https://lists.open-mpi.org/mailman/listinfo/devel > > ___ > devel mailing list > devel@lists.open-mpi.org > https://lists.open-mpi.org/mailman/listinfo/devel > -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cryptographic Signature ___ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel
Re: [OMPI devel] mpi_yield_when_idle=1 and still 100%CPU
Hi Gilles, Thank you for your message and quick path! You likely mean (instead of links in your eMail below) https://github.com/open-mpi/ompi/pull/4331 and https://github.com/open-mpi/ompi/pull/4331.patch for your PR #4331 (note '4331' instead of '4431' :-) I was not able to path 1.10.7 release - likely because you develop on much much newer version of Open MPI. Q1: on *which* release the path 4331 should be applied? Q2: I assume it is unlikely that this patch would be back-ported to 1.10.x? Best Paul Kapinos On 10/12/2017 09:31 AM, Gilles Gouaillardet wrote: > Paul, > > > i made PR #4331 https://github.com/open-mpi/ompi/pull/4431 in order to > implement > this. > > in order to enable passive wait, you simply need to > > mpirun --mca mpi_poll_when_idle true ... > > > fwiw, when you use mpi_yield_when_idle, Open MPI does (highly oversimplified) > > for (...) sched_yield(); > > > as you already noted, top show 100% cpu usage (a closer look shows the usage > is > in the kernel and not user space). > > that being said, since the process is only yielding, the other running > processes > will get most of their time slices, > > and hence the system remains pretty responsive. > > > Can you please give this PR a try ? > > the patch can be manually downloaded at > https://github.com/open-mpi/ompi/pull/4431.patch > > > Cheers, > > > Gilles > > > On 10/12/2017 12:37 AM, Paul Kapinos wrote: >> Dear Jeff, >> Dear All, >> >> we know about *mpi_yield_when_idle* parameter [1]. We read [2]. You're right, >>> if an MPI application is waiting a long time for messages, >>> perhaps its message passing algorithm should be re-designed >> ... but we cannot spur the ParaView/VTK developer to rewrite their software >> famous for busy-wait on any user mouse move with N x 100% CPU load [3]. >> >> It turned out that >> a) (at least some) spin time is on MPI_Barrier call (waitin' user >> interaction) >> b) for Intel MPI and MPICH we found a way to disable this busy wait [4] >> >> c) But, for both 'pvserver' and minimal example (attached), we were not able >> to >> stop the busy waiting with Open MPI: setting *mpi_yield_when_idle* parameter >> to >> '1' just seem to move the spin activity from userland to kernel, with >> staying at >> 100%, cf. attached screenshots and [5]. The behaviour is the same for 1.10.4 >> and >> 2.0.2. >> >> Well, The Question: is there a way/a chance to effectively disable the busy >> wait >> using Open MPI? >> >> Best, >> >> Paul Kapinos >> >> [1] http://www.open-mpi.de/faq/?category=running#force-aggressive-degraded >> [2] >> http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress >> [3] >> https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage >> >> [4] >> https://public.kitware.com/pipermail/paraview-developers/2017-October/005587.html >> [5] >> https://serverfault.com/questions/180711/what-exactly-do-the-colors-in-htop-status-bars-mean >> >> >> >> >> >> >> ___ >> devel mailing list >> devel@lists.open-mpi.org >> https://lists.open-mpi.org/mailman/listinfo/devel > > ___ > devel mailing list > devel@lists.open-mpi.org > https://lists.open-mpi.org/mailman/listinfo/devel > -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 smime.p7s Description: S/MIME Cryptographic Signature ___ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel
[OMPI devel] mpi_yield_when_idle=1 and still 100%CPU
Dear Jeff, Dear All, we know about *mpi_yield_when_idle* parameter [1]. We read [2]. You're right, > if an MPI application is waiting a long time for messages, > perhaps its message passing algorithm should be re-designed ... but we cannot spur the ParaView/VTK developer to rewrite their software famous for busy-wait on any user mouse move with N x 100% CPU load [3]. It turned out that a) (at least some) spin time is on MPI_Barrier call (waitin' user interaction) b) for Intel MPI and MPICH we found a way to disable this busy wait [4] c) But, for both 'pvserver' and minimal example (attached), we were not able to stop the busy waiting with Open MPI: setting *mpi_yield_when_idle* parameter to '1' just seem to move the spin activity from userland to kernel, with staying at 100%, cf. attached screenshots and [5]. The behaviour is the same for 1.10.4 and 2.0.2. Well, The Question: is there a way/a chance to effectively disable the busy wait using Open MPI? Best, Paul Kapinos [1] http://www.open-mpi.de/faq/?category=running#force-aggressive-degraded [2] http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress [3] https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage [4] https://public.kitware.com/pipermail/paraview-developers/2017-October/005587.html [5] https://serverfault.com/questions/180711/what-exactly-do-the-colors-in-htop-status-bars-mean -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, IT Center Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915 ! Paul Kapinos 06.10.2017 - ! IT Center RWTH Aachen, www.itc.rwth-aachen.de ! ! MPI-busy-waiting-World ! PROGRAM PK_MPI_Test USE MPI IMPLICIT NONE !include "mpif.h" INTEGER :: my_MPI_Rank, ierr CALL MPI_INIT(ierr) CALL MPI_COMM_RANK( MPI_COMM_WORLD, my_MPI_Rank, ierr ) IF (my_MPI_Rank == 0) CALL Sleep(20) CALL MPI_BARRIER(MPI_COMM_WORLD, ierr) ! all but rank=0 processes busy-wait here... CALL MPI_FINALIZE(ierr) END PROGRAM PK_MPI_Test smime.p7s Description: S/MIME Cryptographic Signature ___ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel
Re: [OMPI devel] mpi_yield_when_idle=1 and still 100%CPU
Gilles, a short update about the patched version (3.0.0). After we updated from CentOS 7.3 to 7.4, this version build with all versions of all compilers stopped to work with message like: $ a.out: symbol lookup error: /opt/MPI/openmpi-3.0.0p/linux/intel_17.0.5.239/lib/openmpi/mca_mpool_memkind.so: undefined symbol: memkind_get_kind_by_partition Short look in-depth show up that memkind package containing libmemkind.so.0.0.1 was updated from memkind-1.4.0-1.el7.x86_64 to memkind-1.5.0-1.el7.x86_64 by updating CentOS, and that the older one really contain 'memkind_get_kind_by_partition' symbol whilst the new one did not have this symbol anymore. So I will rebuild this version to look what happen' (regular v3.0.0 didn't have the issue). Nevertheless I liked to give you a report about this side effect of your patch ... [reading the fine google search results first] Likely I see another one of type https://github.com/open-mpi/ompi/issues/4466 Most amazing is that only one version of Open MPI (the patched 3.0.0 one) stops to work instead of all. Seem's we're lucky. WOW. will report on results of 3.0.0p rebuild. best, Paul Kapinos $ objdump -S /usr/lib64/libmemkind.so.0.0.1 | grep -i memkind_get_kind_by_partition 7f70 : 7f76: 77 19 ja 7f91 <memkind_get_kind_by_partition+0x21> 7f89: 74 06 je 7f91 <memkind_get_kind_by_partition+0x21> On 10/12/2017 11:21 AM, Gilles Gouaillardet wrote: > Paul, > > Sorry for the typo. > > The patch was developed on the master branch. > Note v1.10 is no more supported, and since passive wait is a new feature, it > would start at v3.1 or later. > > That being said, if you are kind of stucked with 1.10.7, i can try to craft a > one off patch in order to help > > > Cheers, > > Gilles > > Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote: >> Hi Gilles, >> Thank you for your message and quick path! >> >> You likely mean (instead of links in your eMail below) >> https://github.com/open-mpi/ompi/pull/4331 and >> https://github.com/open-mpi/ompi/pull/4331.patch >> for your PR #4331 (note '4331' instead of '4431' :-) >> >> I was not able to path 1.10.7 release - likely because you develop on much >> much >> newer version of Open MPI. >> >> Q1: on *which* release the path 4331 should be applied? >> >> Q2: I assume it is unlikely that this patch would be back-ported to 1.10.x? >> >> Best >> Paul Kapinos >> >> >> >> >> On 10/12/2017 09:31 AM, Gilles Gouaillardet wrote: >>> Paul, >>> >>> >>> i made PR #4331 https://github.com/open-mpi/ompi/pull/4431 in order to >>> implement >>> this. >>> >>> in order to enable passive wait, you simply need to >>> >>> mpirun --mca mpi_poll_when_idle true ... >>> >>> >>> fwiw, when you use mpi_yield_when_idle, Open MPI does (highly >>> oversimplified) >>> >>> for (...) sched_yield(); >>> >>> >>> as you already noted, top show 100% cpu usage (a closer look shows the >>> usage is >>> in the kernel and not user space). >>> >>> that being said, since the process is only yielding, the other running >>> processes >>> will get most of their time slices, >>> >>> and hence the system remains pretty responsive. >>> >>> >>> Can you please give this PR a try ? >>> >>> the patch can be manually downloaded at >>> https://github.com/open-mpi/ompi/pull/4431.patch >>> >>> >>> Cheers, >>> >>> >>> Gilles >>> >>> >>> On 10/12/2017 12:37 AM, Paul Kapinos wrote: >>>> Dear Jeff, >>>> Dear All, >>>> >>>> we know about *mpi_yield_when_idle* parameter [1]. We read [2]. 
Re: [OMPI devel] NVIDIA 'nvfortran' cannot link libmpi_usempif08.la
JFYI: the same issue is also present in Open MPI 4.1.1. I cannot open a GitHub issue due to the lack of an account(*), so I would kindly ask somebody to open one, if possible.

Have a nice day
Paul Kapinos

(* too many accounts in my life.)

On 4/16/21 6:02 PM, Paul Kapinos wrote:
Dear Open MPI developers,

trying to build OpenMPI/4.1.0 using the NVIDIA compilers [1] (version 21.1.xx, 'rebranded' PGI compilers), we ran into the error below when linking libmpi_usempif08.la. It seems something goes wrong at the configure stage (detection of the PIC flags?!). Note that the last 'true' PGI compiler we tried (pgi_20.4) did not produce that issue.

A known workaround is to add '-fPIC' to CFLAGS, CXXFLAGS and FCFLAGS (maybe not needed for all of them).

(I do not attach config.log this time to avoid a lockout from the mailing list; of course I can provide this and any other kind of information.)

Have a nice day,
Paul Kapinos

[1] https://developer.nvidia.com/hpc-compilers

  FCLD     libmpi_usempif08.la
/usr/bin/ld: .libs/comm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/startall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pcomm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pstartall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/abort_f08.o: relocation R_X86_64_PC32 against symbol `ompi_abort_f' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
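A sketch of the '-fPIC' workaround expressed as a configure invocation; the NVIDIA compiler driver names (nvc, nvc++, nvfortran) and the trailing '...' for the site-specific options are assumptions, not taken from the report:

$ ./configure CC=nvc CXX=nvc++ FC=nvfortran \
              CFLAGS=-fPIC CXXFLAGS=-fPIC FCFLAGS=-fPIC ...
$ make all install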
[OMPI devel] inconsistency of behaviour and description of "openmpi-4.1.0$ ./configure --with-pmix=external ... "
Dear Open MPI developer,

The help for ./configure in openmpi-4.1.0 says (see also below):
> "external" forces Open MPI to use an external installation of PMIx.

If I understood this correctly, it (also) means that if there is *no* external PMIx available on the system, the ./configure run should fail and exit with a non-zero exit code. However, when trying this on a node with no PMIx library installed, we found out that configure finishes with no error message and configures the internal PMIx anyway:

> checking if user requested internal PMIx support(external)... no
> ...
>   PMIx support: Internal

This is surprising and feels like an error. Could you have a look at this? Thank you!

Have a nice day,
Paul Kapinos

P.S. A grep for 'PMIx' in config.log: https://rwth-aachen.sciebo.de/s/xtNIx2dJlTy2Ams (pastebin and gist both need accounts, and I hate accounts).

$ ./configure --help
...
  --with-pmix(=DIR)       Build PMIx support. DIR can take one of three values:
                          "internal", "external", or a valid directory name.
                          "internal" (or no DIR value) forces Open MPI to use
                          its internal copy of PMIx. "external" forces Open MPI
                          to use an external installation of PMIx. Supplying a
                          valid directory name also forces Open MPI to use an
                          external installation of PMIx, and adds DIR/include,
                          DIR/lib, and DIR/lib64 to the search path for headers
                          and libraries. Note that Open MPI does not support
                          --without-pmix.

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
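A minimal sketch of the check on a node without any PMIx installation; the exit-code test and the grep are illustrative, while the "PMIx support: Internal" line is the one quoted in the report:

$ ./configure --with-pmix=external 2>&1 | tee configure.out
$ echo $?
0        # expected: non-zero, since no external PMIx can be found
$ grep -i 'PMIx support' configure.out
  PMIx support: Internal        # although "external" was explicitly requested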
[OMPI devel] NVIDIA 'nvfortran' cannot link libmpi_usempif08.la
Dear Open MPI developers,

trying to build OpenMPI/4.1.0 using the NVIDIA compilers [1] (version 21.1.xx, 'rebranded' PGI compilers), we ran into the error below when linking libmpi_usempif08.la. It seems something goes wrong at the configure stage (detection of the PIC flags?!). Note that the last 'true' PGI compiler we tried (pgi_20.4) did not produce that issue.

A known workaround is to add '-fPIC' to CFLAGS, CXXFLAGS and FCFLAGS (maybe not needed for all of them).

(I do not attach config.log this time to avoid a lockout from the mailing list; of course I can provide this and any other kind of information.)

Have a nice day,
Paul Kapinos

[1] https://developer.nvidia.com/hpc-compilers

  FCLD     libmpi_usempif08.la
/usr/bin/ld: .libs/comm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/startall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pcomm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pstartall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/abort_f08.o: relocation R_X86_64_PC32 against symbol `ompi_abort_f' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
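The affected objects can also be spotted before the failing FCLD step; a sketch using readelf from binutils, run in the directory where the link failed (the object name is taken from the error output above):

$ readelf -r .libs/comm_spawn_multiple_f08.o | grep -c R_X86_64_32S
  # a non-zero count means the object was compiled without -fPIC and cannot be linked into a shared library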