Re: [OMPI devel] Master build failure on Mac OS 10.8 with --enable-static/--disable-shared
Scratching my head over this one - I can replicate it, but need to think a bit on how to solve it.

On Mon, Feb 2, 2015 at 7:08 PM, Paul Hargrove wrote:
> I have a Mac OS X 10.8 system, where cc is clang.
> I have no problems with a default build from the current master tarball.
> However, a static-only build leads to a link failure on opal_wrapper.
>
> Configured with:
>   --prefix=... --enable-debug CC=cc CXX=c++ --enable-static --disable-shared
>
> Failing portion of "make V=1":
>
> /bin/sh ../../../libtool --tag=CC --mode=link cc -g -finline-functions -fno-strict-aliasing -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la
> libtool: link: cc -g -finline-functions -fno-strict-aliasing -o opal_wrapper opal_wrapper.o ../../../opal/.libs/libopen-pal.a -lm
> Undefined symbols for architecture x86_64:
>   "_opal_pmix", referenced from:
>       _opal_get_proc_hostname in libopen-pal.a(proc.o)
> ld: symbol(s) not found for architecture x86_64
> clang: error: linker command failed with exit code 1 (use -v to see invocation)
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
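A quick way to narrow this down is to ask the archive itself where opal_pmix is (and is not) defined. This is a hedged diagnostic sketch, reusing the archive path from the failing link line above:

    # List every reference to the opal_pmix symbol in the static archive.
    # -A prints the archive member on each line; an object showing "U"
    # references the symbol, and if no member shows a definition, the
    # symbol's home object was left out of the static build.
    nm -A ../../../opal/.libs/libopen-pal.a | grep opal_pmix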
Re: [OMPI devel] RFC: Remove embedded libltdl
On Mon, Feb 2, 2015 at 5:47 PM, Paul Hargrove wrote:
> I'll report my test results more completely later, but all 4 PGI-based
> builds I have results for so far have failed, with libtool replacing
> "-lltdl" in the link command line with "/usr/lib/libltdl.so" rather than
> the correct "/usr/lib64/libltdl.so".
>
> So, this is a PGI compiler issue, not a Cray one.
> Will know later if "PGI" needs to be replaced with "non-GNU".

All non-PGI compilers tested out fine, including Open64, PathScale, Oracle/Studio, IBM and Intel. I found no other problems with Jeff's tarball that aren't also present in master.

My PGI testers (one each for v9, 10, 11, 12, 13, and 14) are all on 2 systems at NERSC. I am now going to see about a PGI compiler on a system at another center (or two?) in order to see how universal the problem is.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
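One place to look when chasing this kind of substitution (a sketch; it assumes the libtool script generated at the top of the build tree):

    # libtool replaces -lFOO with an absolute path found along its
    # sys_lib_search_path_spec; if the PGI-configured script lists
    # /usr/lib ahead of (or instead of) /usr/lib64, that would explain
    # it picking the wrong libltdl.so.
    grep '^sys_lib_search_path_spec' ./libtool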
Re: [OMPI devel] RFC: Remove embedded libltdl
On Mon, Feb 2, 2015 at 9:26 PM, Paul Hargrove wrote:
> I am now going to see about a PGI compiler on a system at another center
> (or two?) in order to see how universal the problem is.

That was a dead end. Of the many non-NERSC non-Cray institutions where I have accounts, I could only find one that still has PGI compilers. However, they don't have libltdl installed!

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
[OMPI devel] open mpi
Hello,

Can anyone tell me how to find out which interface of the master node is being used while I am running an Open MPI program on the cluster from the master node?

Thanking you,
Khushi
[OMPI devel] failed to open libltdl.so
I found another failure mode for non-embedded libltdl. On a system with libltdl.so on the login node but NOT the compute nodes, I encountered the following, once per rank, at job launch:

/home/phhargrove/OMPI/openmpi-libltdl-linux-x86_64-psm/INST/bin/orted: error while loading shared libraries: libltdl.so.3: cannot open shared object file: No such file or directory

The mpirun command hung as a result.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
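A quick pre-flight check for this class of failure (a sketch; "compute-node" and the install prefix are placeholders):

    # Resolve orted's shared-library dependencies on a compute node;
    # anything reported "not found" will break the daemon launch exactly
    # as shown above.
    ssh compute-node 'ldd /path/to/INST/bin/orted' | grep 'not found'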
[OMPI devel] Master hangs in opal_fifo test
I have seen opal_fifo hang on 2 distinct systems:
+ Linux/ppc32 with xlc-11.1
+ Linux/x86-64 with icc-14.0.1.106

I have no explanation to offer for either hang. No "weird" configure options were passed to either.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
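For anyone trying to reproduce, the test can be run standalone (a sketch; it assumes the unit tests live under test/class in the build tree, and uses a timeout so a hang is reported rather than blocking the terminal):

    cd test/class && make opal_fifo
    # Exit status 124 from coreutils' timeout indicates the test hung.
    timeout 300 ./opal_fifo; echo "exit status: $?"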
[OMPI devel] Master build broken: libfabric + PGI
On a Linux/x86_64 system with PGI-14.3 I have configured a current master tarball with the following:

--prefix=... --enable-debug CC=pgcc CXX=pgCC FC=pgfortran

I see "make V=1" fail as shown below. This does NOT occur with GNU or Intel compilers on the same system. Initial guess is mis-ordered includes.

-Paul

DEPDIR=.deps depmode=pgcc /bin/sh ../../../../../openmpi-dev-803-g5919b63/config/depcomp \
/bin/sh ../../../../libtool --tag=CC --mode=compile pgcc -DHAVE_CONFIG_H -I.
  -I../../../../../openmpi-dev-803-g5919b63/opal/mca/common/libfabric
  -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include
  -I../../../../opal/mca/common/libfabric/libfabric
  -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen
  -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include
  -D_GNU_SOURCE -DSYSCONFDIR=\"/scratch/hargrove/inst/etc\" -DRDMADIR=\"/tmp\"
  -DEXTDIR=\"/scratch/hargrove/inst/lib/openmpi\"
  -I../../../../../openmpi-dev-803-g5919b63 -I../../../..
  -I../../../../../openmpi-dev-803-g5919b63/opal/include
  -I../../../../../openmpi-dev-803-g5919b63/orte/include -I../../../../orte/include
  -I../../../../../openmpi-dev-803-g5919b63/ompi/include
  -I../../../../../openmpi-dev-803-g5919b63/oshmem/include
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/hwloc/hwloc191/hwloc/include
  -I/scratch/hargrove/bld/opal/mca/hwloc/hwloc191/hwloc/include
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent/include
  -I/scratch/hargrove/bld/opal/mca/event/libevent2022/libevent/include
  -g -c -o libfabric/src/libmca_common_libfabric_la-fi_tostr.lo
  `test -f 'libfabric/src/fi_tostr.c' || echo '../../../../../openmpi-dev-803-g5919b63/opal/mca/common/libfabric/'`libfabric/src/fi_tostr.c
libtool: compile: pgcc -DHAVE_CONFIG_H -I.
  -I../../../../../openmpi-dev-803-g5919b63/opal/mca/common/libfabric
  -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include
  -I../../../../opal/mca/common/libfabric/libfabric
  -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen
  -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include
  -D_GNU_SOURCE -DSYSCONFDIR=\"/scratch/hargrove/inst/etc\" -DRDMADIR=\"/tmp\"
  -DEXTDIR=\"/scratch/hargrove/inst/lib/openmpi\"
  -I../../../../../openmpi-dev-803-g5919b63 -I../../../..
  -I../../../../../openmpi-dev-803-g5919b63/opal/include
  -I../../../../../openmpi-dev-803-g5919b63/orte/include -I../../../../orte/include
  -I../../../../../openmpi-dev-803-g5919b63/ompi/include
  -I../../../../../openmpi-dev-803-g5919b63/oshmem/include
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/hwloc/hwloc191/hwloc/include
  -I/scratch/hargrove/bld/opal/mca/hwloc/hwloc191/hwloc/include
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent
  -I/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent/include
  -I/scratch/hargrove/bld/opal/mca/event/libevent2022/libevent/include
  -g -c ../../../../../openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/src/fi_tostr.c
  -MD -fpic -DPIC -o libfabric/src/.libs/libmca_common_libfabric_la-fi_tostr.o
PGC-S-0040-Illegal use of symbol, pthread_mutex_t (/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include/rdma/fi_eq.h: 75)
PGC-W-0156-Type not specified, 'int' assumed (/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include/rdma/fi_eq.h: 75)
PGC-S-0040-Illegal use of symbol, pthread_cond_t (/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include/rdma/fi_eq.h: 76)
PGC-W-0156-Type not specified, 'int' assumed (/scratch/hargrove/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include/rdma/fi_eq.h: 76)
PGC-S-0043-Redefinition of symbol, pthread_mutex_t (/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h: 104)
PGC-S-0043-Redefinition of symbol, pthread_cond_t (/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h: 130)
PGC/x86-64 Linux 14.3-0: compilation completed with severe errors
make[2]: *** [libfabric/src/libmca_common_libfabric_la-fi_tostr.lo] Error 1

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
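One quick way to test the include-ordering guess (a sketch; the header path is the embedded-libfabric one reported in the diagnostics above):

    # If fi_eq.h uses pthread_mutex_t/pthread_cond_t without including
    # <pthread.h> itself, it is relying on some earlier header to have
    # done so -- a pattern PGI's front end tolerates less gracefully
    # than GCC's or Intel's.
    grep -n 'pthread' opal/mca/common/libfabric/libfabric/include/rdma/fi_eq.h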
Re: [OMPI devel] Master hangs in opal_LIFO test
CORRECTION:

It is the opal_lifo (not fifo) test which hung on both systems.

-Paul

On Mon, Feb 2, 2015 at 11:03 PM, Paul Hargrove wrote:
> I have seen opal_fifo hang on 2 distinct systems:
> + Linux/ppc32 with xlc-11.1
> + Linux/x86-64 with icc-14.0.1.106
>
> I have no explanation to offer for either hang.
> No "weird" configure options were passed to either.
>
> -Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Re: [OMPI devel] Master hangs in opal_LIFO test
There is right now another bug report concerning opal_lifo and ppc64 here:

https://github.com/open-mpi/ompi/issues/371

and there were hangs in opal_lifo on ppc64 a few weeks ago, which Nathan fixed with additional barriers.

On Mon, Feb 02, 2015 at 11:18:43PM -0800, Paul Hargrove wrote:
> CORRECTION:
>
> It is the opal_lifo (not fifo) test which hung on both systems.
>
> -Paul
>
> On Mon, Feb 2, 2015 at 11:03 PM, Paul Hargrove wrote:
>
> > I have seen opal_fifo hang on 2 distinct systems:
> > + Linux/ppc32 with xlc-11.1
> > + Linux/x86-64 with icc-14.0.1.106
> >
> > I have no explanation to offer for either hang.
> > No "weird" configure options were passed to either.
> >
> > -Paul
Re: [OMPI devel] Master hangs in opal_LIFO test
Paul,

George and I were able to reproduce this issue with icc 14.0, but not with icc 14.3 and later. I am trying to see how the difference/bug could be handled automatically.

Cheers,
Gilles

On 2015/02/03 16:18, Paul Hargrove wrote:
> CORRECTION:
>
> It is the opal_lifo (not fifo) test which hung on both systems.
>
> -Paul
>
> On Mon, Feb 2, 2015 at 11:03 PM, Paul Hargrove wrote:
>
>> I have seen opal_fifo hang on 2 distinct systems:
>> + Linux/ppc32 with xlc-11.1
>> + Linux/x86-64 with icc-14.0.1.106
>>
>> I have no explanation to offer for either hang.
>> No "weird" configure options were passed to either.
>>
>> -Paul
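For an automatic workaround, the exact Intel front end in use can be read from the compiler's predefined version macro (a hedged sketch, assuming icc's GCC-compatible driver options):

    # Prints the __INTEL_COMPILER value (e.g. 1400 for the 14.0 series);
    # a configure-time probe could key a workaround off this number.
    echo __INTEL_COMPILER | icc -E -x c - | tail -1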
Re: [OMPI devel] HELP in OpenMPI - for PH.D research
2015-02-02 22:08 UTC+01:00, Jeff Squyres (jsquyres):
> On Jan 25, 2015, at 1:06 PM, Cyrille DIBAMOU MBEUYO wrote:
>>
>> Good afternoon development team,
>>
>> I have a small problem in OpenMPI to achieve my Ph.D research.
>>
>> My problem is that:
>>
>> while saving the context.PID of a process running on a node with BLCR
>> through OpenMPI in the checkpoint folder, I also want to get and save
>> the average CPU and memory utilisation for this process in a file, and
>> use this information later.
>
> I was hoping Adrian would answer here, since this is a CR question. :-)
>
> The current code does not do this, as you have discovered -- the only way
> to save it would be to modify the code to do this. Are you comfortable
> doing that?
>
> If so, what version of OMPI are you using?

I'm using Open MPI 1.6.5.

>> Or is there another method to get this information?
>
> Do you want this information on an ongoing basis, or just when you
> checkpoint / restart?

I want this information when I checkpoint/restart.

> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
DIBAMOU MBEUYO Cyrille
Computer Engineer, M.Sc.
Ph.D. Student in Computer Science
*Mobile*: (+237) 696 608 826 / 674 979 502
The University Of Ngaoundere, CAMEROUN
*Other Email*: cdiba...@univ-ndere.cm
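For the capture itself, one approach that needs no changes to OMPI's CR code (a hedged sketch; $PID stands for the process being checkpointed, and the output file name is a placeholder saved next to the BLCR context.PID file):

    # Snapshot memory (resident and virtual sizes) and lifetime-average
    # CPU usage from standard Linux facilities at checkpoint time.
    grep -E 'VmRSS|VmSize' /proc/$PID/status >> ckpt-$PID.stats
    ps -o pid=,pcpu=,pmem=,etime= -p $PID   >> ckpt-$PID.stats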
Re: [OMPI devel] HELP in OpenMPI - for PH.D research
Have you looked at the "self" CR module?

> On Feb 3, 2015, at 8:46 AM, Cyrille DIBAMOU MBEUYO wrote:
>
> 2015-02-02 22:08 UTC+01:00, Jeff Squyres (jsquyres):
>> On Jan 25, 2015, at 1:06 PM, Cyrille DIBAMOU MBEUYO wrote:
>>>
>>> Good afternoon development team,
>>>
>>> I have a small problem in OpenMPI to achieve my Ph.D research.
>>>
>>> My problem is that:
>>>
>>> while saving the context.PID of a process running on a node with BLCR
>>> through OpenMPI in the checkpoint folder, I also want to get and save
>>> the average CPU and memory utilisation for this process in a file, and
>>> use this information later.
>>
>> I was hoping Adrian would answer here, since this is a CR question. :-)
>>
>> The current code does not do this, as you have discovered -- the only
>> way to save it would be to modify the code to do this. Are you
>> comfortable doing that?
>>
>> If so, what version of OMPI are you using?
>
> I'm using Open MPI 1.6.5.
>
>>> Or is there another method to get this information?
>>
>> Do you want this information on an ongoing basis, or just when you
>> checkpoint / restart?
>
> I want this information when I checkpoint/restart.
>
> --
> DIBAMOU MBEUYO Cyrille
> Computer Engineer, M.Sc.
> Ph.D. Student in Computer Science
> *Mobile*: (+237) 696 608 826 / 674 979 502
> The University Of Ngaoundere, CAMEROUN
> *Other Email*: cdiba...@univ-ndere.cm

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] open mpi
Let's see if I understand you correctly. You are running "mpirun" on the master node, with your applications running on other nodes in the cluster. In that situation, mpirun is using TCP sockets to communicate with the OMPI daemons on the remote nodes, and you would like to know which Ethernet interface on the master node is being used for that purpose. Correct?

If so, then add "-mca oob_base_verbose 30" to your cmd line. You'll get a bunch of output as it will report everything about the messaging system, but early on you'll see which interface is being used. I would suggest just running it with one non-master node and doing a "hostname" command to limit the noise. The selection of interface is done the same way every time, so it will wind up picking the same interface unless the backend nodes change their connectivity.

On Mon, Feb 2, 2015 at 10:52 PM, khushi popat wrote:
> Hello,
>
> Can anyone tell me how to find out which interface of the master node is
> being used while I am running an Open MPI program on the cluster from
> the master node?
>
> Thanking you,
> Khushi
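A concrete form of that suggestion (a sketch; "node01" is a placeholder for one non-master node in the cluster):

    # Launch a single trivial task on one remote node; the oob verbosity
    # output reports which local interface mpirun selects for daemon
    # traffic.
    mpirun -np 1 -host node01 -mca oob_base_verbose 30 hostname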
Re: [OMPI devel] HELP in OpenMPI - for PH.D research
Thank you, I'll look for it.

Best regards.

2015-02-03 14:57 UTC+01:00, Jeff Squyres (jsquyres):
> Have you looked at the "self" CR module?
>
>> On Feb 3, 2015, at 8:46 AM, Cyrille DIBAMOU MBEUYO wrote:
>>
>> 2015-02-02 22:08 UTC+01:00, Jeff Squyres (jsquyres):
>>> On Jan 25, 2015, at 1:06 PM, Cyrille DIBAMOU MBEUYO wrote:
>>>>
>>>> Good afternoon development team,
>>>>
>>>> I have a small problem in OpenMPI to achieve my Ph.D research.
>>>>
>>>> My problem is that:
>>>>
>>>> while saving the context.PID of a process running on a node with BLCR
>>>> through OpenMPI in the checkpoint folder, I also want to get and save
>>>> the average CPU and memory utilisation for this process in a file,
>>>> and use this information later.
>>>
>>> I was hoping Adrian would answer here, since this is a CR question. :-)
>>>
>>> The current code does not do this, as you have discovered -- the only
>>> way to save it would be to modify the code to do this. Are you
>>> comfortable doing that?
>>>
>>> If so, what version of OMPI are you using?
>>
>> I'm using Open MPI 1.6.5.
>>
>>>> Or is there another method to get this information?
>>>
>>> Do you want this information on an ongoing basis, or just when you
>>> checkpoint / restart?
>>
>> I want this information when I checkpoint/restart.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
DIBAMOU MBEUYO Cyrille
Computer Engineer, M.Sc.
Ph.D. Student in Computer Science
*Mobile*: (+237) 696 608 826 / 674 979 502
The University Of Ngaoundere, CAMEROUN
*Other Email*: cdiba...@univ-ndere.cm
Re: [OMPI devel] Master hangs in opal_fifo test
That's the second report involving icc 14. I will dig into this later this week.

-Nathan

On Mon, Feb 02, 2015 at 11:03:41PM -0800, Paul Hargrove wrote:
> I have seen opal_fifo hang on 2 distinct systems:
> + Linux/ppc32 with xlc-11.1
> + Linux/x86-64 with icc-14.0.1.106
>
> I have no explanation to offer for either hang.
> No "weird" configure options were passed to either.
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900