Re: [OMPI devel] oshmem enabled by default
Since "disabled by default" is just part of a macro argument we can say
anything we want.
I propose the following:

Index: config/oshmem_configure_options.m4
===================================================================
--- config/oshmem_configure_options.m4 (revision 32424)
+++ config/oshmem_configure_options.m4 (working copy)
@@ -22,7 +22,7 @@
 AC_MSG_CHECKING([if want oshmem])
 AC_ARG_ENABLE([oshmem],
     [AC_HELP_STRING([--enable-oshmem],
-        [Enable building the OpenSHMEM interface (disabled by default)])],
+        [Enable building the OpenSHMEM interface (available on Linux only, where it is enabled by default)])],
     [oshmem_arg_given=yes],
     [oshmem_arg_given=no])
 if test "$oshmem_arg_given" = "yes"; then

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
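The policy described in the r31155 log message in this thread can be sketched in plain shell. This is only an illustration of the decision table, not the code in config/oshmem_configure_options.m4: the function name decide_oshmem and its two arguments are hypothetical.

```shell
# Hypothetical sketch of the r31155 policy; not the real configure code.
# $1 = host OS string (for example "linux-gnu" or "freebsd10.0")
# $2 = user request: "yes" for --enable-oshmem, "no" for --disable-oshmem,
#      or empty when neither option was given
# Prints "build", "skip" or "error".
decide_oshmem() {
    host_os=$1
    request=$2
    case "$host_os" in
        linux*) supported=yes ;;
        *)      supported=no ;;
    esac
    case "$request" in
        yes)
            # Explicit request: an error on unsupported platforms.
            if test "$supported" = "yes"; then echo build; else echo error; fi ;;
        no)
            echo skip ;;
        *)
            # No option given: build only where OSHMEM is supported.
            if test "$supported" = "yes"; then echo build; else echo skip; fi ;;
    esac
}
```

With this table the default differs by platform, which is why a single "(disabled by default)" help string cannot be right everywhere.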
Re: [OMPI devel] oshmem enabled by default
Paul,

this is a bit trickier ...

on a Linux platform oshmem is built by default,
on a non Linux platform, oshmem is *not* built by default.

so the configure message (disabled by default) is correct on non Linux
platform, and incorrect on Linux platform ...

i do not know what should be done, here are some options :
- have a different behaviour on Linux vs non Linux platforms (by the way,
  does autotools support this ?)
- disable by default, provide only the --enable-oshmem option (so
  configure abort if --enable-oshmem on non Linux platforms)
- provide only the --disable-oshmem option, useful only on Linux
  platforms. on non Linux platforms do not build oshmem and this is not an
  error
- other ?

Cheers,

Gilles

r31155 | rhc | 2014-03-20 05:32:15 +0900 (Thu, 20 Mar 2014) | 5 lines

As per the thread on ticket #4399, OSHMEM does not support non-Linux
platforms. So provide a check for Linux and error out if --enable-oshmem is
given on a non-supported platform. If no OSHMEM option is given (enable or
disable), then don't attempt to build OSHMEM unless we are on a Linux
platform. Default to building if we are on Linux for now, pending the
outcome of the Debian situation.
[OMPI devel] minor atomics nit
Running "make dist" on trunk I see:

--> Generating assembly for "SPARC" "default-.text-.globl-:--.L-#-1-0-1-0-0"
Could not open ../../../opal/asm/base/SPARC.asm: No such file or directory

This is apparently because the following lines were never removed from
opal/asm/asm-data.txt:

# default compile mode on Solaris. Evil. equiv to about Sparc v8
SPARC default-.text-.globl-:--.L-#-1-0-1-0-0 sparc-solaris

README is clear about having dropped support for SPARC < v8plus.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
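A stale entry like this one could be caught before "make dist" with a small check. The sketch below assumes that asm-data.txt holds whitespace-separated fields with the architecture name first (as in the entry quoted above) and that each architecture needs a matching <arch>.asm file in opal/asm/base, as the error message suggests. The function name list_missing_asm is hypothetical.

```shell
# list_missing_asm DATA_FILE BASE_DIR
# Prints every architecture named in DATA_FILE that has no
# BASE_DIR/<arch>.asm file. Assumed layout: first field is the architecture,
# comment lines start with '#'.
list_missing_asm() {
    data_file=$1
    base_dir=$2
    # Skip blank lines and comments, keep the first field, de-duplicate.
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$data_file" | sort -u |
    while read -r arch; do
        test -f "$base_dir/$arch.asm" || echo "$arch"
    done
}
```

Run from the top of a source tree it might be called as: list_missing_asm opal/asm/asm-data.txt opal/asm/base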
[OMPI devel] [vt] --with-openmpi-inside configure argument
I noticed that Open MPI is passing
    --with-openmpi-inside=1.7
in the arguments passed to
    ompi/contrib/vt/vt/configure
and
    ompi/contrib/vt/vt/extlib/otf/configure

The extlib/otf case just tests if the value is set, but the top-level
vt/configure is checking for the specific string "1.7":

# Check whether we are inside Open MPI package
inside_openmpi="no"
AC_ARG_WITH(openmpi-inside, [],
[
    AS_IF([test x"$withval" = "xyes" -o x"$withval" = "x1.7"],
    [
        inside_openmpi="$withval"
        CPPFLAGS="-DINSIDE_OPENMPI $CPPFLAGS"

        # Set FC to F77 if Open MPI version < 1.7
        AS_IF([test x"$withval" = "xyes" -a x"$FC" = x -a x"$F77" != x],
        [FC="$F77"])
    ])
])

That logic looks a bit fragile with respect to any future changes.
Specifically the inner AS_IF is true for the desired condition "version < 1.7"
only because the outer AS_IF currently ensures the only possible values of
"$withval" are "yes" and "1.7".

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
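One way to make the "version < 1.7" condition explicit is to compare version numbers instead of matching fixed strings. The following is a minimal sketch and not what vt/configure does: version_lt and needs_fc_from_f77 are hypothetical helpers, the comparison assumes purely numeric dot-separated components (a value such as "1.8.2rc3" is not handled), and treating the legacy value "yes" as "older than 1.7" is my reading of the comment in the quoted macro.

```shell
# version_lt A B: succeed if dotted numeric version A is strictly older than B.
version_lt() {
    left=$1
    right=$2
    while test -n "$left" || test -n "$right"; do
        l=${left%%.*}
        r=${right%%.*}
        # Drop the component just read, together with its dot if present.
        case "$left" in *.*) left=${left#*.} ;; *) left= ;; esac
        case "$right" in *.*) right=${right#*.} ;; *) right= ;; esac
        # A missing component counts as 0, so 1.7 and 1.7.0 compare equal.
        test "${l:-0}" -lt "${r:-0}" && return 0
        test "${l:-0}" -gt "${r:-0}" && return 1
    done
    return 1
}

# needs_fc_from_f77 WITHVAL: succeed if WITHVAL denotes Open MPI older than 1.7.
needs_fc_from_f77() {
    case "$1" in
        yes) return 0 ;;              # assumed legacy spelling for pre-1.7
        *)   version_lt "$1" 1.7 ;;
    esac
}
```

With helpers along these lines the inner test would no longer depend on which values the outer test happens to let through, so passing a later version string would not silently change its meaning.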
Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO
My thought was to post initially as a blocker, pending a discussion with Jeff
at tomorrow's telecon. If he thinks this is something we can fix in some
central point (thus catching it everywhere), then it could be quick and worth
doing. However, I'm skeptical as I tried to do that in the most obvious place,
and it failed (could be operator error).

Will let you know tomorrow. Truly appreciate your digging on this!
Ralph
Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO
Ralph and Jeff,

I've been digging and find the problem is wider than just the one library and
has manifestations specific to FreeBSD, NetBSD and Solaris. I am adding new
info to the ticket as I unearth it.

Additionally, it appears this existed in 1.8, 1.8.1 and in the 1.7 series as
well.
So, would suggest this NOT be a blocker for a 1.8.2 release.

Of course I am willing to provide testing if you still want to push for a
quick resolution.

-Paul
[OMPI devel] oshmem enabled by default
In both trunk and 1.8.2rc3 the behavior is to enable oshmem by default.

In the 1.8.2rc3 tarball the configure help output matches the behavior.
HOWEVER, in the trunk the configure help output still says oshmem is
DISabled by default.

{~/OMPI/ompi-trunk}$ svn info | grep "Revision"
Revision: 32422
{~/OMPI/ompi-trunk}$ ./configure --help | grep -A1 'enable-oshmem '
  --enable-oshmem         Enable building the OpenSHMEM interface (disabled by
                          default)

-Paul

On Thu, Jul 24, 2014 at 2:09 PM, Ralph Castain wrote:

> Actually, it already is set correctly - the help message was out of date,
> so I corrected that.
>
> On Jul 24, 2014, at 10:58 AM, Marco Atzeri wrote:
>
> > On 24/07/2014 15:52, Ralph Castain wrote:
> >> Oshmem should be enabled by default now
> >
> > Ok,
> > so please reverse the configure switch
> >
> > --enable-oshmem Enable building the OpenSHMEM interface
> > (disabled by default)
> >
> > I will test enabling it in the meantime.
> >
> > Regards
> > Marco
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15254.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15261.php

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO
Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took a
crack at fixing it, but came up short :-(

On Aug 3, 2014, at 10:46 PM, Paul Hargrove wrote:

> I've identified the difference between the platform that does link libutil
> and the one that does not.
>
> 1) libutil is linked (as an OMPI dependency) only on the working system:
>
> Working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -lpciaccess -ldl
> checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque
> checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>
> NON-working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -ldl
> checking for ORTE LIBS... -lm -ldl -ltorque
> checking for OMPI LIBS... -lm -ldl -ltorque
>
> So, the working system that does link libutil is doing so as an OMPI
> dependency.
> However it is also needed for opal (only caller of openpty is
> opal/util/opal_pty.c).
>
> 2) Only the working system is building ROMIO:
>
> Comparing the 'checking if * can compile' lines of configure output shows
> only ONE difference:
>
>  checking if MCA component fs:ufs can compile... yes
>  checking if MCA component fs:pvfs2 can compile... no
>  checking if MCA component io:ompio can compile... yes
> -checking if MCA component io:romio can compile... no
> +checking if MCA component io:romio can compile... yes
>  checking if MCA component mpool:grdma can compile... yes
>  checking if MCA component mpool:sm can compile... yes
>  checking if MCA component mpool:udreg can compile... no
>
> So, it appears that *if* ROMIO is configured in, then "-lutil" gets added to
> OMPI_WRAPPER_EXTRA_LIBS.
> This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.
>
> I have confirmed that I can reproduce the static linking failure by adding
> --disable-io-romio to the configure options of the system that worked
> previously.
>
> So, I update my report (and the email subject line) to:
>    Static linking fails on Linux when not building ROMIO
>
> -Paul
>
>
> On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove wrote:
> Hmm,
>
> On a different Linux/x86-64 host things work as expected with '-lutil' linked
> explicitly:
>
> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c
> pgcc BLD/examples/hello_c.c
> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
> -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath
> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath
> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>
> Searching for relevant differences now...
>
> -Paul
>
>
> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove wrote:
>
> I've configured the 1.8.2rc3 tarball with "--enable-static --disable-shared"
> on a fairly standard Linux/x86-64 platform. While there are no problems on
> the same platform w/o these configure flags, with them I cannot link any
> application codes.
>
> $ mpicc -g hello_c.c -o hello_c
> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
> In function `opal_openpty':
> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>
> I checked "man openpty" and the manpage says to link with '-lutil'.
> The '-showme' does not show libutil:
>
> $ mpicc -showme hello_c.c
> gcc hello_c.c
> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
> -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
> -Wl,--enable-new-dtags
> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
> -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
>
> It looks like configure is doing the right thing on some level, but failing
> to add '-lutil' to the appropriate list of libs (OPAL_WRAPPER_EXTRA_LIBS?):
>
> == Library and Function tests
> checking if we need -lutil for openpty... yes
> checking for openpty... yes
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-69
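The manual comparison of the two '-showme' outputs above can be turned into a quick check. This is a rough sketch under the assumption that the wrapper prints its link line as a single line of space-separated words. The helper name has_link_flag is hypothetical.

```shell
# has_link_flag FLAG LINE: succeed if FLAG appears as a whole word in LINE.
# Assumes LINE is a single line whose words are separated by spaces.
has_link_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Library lists taken from the two -showme outputs quoted in this thread.
working_libs="-lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil"
failing_libs="-lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm"

has_link_flag -lutil "$working_libs" && echo "working build links libutil"
has_link_flag -lutil "$failing_libs" || echo "failing build is missing -lutil"
```

On an installed build the same helper could be fed the live wrapper output, for example: has_link_flag -lutil "$(mpicc -showme hello_c.c)"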
Re: [OMPI devel] opal_config_bottom.h question again
Yeah, I think Howard's commit isn't correct. We shouldn't have to put
opal_config.h after the system includes. I think the real problem is that you
aren't supposed to directly include the system malloc.h as that hoses the
entire memory interceptor stuff.

Howard: can you try reverting your commit and just removing the system
malloc.h as per Adrian's patch? I think that will correctly solve the problem.
Re: [OMPI devel] opal_config_bottom.h question again
And with following change I can get it to compile again:

diff --git a/opal/mca/mpool/base/mpool_base_frame.c b/opal/mca/mpool/base/mpool_base_frame.c
index c1b044b..f94b8a5 100644
--- a/opal/mca/mpool/base/mpool_base_frame.c
+++ b/opal/mca/mpool/base/mpool_base_frame.c
@@ -21,12 +21,10 @@
 #include "opal_config.h"
 
 #include
+#include
 #ifdef HAVE_UNISTD_H
 #include
 #endif /* HAVE_UNISTD_H */
-#ifdef HAVE_MALLOC_H
-#include
-#endif
 
 #include "opal/mca/mca.h"
 #include "opal/mca/base/base.h"
diff --git a/opal/util/malloc.h b/opal/util/malloc.h
index db5a4d0..efeaf98 100644
--- a/opal/util/malloc.h
+++ b/opal/util/malloc.h
@@ -21,7 +21,7 @@
 #ifndef OPAL_MALLOC_H
 #define OPAL_MALLOC_H
 
-#include "opal_config.h"
+#include
 #include
 
 /*

On Mon, Aug 04, 2014 at 06:39:13PM +0200, Adrian Reber wrote:
> I can confirm this on Fedora 20 with gcc 4.8.3.
>
> Running ./configure without any options gives me the same error.
>
> On Mon, Aug 04, 2014 at 04:24:29PM +, Pritchard Jr., Howard wrote:
> > Hi Ralph,
> >
> > Nope that doesn't fix the problem I'm hitting. I tried to build the ompi
> > trunk on a system with a much older gcc compiler (4.4.7) and it
> > compiled :)! But I'd like to be able to compile ompi with a newer gcc
> > like the one on my opensuse 13.1 box.
> >
> > The preprocessor is pulling in the system malloc.h and that's where things
> > blow up:
> >
> > CC base/mpool_base_frame.lo
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >                  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:381:38: error: expected declaration specifiers or '...' before '(' token
> >  #define malloc(size) opal_malloc((size), __FILE__, __LINE__)
> >  ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before string constant
> >  extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
> >  ^
> > /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >                  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:385:48: error: expected declaration specifiers or '...' before '(' token
> >  #define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
> >  ^
> > ../../../opal/include/opal_config_bottom.h:385:60: error: expected declaration specifiers or '...' before '(' token
> >  #define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
> >  ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before string constant
> >  extern void *calloc (size_t __nmemb, size_t __size)
> >  ^
> > /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >                  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:389:45: error: expected declaration specifiers or '...' before '(' token
> >  #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
> >  ^
> > ../../../opal/include/opal_config_bottom.h:389:52: error: expected declaration specifiers or '...' before '(' token
> >  #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
> >  ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before string constant
> >  extern void *realloc (void *__ptr, size_t __size)
> >  ^
> > /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >                  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:393:33: error: expected declaration specifiers or '...' before '(' token
> >  #define free(ptr) opal_free((ptr), __FILE__, __LINE__)
> >  ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before string constant
> >  extern void free (void *__ptr) __THROW;
> >  ^
> > /usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before numeric constant
> >
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> > Sent: Monday, August 04, 2014 10:09 AM
> > T
Re: [OMPI devel] opal_config_bottom.h question again
I can confirm this on Fedora 20 with gcc 4.8.3.

Running ./configure without any options gives me the same error.

On Mon, Aug 04, 2014 at 04:24:29PM +, Pritchard Jr., Howard wrote:
> Hi Ralph,
>
> Nope that doesn't fix the problem I'm hitting. I tried to build the ompi
> trunk on a system with a much older gcc compiler (4.4.7) and it compiled :)!
> But I'd like to be able to compile ompi with a newer gcc like the one on my
> opensuse 13.1 box.
Re: [OMPI devel] opal_config_bottom.h question again
Hi Ralph,

Nope, that doesn't fix the problem I'm hitting. I tried to build the ompi trunk on a system with a much older gcc compiler (4.4.7) and it compiled :)! But I'd like to be able to compile ompi with a newer gcc like the one on my opensuse 13.1 box.

The preprocessor is pulling in the system malloc.h and that's where things blow up:

  CC base/mpool_base_frame.lo
In file included from ../../../opal/include/opal_config.h:2750:0,
                 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:381:38: error: expected declaration specifiers or '...' before '(' token
 #define malloc(size) opal_malloc((size), __FILE__, __LINE__)
 ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before string constant
 extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
 ^
/usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
                 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:385:48: error: expected declaration specifiers or '...' before '(' token
 #define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
 ^
../../../opal/include/opal_config_bottom.h:385:60: error: expected declaration specifiers or '...' before '(' token
 #define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
 ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before string constant
 extern void *calloc (size_t __nmemb, size_t __size)
 ^
/usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
                 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:389:45: error: expected declaration specifiers or '...' before '(' token
 #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
 ^
../../../opal/include/opal_config_bottom.h:389:52: error: expected declaration specifiers or '...' before '(' token
 #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
 ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before string constant
 extern void *realloc (void *__ptr, size_t __size)
 ^
/usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
                 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:393:33: error: expected declaration specifiers or '...' before '(' token
 #define free(ptr) opal_free((ptr), __FILE__, __LINE__)
 ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before string constant
 extern void free (void *__ptr) __THROW;
 ^
/usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before numeric constant

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, August 04, 2014 10:09 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] opal_config_bottom.h question again

I believe the issue is actually in opal/util/malloc.h, Howard. I noticed this while looking around this weekend - someone included opal_config.h in the malloc.h file even though it explicitly says "DON'T DO THIS" in that header file.

#ifndef OPAL_MALLOC_H
#define OPAL_MALLOC_H

#include "opal_config.h"
#include

/*
 * THIS FILE CANNOT INCLUDE ANY OTHER OPAL HEADER FILES!!!
 *
 * It is included via . Hence, it should not
 * include ANY other files, nor should it include "opal_config.h".
 *
 */

Don't know why someone did that, but you might see if it fixes your problem
Re: [OMPI devel] opal_config_bottom.h question again
I believe the issue is actually in opal/util/malloc.h, Howard. I noticed this while looking around this weekend - someone included opal_config.h in the malloc.h file even though it explicitly says "DON'T DO THIS" in that header file.

#ifndef OPAL_MALLOC_H
#define OPAL_MALLOC_H

#include "opal_config.h"
#include

/*
 * THIS FILE CANNOT INCLUDE ANY OTHER OPAL HEADER FILES!!!
 *
 * It is included via . Hence, it should not
 * include ANY other files, nor should it include "opal_config.h".
 *
 */

Don't know why someone did that, but you might see if it fixes your problem

On Aug 4, 2014, at 9:00 AM, Pritchard Jr., Howard wrote:

> Hi Folks,
>
> As I said last week, I’m noticing now that on my opensuse 13.1 system and gcc
> 4.8.1, when I do a fresh checkout of trunk ompi and try to build, without any
> configure options,
>
> mca_base_mpool_frame.c
>
> does not compile.
>
> The reason is there is a conflict in opal_config_bottom.h and the contents of
> malloc.h, which for my system is pulled in by the preprocessor.
>
> If I undefine HAVE_MALLOC_H in this file, the code compiles fine.
> Alternatively, one can also move the malloc.h include prior to the
> opal_config.h include and things work. Alternatively, one can add the
> OPAL_DISABLE_ENABLE_MEM_DEBUG define as in mpool_base_lookup.c, and the
> compile problem similarly goes away.
>
> I’d like to check in a fix for this. I’d prefer to just move the std include
> files ahead of the opal_config.h include. I’d like to do this today unless
> someone objects.
>
> I’m somewhat surprised I’m the only one seeing this though.
>
> Howard
>
> -
> Howard Pritchard
> HPC-5
> Los Alamos National Laboratory
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15495.php
[OMPI devel] opal_config_bottom.h question again
Hi Folks,

As I said last week, I'm noticing now that on my opensuse 13.1 system and gcc 4.8.1, when I do a fresh checkout of trunk ompi and try to build, without any configure options,

mca_base_mpool_frame.c

does not compile.

The reason is that there is a conflict between opal_config_bottom.h and the contents of malloc.h, which on my system is pulled in by the preprocessor.

If I undefine HAVE_MALLOC_H in this file, the code compiles fine. Alternatively, one can move the malloc.h include ahead of the opal_config.h include and things work. Alternatively, one can add the OPAL_DISABLE_ENABLE_MEM_DEBUG define as in mpool_base_lookup.c, and the compile problem similarly goes away.

I'd like to check in a fix for this. I'd prefer to just move the std include files ahead of the opal_config.h include. I'd like to do this today unless someone objects.

I'm somewhat surprised I'm the only one seeing this though.

Howard

-
Howard Pritchard
HPC-5
Los Alamos National Laboratory
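For readers following the thread, the include-order conflict described above can be reproduced outside Open MPI. The following is only a sketch under stated assumptions: demo_malloc, demo_free, demo_last_file, demo_last_line and demo_roundtrip are hypothetical names made up for illustration, standing in for the opal_malloc/opal_free debug wrappers seen in the error output. It shows the ordering that works: the system header that declares malloc/free is included first, and the function-like macros are defined only afterwards.

```c
/* Sketch only: the demo_* names are hypothetical stand-ins for the
 * opal_malloc/opal_free debug wrappers; this is not Open MPI code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* system declarations of malloc/free come FIRST */

static const char *last_file = NULL;  /* call site recorded by the wrappers */
static int last_line = 0;

static void *demo_malloc(size_t size, const char *file, int line)
{
    assert(file != NULL);
    last_file = file;
    last_line = line;
    return malloc(size);  /* macro not defined yet, so this is the real malloc */
}

static void demo_free(void *ptr, const char *file, int line)
{
    last_file = file;
    last_line = line;
    free(ptr);            /* likewise the real free */
}

static const char *demo_last_file(void) { return last_file; }
static int demo_last_line(void) { return last_line; }

/* Only now install the function-like macros. A system header that has not
 * been included yet and that declares malloc (such as <malloc.h>) would,
 * if included after this point, have its prototypes expanded through these
 * macros and fail to parse. */
#define malloc(size) demo_malloc((size), __FILE__, __LINE__)
#define free(ptr)    demo_free((ptr), __FILE__, __LINE__)

/* Allocates and frees through the macros; returns the line number that the
 * malloc wrapper recorded, or -1 if the allocation failed. */
static int demo_roundtrip(void)
{
    char *buf = malloc(16);   /* expands to demo_malloc((16), __FILE__, __LINE__) */
    if (buf == NULL) {
        return -1;
    }
    int malloc_line = demo_last_line();
    free(buf);                /* expands to demo_free((buf), __FILE__, __LINE__) */
    return malloc_line;
}
```

Calling demo_roundtrip() from a main() exercises both wrappers. Moving the two #define lines above the #include of a header that declares malloc is the ordering that produces errors of the kind quoted in this thread.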
[OMPI devel] canceling buffered send request with pml/cm
Hi,

It seems to be impossible to cancel buffered sends with pml/cm.

On the one hand, pml/cm completes the buffered send immediately (MCA_PML_CM_HVY_SEND_REQUEST_START):

    if(OMPI_SUCCESS == ret && \
       sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) { \
        sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR = 0; \
        ompi_request_complete(&(sendreq)->req_send.req_base.req_ompi, true); \
    }

So, if the user is doing Bsend()/Cancel()/Wait()/Test_canceled(), the Wait() would be a no-op. Therefore, when mtl_cancel() is called, it has to either cancel the request or guarantee its completion *immediately*; otherwise the return from Test_canceled would be undefined.

However, it's not always possible to cancel immediately, because we need to make sure the peer has not matched the message yet (for example, with mtl mxm).

IMHO it's wrong for pml_cm to complete a buffered send immediately. What do you think?

--Yossi
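The ordering problem raised above can be illustrated with a toy model in plain C. This is a sketch, not Open MPI or MPI code: every toy_* name is hypothetical, and it assumes a transport whose cancel request only resolves on a later progress step, once it is known whether the peer has matched the message. With that assumption, a request marked complete at start lets the wait return before the cancel outcome is known, while a request left incomplete forces the wait to drive progress until the cancel is resolved.

```c
/* Toy model only: no MPI calls are made and all toy_* names are
 * hypothetical, invented for this sketch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CANCEL_NONE,      /* cancel was never requested */
    CANCEL_PENDING,   /* requested, outcome not known yet */
    CANCEL_DONE,      /* message was withdrawn before the peer matched it */
    CANCEL_REFUSED    /* peer had already matched the message */
} toy_cancel_state;

typedef struct {
    bool complete;            /* the only flag toy_wait() looks at */
    bool peer_matched;        /* has the receiver already matched the message? */
    toy_cancel_state cancel;
} toy_request;

/* Start a buffered send; complete_at_start models completing the request
 * right away, as in the macro quoted above. */
static void toy_bsend_start(toy_request *req, bool complete_at_start)
{
    assert(req != NULL);
    req->complete = complete_at_start;
    req->peer_matched = false;
    req->cancel = CANCEL_NONE;
}

/* Request a cancel; the outcome cannot be decided locally. */
static void toy_cancel(toy_request *req)
{
    if (req->cancel == CANCEL_NONE) {
        req->cancel = CANCEL_PENDING;
    }
}

/* One progress step: a pending cancel is resolved and the request completes. */
static void toy_progress(toy_request *req)
{
    if (req->cancel == CANCEL_PENDING) {
        req->cancel = req->peer_matched ? CANCEL_REFUSED : CANCEL_DONE;
    }
    req->complete = true;
}

/* Drive progress until the request is complete; returns the steps taken. */
static int toy_wait(toy_request *req)
{
    int steps = 0;
    while (!req->complete) {
        toy_progress(req);
        steps++;
    }
    return steps;
}

/* True once the cancel outcome is known, i.e. a cancelled flag can be trusted. */
static bool toy_cancel_resolved(const toy_request *req)
{
    return req->cancel == CANCEL_DONE || req->cancel == CANCEL_REFUSED;
}
```

In this model, start/cancel/wait on a request started with complete_at_start set to true takes zero progress steps and leaves the cancel pending; the same sequence with complete_at_start set to false takes one step and leaves the cancel resolved.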
[OMPI devel] 1.8.2rc3 cosmetic issues in configure
It looks like four instances of AC_MSG_CHECKING are missing an AC_MSG_RESULT or have other configure macros improperly nested between the two:

checking for epoll support... checking for epoll_ctl... yes
yes
checking for working epoll library interface... yes
yes
checking if user requested CMA build... checking --with-knem value... simple ok (unspecified)
checking if user requested CMA build... checking if MCA component btl:vader can compile... yes
checking orte configuration args... checking if MCA component dpm:orte can compile... yes

-Paul

--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO
I've identified the difference between the platform that does link libutil and the one that does not.

1) libutil is linked (as an OMPI dependency) only on the working system:

Working system:
$ grep 'checking for .* LIBS' configure.out
checking for OPAL LIBS... -lm -lpciaccess -ldl
checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque
checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil

NON-working system:
$ grep 'checking for .* LIBS' configure.out
checking for OPAL LIBS... -lm -ldl
checking for ORTE LIBS... -lm -ldl -ltorque
checking for OMPI LIBS... -lm -ldl -ltorque

So, the working system that does link libutil is doing so as an OMPI dependency. However, it is also needed for OPAL (the only caller of openpty is opal/util/open_pty.c).

2) Only the working system is building ROMIO:

Comparing the 'checking if * can compile' lines of configure output shows only ONE difference:

 checking if MCA component fs:ufs can compile... yes
 checking if MCA component fs:pvfs2 can compile... no
 checking if MCA component io:ompio can compile... yes
-checking if MCA component io:romio can compile... no
+checking if MCA component io:romio can compile... yes
 checking if MCA component mpool:grdma can compile... yes
 checking if MCA component mpool:sm can compile... yes
 checking if MCA component mpool:udreg can compile... no

So, it appears that *if* ROMIO is configured in, then "-lutil" gets added to OMPI_WRAPPER_EXTRA_LIBS. This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.

I have confirmed that I can reproduce the static linking failure by adding --disable-io-romio to the configure options of the system that worked previously.
So, I update my report (and the email subject line) to:
Static linking fails on Linux when not building ROMIO

-Paul

On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove wrote:

> Hmm,
>
> On a different Linux/x86-64 host things work as expected with '-lutil'
> linked explicitly:
>
> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c
> pgcc BLD/examples/hello_c.c
> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
> -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath
> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath
> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>
> Searching for relevant differences now...
>
> -Paul
>
>
> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove wrote:
>
>>
>> I've configured the 1.8.2rc3 tarball with "--enable-static
>> --disable-shared" on a fairly standard Linux/x86-64 platform. While there
>> are no problems on the same platform w/o these configure flags, with them I
>> cannot link any application codes.
>>
>> $ mpicc -g hello_c.c -o hello_c
>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
>> In function `opal_openpty':
>> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>>
>> I checked "man openpty" and the manpage says to link with '-lutil'.
>> The '-showme' does not show libutil:
>>
>> $ mpicc -showme hello_c.c
>> gcc hello_c.c
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
>> -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>> -Wl,--enable-new-dtags
>> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>> -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
>>
>>
>> It looks like configure is doing the right thing on some level, but
>> failing to add '-lutil' to the appropriate list of libs
>> (OPAL_WRAPPER_EXTRA_LIBS?):
>>
>>
>> == Library and Function tests
>>
>> checking if we need -lutil for openpty... yes
>> checking for openpty... yes
>>
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>
>
> --
> Paul H. Hargrove phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>

--
Paul H. Hargrove
Re: [OMPI devel] 1.8.2rc3 now out
Fixed in r32409: %d and %s were swapped in a MLERROR (printf like)

Gilles

On 2014/08/02 11:07, Gilles Gouaillardet wrote:
> Paul,
>
> about the second point:
> mmap is called with the MAP_FIXED flag. Before the fix, the
> required address was not aligned on a page size and hence
> mmap failed.
> the mmap failure was immediately handled, but for some reason
> i did not fully investigate yet, this failure was not correctly propagated,
> leading to a SIGSEGV later in lmngr_register (if i remember correctly)
>
> i will add this to my todo list: investigate why the error is not correctly
> propagated and handled.
>
> Cheers,
>
> Gilles
>
> On Sat, Aug 2, 2014 at 6:05 AM, Paul Hargrove wrote:
>
>> Regarding review of the coll/ml fix:
>>
>> While the fix Gilles worked out overnight proved sufficient on
>> Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns:
>>
>> 1) As I already voiced on the list, I am concerned with the portability of
>> _SC_PAGESIZE vs _SC_PAGE_SIZE (vs get_pagesize()).
>>
>> 2) Though I have not tried to trace the code, the fact that fixing the
>> alignment prevents a SEGV strongly suggests that there was a mmap (or
>> something else sensitive to page size) call failing. So, there should
>> probably be a check added for failure of that call to produce a cleaner
>> failure than SEGV.
>>
>> Just my USD 0.02.
>> -Paul
>>
>>
>> On Fri, Aug 1, 2014 at 6:39 AM, Ralph Castain wrote:
>>
>>> Okay, I fixed those two and will release rc4 once the coll/ml fix has
>>> been reviewed.
>>> Thanks
>>>
>>> On Aug 1, 2014, at 2:46 AM, Mike Dubman wrote:
>>>
>>> Also, latest commit into openib (origin/v1.8
>>> https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
>>>
>>> *11:45:01* + timeout -s SIGSEGV 3m /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,openib /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi
>>> *11:45:01* --
>>> *11:45:01* WARNING: There are more than one active ports on host 'hpctest', but the
>>> *11:45:01* default subnet GID prefix was detected on more than one of these
>>> *11:45:01* ports. If these ports are connected to different physical IB
>>> *11:45:01* networks, this configuration will fail in Open MPI. This version of
>>> *11:45:01* Open MPI requires that every physically separate IB subnet that is
>>> *11:45:01* used between connected MPI processes must have different subnet ID
>>> *11:45:01* values.
>>> *11:45:01*
>>> *11:45:01* Please see this FAQ entry for more details:
>>> *11:45:01*
>>> *11:45:01* http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>> *11:45:01*
>>> *11:45:01* NOTE: You can turn off this warning by setting the MCA parameter
>>> *11:45:01* btl_openib_warn_default_gid_prefix to 0.
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* WARNING: No queue pairs were defined in the btl_openib_receive_queues
>>> *11:45:01* MCA parameter. At least one queue pair must be defined. The
>>> *11:45:01* OpenFabrics (openib) BTL will therefore be deactivated for this run.
>>> *11:45:01*
>>> *11:45:01* Local host: hpctest
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* At least one pair of MPI processes are unable to reach each other for
>>> *11:45:01* MPI communications. This means that no Open MPI device has indicated
>>> *11:45:01* that it can be used to communicate between these processes. This is
>>> *11:45:01* an error; Open MPI requires that all MPI processes be able to reach
>>> *11:45:01* each other. This error can sometimes be the result of forgetting to
>>> *11:45:01* specify the "self" BTL.
>>> *11:45:01*
>>> *11:45:01* Process 1 ([[55281,1],1]) is on host: hpctest
>>> *11:45:01* Process 2 ([[55281,1],0]) is on host: hpctest
>>> *11:45:01* BTLs attempted: self
>>> *11:45:01*
>>> *11:45:01* Your MPI job is now going to abort; sorry.
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* MPI_INIT has failed because at least one MPI process is unreachable
>>> *11:45:01* from another. This *usually* means that an underlying communication
>>> *11:45:01* plugin -- such as a BTL or an MTL -- has either not loaded or not
>>> *11:45:01* allowed itself to be used. Your MPI job will now abort.
>>> *11:45:01*
>>> *11:45:01* You may wish to try to narrow down the problem;
>>> *11:45:01*
>>> *11:45:01* * Check the output of ompi_info to see which BTL/MTL plugins are
>>> *11:45:01* available.
>>> *11:45:01*