I *can* reproduce the problem on SPARC/Solaris-10 with the SS12.3 compiler
and an ALMOST vanilla configure:

$ [path_to]configure \
    --prefix=[blah] CC=cc CXX=CC F77=f77 FC=f90 \
    CFLAGS="-m64" --with-wrapper-cflags="-m64" CXXFLAGS="-m64" --with-wrapper-cxxflags="-m64" \
    FFLAGS="-m64" --with-wrapper-fflags="-m64" FCFLAGS="-m64" --with-wrapper-fcflags="-m64" \
    CXXFLAGS="-m64 -library=stlport4"

I did NOT manage to reproduce on AMD64/Solaris-11, which completed a build
w/ VT disabled. Unfortunately I have neither SPARC/Solaris-11 nor
AMD64/Solaris-10 readily available to disambiguate the key factor.
Hopefully it is enough to know that the problem is reproducible w/o
Oracle's massive configure commandline.

The build isn't complete, but I can already see that the symbol has
"leaked" into libmpi:

$ grep -arl mca_coll_ml_memsync_intra BLD/
BLD/ompi/mca/bcol/.libs/libmca_bcol.a
BLD/ompi/mca/bcol/base/.libs/bcol_base_open.o
BLD/ompi/.libs/libmpi.so.0.0.0
BLD/ompi/.libs/libmpi.so
BLD/ompi/.libs/libmpi.so.0

It is referenced by mca_coll_ml_generic_collectives_launcher:

$ nm BLD/ompi/.libs/libmpi.so.0.0.0 | grep -B1 mca_coll_ml_memsync_intra
00000000006a6088 t mca_coll_ml_generic_collectives_launcher
                 U mca_coll_ml_memsync_intra

This is coming from libmca_bcol.a:

$ nm BLD/ompi/mca/bcol/.libs/libmca_bcol.a | grep -B1 mca_coll_ml_memsync_intra
0000000000005248 t mca_coll_ml_generic_collectives_launcher
                 U mca_coll_ml_memsync_intra

This appears to be via the following chain of calls within coll_ml.h:

mca_coll_ml_generic_collectives_launcher
  mca_coll_ml_task_completion_processing
    coll_ml_fragment_completion_processing
      mca_coll_ml_buffer_recycling
        mca_coll_ml_memsync_intra

All of which are marked as "static inline __opal_attribute_always_inline__".

-Paul

On Fri, Aug 24, 2012 at 4:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> OK, I have a vanilla configure+make running on both SPARC/Solaris-10 and
> AMD64/Solaris-11.
> I am using the 12.3 Oracle compilers in both cases to match the original
> report.
> I'll post the results when they complete.
>
> In the meantime, I took a quick look at the code and have a pretty
> reasonable guess as to the cause.
> Looking at ompi/mca/coll/ml/coll_ml.h I see:
>
> 827 int mca_coll_ml_memsync_intra(mca_coll_ml_module_t *module, int bank_index);
> [...]
> 996 static inline __opal_attribute_always_inline__
> 997 int mca_coll_ml_buffer_recycling(mca_coll_ml_collective_operation_progress_t *ml_request)
> 998 {
> [...]
> 1023     rc = mca_coll_ml_memsync_intra(ml_module, ml_memblock->memsync_counter);
> [...]
> 1041 }
>
> Based on past experience w/ the Sun/Oracle compilers on another project
> (See http://bugzilla.hcs.ufl.edu/cgi-bin/bugzilla3/show_bug.cgi?id=193 ),
> I suspect that this static-inline-always function is being emitted by the
> compiler in every object which includes this header even if they don't call
> it. The call on line 1023 then results in the undefined reference
> to mca_coll_ml_memsync_intra. Basically it is not safe for an inline
> function in a header to call an extern function that isn't available to
> every object that includes the header REGARDLESS of whether the object
> invokes the inline function or not.
>
> -Paul
>
>
> On Fri, Aug 24, 2012 at 4:40 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Oracle uses an abysmally complicated configure line, but nearly all of it
>> is irrelevant to the problem here. For this, I would suggest just doing a
>> vanilla ./configure - if the component gets pulled into libmpi, then we
>> know there is a problem.
>>
>> Thanks!
>>
>> Just FYI: here is their actual configure line, just in case you spot
>> something problematic:
>>
>> CC=cc CXX=CC F77=f77 FC=f90 --with-openib --enable-openib-connectx-xrc
>> --without-udapl
>> --disable-openib-ibcm --enable-btl-openib-failover --without-dtrace
>> --enable-heterogeneous
>> --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default
>> --with-sge
>> --enable-mpi-f90 --with-mpi-f90-size=small --disable-peruse --disable-state
>> --disable-mpi-thread-multiple --disable-debug --disable-mem-debug
>> --disable-mem-profile
>> CFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2
>> -xvector=lib -Qoption
>> cg -xregs=no%appl -xdepend=yes -xbuiltin=%all -xO5"
>> CXXFLAGS="-xtarget=ultra3 -m32
>> -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg
>> -xregs=no%appl -xdepend=yes
>> -xbuiltin=%all -xO5 -Bstatic -lCrun -lCstd -Bdynamic"
>> FFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2
>> -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl
>> -stackvar -xO5"
>> FCFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2
>> -xvector=lib -Qoption
>> cg -xregs=no%appl -stackvar -xO5"
>> --prefix=/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/installs/JA08/install
>> --mandir=${prefix}/man --bindir=${prefix}/bin --libdir=${prefix}/lib
>> --includedir=${prefix}/include
>> --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32
>> --enable-contrib-no-build=vt --with-package-string="Oracle Message Passing
>> Toolkit "
>> --with-ident-string="@(#)RELEASE VERSION 1.9openmpi-1.5.4-r1.9a1r27092"
>>
>>
>> and the error he gets is:
>>
>> make[2]: Entering directory
>> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
>>   CCLD   ompi_info
>> Undefined                       first referenced
>>  symbol                             in file
>> mca_coll_ml_memsync_intra           ../../../ompi/.libs/libmpi.so
>> ld: fatal: symbol referencing errors. No output written to .libs/ompi_info
>> make[2]: *** [ompi_info] Error 2
>> make[2]: Leaving directory
>> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
>> make[1]: *** [install-recursive] Error 1
>> make[1]: Leaving directory
>> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi'
>> make: *** [install-recursive] Error 1
>>
>>
>> On Aug 24, 2012, at 4:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> I have access to a few different Solaris machines and can offer to build
>> the trunk if somebody tells me what configure flags are desired.
>>
>> -Paul
>>
>> On Fri, Aug 24, 2012 at 8:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Eugene - can you confirm that this is only happening on the one Solaris
>>> system? In other words, is this a general issue or something specific to
>>> that one machine?
>>>
>>> I'm wondering because if it is just the one machine, then it might be
>>> something strange about how it is set up - perhaps the version of Solaris,
>>> or it is configuring --enable-static, or...
>>>
>>> Just trying to assess how general a problem this might be, and thus if
>>> this should be a blocker or not.
>>>
>>> On Aug 24, 2012, at 8:00 AM, Eugene Loh <eugene....@oracle.com> wrote:
>>>
>>> > On 08/24/12 09:54, Shamis, Pavel wrote:
>>> >> Maybe there is a chance to get direct access to this system?
>>> > No.
>>> >
>>> > But I'm attaching compressed log files from configure/make.
>>> >
>>> > <tarball-of-log-files.tar.bz2>
>>> > _______________________________________________
>>> > devel mailing list
>>> > de...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900