Nice catch!

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 24 Aug 2012, at 4:55 PM, Paul Hargrove wrote:

> OK, I have a vanilla configure+make running on both SPARC/Solaris-10 and 
> AMD64/Solaris-11.
> I am using the 12.3 Oracle compilers in both cases to match the original 
> report.
> I'll post the results when they complete.
> 
> In the meantime, I took a quick look at the code and have a pretty reasonable 
> guess as to the cause.
> Looking at ompi/mca/coll/ml/coll_ml.h I see:
> 
>    827  int mca_coll_ml_memsync_intra(mca_coll_ml_module_t *module, int bank_index);
> [...]
>    996  static inline __opal_attribute_always_inline__
>    997          int mca_coll_ml_buffer_recycling(mca_coll_ml_collective_operation_progress_t *ml_request)
>    998  {
> [...]
>   1023                  rc = mca_coll_ml_memsync_intra(ml_module, ml_memblock->memsync_counter);
> [...]
>   1041  }
> 
> Based on past experience with the Sun/Oracle compilers on another project (see 
> http://bugzilla.hcs.ufl.edu/cgi-bin/bugzilla3/show_bug.cgi?id=193 ), I suspect 
> that this static always-inline function is being emitted by the compiler into 
> every object that includes this header, even if that object never calls it.  
> The call on line 1023 then results in the undefined reference to 
> mca_coll_ml_memsync_intra.  Basically, it is not safe for an inline function 
> in a header to call an extern function that isn't available to every object 
> that includes the header, REGARDLESS of whether the object invokes the inline 
> function or not.
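> 
> For illustration, here is a minimal sketch of that pattern; the file and 
> function names are hypothetical, not the actual Open MPI sources:
> 
>     /* widget.h -- hypothetical header */
>     extern int widget_sync(int bank);   /* defined only in one component's objects */
> 
>     static inline __attribute__((always_inline))
>     int widget_recycle(int bank)
>     {
>         /* Calls the extern function.  If the compiler emits this body into
>          * every object that includes widget.h, each such object carries an
>          * undefined reference to widget_sync, whether or not it ever calls
>          * widget_recycle(). */
>         return widget_sync(bank);
>     }
> 
>     /* other.c -- includes the header but never calls widget_recycle() */
>     #include "widget.h"
>     int other_work(void) { return 0; }
> 
> With GCC-style semantics an unused static inline is simply discarded, but with 
> the compiler behavior described above, other.o still references widget_sync, so 
> linking other.o without the object that defines widget_sync fails in the same 
> way as reported here.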
> 
> -Paul
> 
> 
> 
> On Fri, Aug 24, 2012 at 4:40 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Oracle uses an abysmally complicated configure line, but nearly all of it is 
> irrelevant to the problem here. For this, I would suggest just doing a 
> vanilla ./configure - if the component gets pulled into libmpi, then we know 
> there is a problem.
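> 
> For example, after a vanilla build, something along these lines should show 
> whether libmpi picks up the reference (the build-tree path is taken from the 
> error output below; exact nm output format varies by platform):
> 
>     ./configure && make
>     nm ompi/.libs/libmpi.so | grep mca_coll_ml_memsync_intra
> 
> An undefined (UNDEF/U) entry for the symbol would mean libmpi references it 
> without defining it.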
> 
> Thanks!
> 
> Just FYI: here is their actual configure line, just in case you spot 
> something problematic:
> 
> CC=cc CXX=CC F77=f77 FC=f90
> --with-openib --enable-openib-connectx-xrc --without-udapl --disable-openib-ibcm
> --enable-btl-openib-failover --without-dtrace --enable-heterogeneous
> --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default --with-sge
> --enable-mpi-f90 --with-mpi-f90-size=small --disable-peruse --disable-state
> --disable-mpi-thread-multiple --disable-debug --disable-mem-debug --disable-mem-profile
> CFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl -xdepend=yes -xbuiltin=%all -xO5"
> CXXFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl -xdepend=yes -xbuiltin=%all -xO5 -Bstatic -lCrun -lCstd -Bdynamic"
> FFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl -stackvar -xO5"
> FCFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl -stackvar -xO5"
> --prefix=/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/installs/JA08/install
> --mandir=${prefix}/man --bindir=${prefix}/bin --libdir=${prefix}/lib --includedir=${prefix}/include
> --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32
> --enable-contrib-no-build=vt --with-package-string="Oracle Message Passing Toolkit "
> --with-ident-string="@(#)RELEASE VERSION 1.9openmpi-1.5.4-r1.9a1r27092"
> 
> and the error he gets is:
> 
> make[2]: Entering directory `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
>   CCLD     ompi_info
> Undefined                     first referenced
>  symbol                           in file
> mca_coll_ml_memsync_intra           ../../../ompi/.libs/libmpi.so
> ld: fatal: symbol referencing errors. No output written to .libs/ompi_info
> make[2]: *** [ompi_info] Error 2
> make[2]: Leaving directory `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi'
> make: *** [install-recursive] Error 1
> 
> On Aug 24, 2012, at 4:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>> I have access to a few different Solaris machines and can offer to build the 
>> trunk if somebody tells me what configure flags are desired.
>> 
>> -Paul
>> 
>> On Fri, Aug 24, 2012 at 8:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Eugene - can you confirm that this is only happening on the one Solaris 
>> system? In other words, is this a general issue or something specific to 
>> that one machine?
>> 
>> I'm wondering because if it is just the one machine, then it might be 
>> something strange about how it is setup - perhaps the version of Solaris, or 
>> it is configuring --enable-static, or...
>> 
>> Just trying to assess how general a problem this might be, and thus if this 
>> should be a blocker or not.
>> 
>> On Aug 24, 2012, at 8:00 AM, Eugene Loh <eugene....@oracle.com> wrote:
>> 
>> > On 08/24/12 09:54, Shamis, Pavel wrote:
>> >> Maybe there is a chance to get direct access to this system ?
>> > No.
>> >
>> > But I'm attaching compressed log files from configure/make.
>> >
>> > <tarball-of-log-files.tar.bz2>
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>> 
> 
> 
> 
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> 
