As usual for this thread, a bunch of time elapsed.  Sorry for the delay!  :-(

FWIW: --disable-dlopen does 2 things:

1. Disables the dlopen code in Open MPI (i.e., it won't load plugins)
2. Slurps all plugins into libmpi.so (and friends)

The --disable-mca-dso switch only does #2.  So just FYI: you probably only need 
the --disable-dlopen option.
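
To make the difference concrete, here is a rough sketch of what "slurped in" means, as opposed to loading plugins at run time.  The struct, the component names, and find_component() are all made up for illustration; this is not Open MPI's actual MCA code.  The point is only that the components become ordinary symbols inside the library, reachable through a fixed table:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin descriptor; NOT Open MPI's real MCA structs. */
typedef struct {
    const char *name;
    int (*init)(void);
} component_t;

/* Two made-up components that are compiled straight into the library. */
static int tcp_init(void) { return 1; }
static int sm_init(void)  { return 2; }

/* With plugins linked in, the library can carry a fixed table like this
 * instead of discovering separate .so files with dlopen() at run time. */
static const component_t static_components[] = {
    { "tcp", tcp_init },
    { "sm",  sm_init  },
};

/* Look a component up by name; returns NULL if it was not compiled in. */
const component_t *find_component(const char *name)
{
    size_t n = sizeof(static_components) / sizeof(static_components[0]);

    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(static_components[i].name, name) == 0) {
            return &static_components[i];
        }
    }
    return NULL;
}
```

A caller would do something like find_component("tcp")->init().  Every component in the table adds to the size of the one library, which is relevant to the relocation discussion further down.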


> On Jan 5, 2015, at 2:05 PM, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:
> 
> Recap:
> 1) - the error is related to configure with '--disable-dlopen 
> --disable-mca-dso'
> 2) - the error vanishes when added   '-Wl,--as-needed' to the link line
> 3) - the error is not special to any compiler or version
> 4) - the error is not related to LSF but linking with these libs just shut 
> down it due to some symbols mess
> 
> Well, I'm not really sure that (2) is the true workaround, or just starts 
> some more library deep search and binds to LSF libs linked in somewhere in 
> the bush.

Here's the description of --as-needed from ld(1):

       --as-needed
       --no-as-needed
           This option affects ELF DT_NEEDED tags for dynamic libraries
           mentioned on the command line after the --as-needed option.
           Normally the linker will add a DT_NEEDED tag for each dynamic
           library mentioned on the command line, regardless of whether the
           library is actually needed or not.  --as-needed causes a DT_NEEDED
           tag to only be emitted for a library that satisfies an undefined
           symbol reference from a regular object file or, if the library is
           not found in the DT_NEEDED lists of other libraries linked up to
           that point, an undefined symbol reference from another dynamic
           library.  --no-as-needed restores the default behaviour.

This doesn't feel like the correct solution, either.

Here's a web page that may help:

    https://www.technovelty.org/c/relocation-truncated-to-fit-wtf.html

It describes what the "... unresolvable R_X86_64_PC32 relocation against symbol 
..." error actually means.

In short: do you have giant amounts of static data in your application?  E.g., 
giant common blocks, or giant statically allocated arrays (i.e., not on the 
heap)?  If so, that page suggests the -mcmodel=<model> option to gcc.  

It may also be that -fPIC is sufficient.  I don't fully understand the 
description of -fPIC in gcc(1), but it sounds like when OMPI's plugins are 
slurped into libmpi.so (and libopen-rte.so and libopen-pal.so), that increases 
the code size of your executable such that jumps to some symbols may be out of 
32 bit addressing range.

However, when you *don't* slurp in all of OMPI's plugins, libmpi.so (and 
friends) are considerably smaller, so the executable code generated by the 
compiler is significantly smaller, and therefore 32 bit addressable jumps are 
sufficient.
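
For what it's worth, the "32 bit addressing range" limit can be written down as a small check.  As far as I understand the x86-64 ABI, an R_X86_64_PC32 relocation stores S + A - P (symbol address, plus addend, minus the address being patched) in a signed 32 bit field, so the target has to be within roughly +/- 2 GiB.  The function name and the addresses you feed it are just illustrative, and this is a sketch of the arithmetic, not of what ld actually does internally:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if S + A - P fits in a signed 32-bit field, which is the
 * range an R_X86_64_PC32 relocation can encode.  Assumes a small addend
 * (typically something like -4), so the final addition cannot overflow. */
bool fits_pc32(uint64_t place, uint64_t symbol, int64_t addend)
{
    /* The unsigned subtraction wraps around; reinterpreting the result
     * as signed gives the distance from the patched location. */
    int64_t disp = (int64_t)(symbol - place) + addend;

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

So a symbol exactly 2 GiB past the patched location does not fit with a zero addend, while one a few bytes closer does.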

Note that when OMPI dlopen's its plugins at run time, that code is jumped to 
via function pointer through the Global Offset Table (GOT).  And that *sounds* 
kinda like what the description of -fPIC in gcc(1) says.  But I'm really not 
sure.

My takeaways here:

1. Every time I think I understand linkers, I find out that I don't know 
anything about linkers.

2. You should investigate -mcmodel=<model> in gcc(1) (there *may* be a 
performance penalty associated with this, but some of the text in gcc(1) looks 
to be fairly ancient, so modern x86_64 chips may or may not have this penalty).  

3. You should also investigate -fPIC.

4. If the app has giant static arrays or common blocks, perhaps it would also 
be sufficient to allocate those data blocks from the heap instead.
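
A minimal sketch of what I mean by #4, in C (your app may well be Fortran, in which case the equivalent would be allocatable arrays).  GRID_ELEMS and the grid_* names are invented for this example, and the size is kept small here; imagine it being several GiB in the real application:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size, kept small for the example. */
#define GRID_ELEMS ((size_t)1 << 20)

/* Before: "static double grid[GRID_ELEMS];" would put the whole array
 * in the program image's static data, which the linker has to address. */

/* After: only a pointer lives in static storage; the data is on the heap. */
static double *grid = NULL;

/* Allocate and zero the grid; returns 0 on success, -1 on failure. */
int grid_init(void)
{
    if (grid != NULL) {
        return 0;               /* already allocated */
    }
    grid = calloc(GRID_ELEMS, sizeof *grid);
    return grid != NULL ? 0 : -1;
}

/* Accessor, so callers do not depend on how the storage is obtained. */
double *grid_data(void)
{
    return grid;
}

/* Release the grid; calling grid_init() again afterwards is fine. */
void grid_free(void)
{
    free(grid);
    grid = NULL;
}
```

The rest of the code keeps indexing grid_data()[i] as before; only the place where the storage comes from changes.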

Hope that helps!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
