Re: [OMPI devel] RFC: add support for large counts using derived datatypes

Jeff Squyres (jsquyres) Wed, 17 Jul 2013 11:02:21 -0400

On Jul 17, 2013, at 10:48 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> I must be missing something here. type_size.c contains MPI_Type_size and 
> MPI_Type_size_x and I see all the MPI and PMPI variants in the resulting .so, 
> .dylib, and .a.

If you have a nathan.c file with:

-----
void MPI_Foo(void) { /* ... */ }
void MPI_Bar(void) { /* ... */ }
-----

This will result in defining both symbols in that nathan.o file, which ends up 
in libmpi.so.

Then if someone writes a code like this:

-----
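#include <mpi.h>

/* Defined in nathan.c */
void MPI_Foo(void);
void MPI_Bar(void);

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Foo();
    MPI_Bar();
    MPI_Finalize();
    return 0;
}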
-----

And then they interpose their own version of MPI_Bar() with their 
libinterposition.so, *it won't work* (meaning their version of MPI_Bar() won't 
be called).  

This happens because the linker will first see MPI_Foo() in main and resolve 
it.  When it resolves the MPI_Foo symbol, it pulls *all* symbols out of the .o 
that MPI_Foo came from (i.e., nathan.o in libmpi.so), including MPI_Bar.

So when MPI_Bar goes to get executed, it's *already been resolved* to the one 
in nathan.o/libmpi.so, not the one from libinterposition.so.

Even worse, if they reversed the order of foo/bar in main, then the linker 
would likely give them a duplicate symbol error: it will first resolve MPI_Bar 
from libinterposition.so, and then later resolve MPI_Foo from libmpi.so, which 
also pulls in MPI_Bar from libmpi.so.  Kaboom.
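To make the two failure modes concrete, here is a deliberately simplified C model of whole-member extraction. It is a toy, not how any real linker is implemented: it resolves references one at a time in the order they are listed, mirroring the walk-through above, and it searches the interposition object before libmpi's members (an assumption standing in for libinterposition coming first on the link line). Real linkers work from the set of still-undefined symbols and the order of inputs on the link line rather than the order of calls in main, and pulling in a whole member like this is how static archives (.a) are linked. The names libinterposition.o, foo.o, bar.o, and toy_link are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEFS 4   /* max symbols one toy object may define */
#define MAX_BOUND 16 /* max entries in the toy symbol table */

/* A toy object file: its name plus the global symbols it defines
   (unused slots are NULL). */
typedef struct {
    const char *name;
    const char *defs[MAX_DEFS];
} Object;

/* Toy symbol table: bound_sym[i] was supplied by bound_obj[i]. */
static const char *bound_sym[MAX_BOUND];
static const char *bound_obj[MAX_BOUND];
static int nbound;

/* Which object already supplies `sym`? NULL if it is still undefined. */
static const char *provider(const char *sym) {
    for (int i = 0; i < nbound; i++)
        if (strcmp(bound_sym[i], sym) == 0)
            return bound_obj[i];
    return NULL;
}

static int defines(const Object *obj, const char *sym) {
    for (int i = 0; i < MAX_DEFS && obj->defs[i] != NULL; i++)
        if (strcmp(obj->defs[i], sym) == 0)
            return 1;
    return 0;
}

/* Extract a whole member: bind EVERY symbol it defines, not just the one
   that was needed. Returns -1 if any of them is already bound
   (a "duplicate symbol" error), else 0. */
static int pull_member(const Object *obj) {
    for (int i = 0; i < MAX_DEFS && obj->defs[i] != NULL; i++) {
        if (provider(obj->defs[i]) != NULL)
            return -1;
        assert(nbound < MAX_BOUND);
        bound_sym[nbound] = obj->defs[i];
        bound_obj[nbound] = obj->name;
        nbound++;
    }
    return 0;
}

/* Resolve refs[] one at a time, searching members[] in order. Returns the
   object that ends up supplying `query`, or NULL on a link error. */
static const char *toy_link(const char **refs, int nrefs,
                            const Object *members, int nmembers,
                            const char *query) {
    nbound = 0;
    for (int r = 0; r < nrefs; r++) {
        if (provider(refs[r]) != NULL)
            continue; /* already resolved as a side effect of a pull */
        int found = 0;
        for (int m = 0; m < nmembers && !found; m++) {
            if (defines(&members[m], refs[r])) {
                found = 1;
                if (pull_member(&members[m]) != 0)
                    return NULL; /* duplicate symbol: kaboom */
            }
        }
        if (!found)
            return NULL; /* undefined reference */
    }
    return provider(query);
}

/* The message's scenario, plus a one-function-per-file variant.
   The interposition object is listed (and so searched) first. */
static const Object with_nathan_o[] = {
    {"libinterposition.o", {"MPI_Bar"}},
    {"nathan.o", {"MPI_Foo", "MPI_Bar"}},
};
static const Object one_per_file[] = {
    {"libinterposition.o", {"MPI_Bar"}},
    {"foo.o", {"MPI_Foo"}},
    {"bar.o", {"MPI_Bar"}},
};
static const char *foo_then_bar[] = {"MPI_Foo", "MPI_Bar"};
static const char *bar_then_foo[] = {"MPI_Bar", "MPI_Foo"};
```

With nathan.o, toy_link(foo_then_bar, 2, with_nathan_o, 2, "MPI_Bar") reports nathan.o as the provider of MPI_Bar (the interposer is silently skipped), and toy_link(bar_then_foo, 2, with_nathan_o, 2, "MPI_Bar") returns NULL (the duplicate symbol). With one function per object (one_per_file), both orders bind MPI_Bar to libinterposition.o.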

Linkers are insanely complicated.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
