Re: [O-MPI devel] 64bit shared library problems
Hi Nathan,

Nathan DeBardeleben writes:
> I've been having this problem for a week or so and I've been asking other people to weigh in if they know what I'm doing wrong. I've gotten nowhere on this, so I figure I'll finally drop it on the list. First, here's the important info:
>
> The machine:
> [sparkplug]~ > cat /etc/issue
> Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).
> [sparkplug]~ > uname -a
> Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 x86_64 x86_64 GNU/Linux
>
> My versions of libtool, autoconf, automake:
> [sparkplug]~ > libtool --version
> ltmain.sh (GNU libtool) 1.5.20 (1.1220.2.287 2005/08/31 18:54:15)
> *snip*
>
> My ompi version: 7322 - but this has been going on for a few days like I said and I've been updating a lot, with no progress.
>
> Configured using:
> $ ./configure --enable-static --disable-shared --without-threads --prefix=/home/ndebard/local/ompi --with-devel-headers --enable-mca-no-build=ptl-gm
>
> Simple C file which I will compile into a shared library:
>
> int test_compile(int x)
> {
>     int rc;
>     rc = orte_init(true);
>     printf("rc = %d\n", rc);
>     return x + 1;
> }
>
> Above file is named 'testlib.c'. OK, so let's build this:
>
> [sparkplug]~/ompi-test > mpicc -c testlib.c
> [sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: testlib.o: relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
> testlib.o: could not read symbols: Bad value
> collect2: ld returned 1 exit status

OK, I don't have time to reproduce this at the moment, but I see several issues. First, testlib.o needs to be compiled PIC (you noticed that already).

> OK so relocation problems.
> Maybe I'll follow the directions and -fPIC my file myself:
>
> [sparkplug]~/ompi-test > mpicc -c testlib.c -fPIC
> [sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: /home/ndebard/local/ompi/lib/liborte.a(orte_init.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
> /home/ndebard/local/ompi/lib/liborte.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status

This is the second issue: orte_init.o is not compiled PIC (of course, since you configured with --disable-shared). But the actual error here is that it tries to link the static library into the shared one, which is wrong. This is either a Libtool or an OpenMPI bug. Please show the commands that both of the above mpicc calls generate.

> OK so I read this as there's a relocation problem in 'liborte.a'. I un-arred liborte.a and checked some of the files with 'file', and it says 64-bit. I haven't yet written a script to check every file in here, but here's orte_init.o:
>
> [sparkplug]~/<1>tmp > file orte_init.o
> orte_init.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped
>
> So that at least says it's 64-bit. And to confirm, my mpicc's 64-bit too:
>
> [sparkplug]~/<1>tmp > which mpicc
> /home/ndebard/local/ompi/bin/mpicc
> [sparkplug]~/<1>tmp > file /home/ndebard/local/ompi/bin/mpicc
> /home/ndebard/local/ompi/bin/mpicc: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), not stripped
>
> Someone suggested I take out the '--disable-shared' from the configure line, so I did. The result was the same.

Are you sure you really rebuilt the library afterwards (I believe a "make clean" in between is necessary)? Please show the link line of liborte.la. (You can do a full build, then delete liborte.la and type "make" again to capture its output more easily.)

> So the result is that I cannot build a shared library that uses orte calls on a 64-bit Linux machine.
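Nathan's `file` check confirms that orte_init.o is a 64-bit x86-64 object, but `file` does not say whether the object was compiled as position-independent code, which is what the linker is actually complaining about. A rough per-object check, sketched below under stated assumptions, is to count R_X86_64_32 and R_X86_64_32S relocations against allocated sections: non-PIC x86-64 code typically contains them, and ld rejects them when building a shared object. The function name `count_abs32_relocs` is hypothetical; it uses only the structures and macros from glibc's `<elf.h>`, expects the caller to have read the whole object into memory, and assumes the file's byte order matches the host (little-endian x86-64).

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count absolute 32-bit relocations (R_X86_64_32 and
 * R_X86_64_32S) that patch allocated sections of an in-memory ELF64 x86-64
 * relocatable object. A non-zero count suggests the object was not built
 * with -fPIC. Returns -1 if the buffer is not a well-formed ELF64 x86-64
 * object. Assumes file byte order == host byte order (little-endian). */
long count_abs32_relocs(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    long count = 0;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_machine != EM_X86_64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;
    if (eh.e_shoff > len ||
        (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return -1;

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh, target;

        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type != SHT_RELA || sh.sh_entsize != sizeof(Elf64_Rela))
            continue;
        if (sh.sh_info >= eh.e_shnum)
            continue;
        /* sh_info names the section these relocations patch; relocations
         * against non-allocated sections (e.g. DWARF .debug_*) are harmless. */
        memcpy(&target, buf + eh.e_shoff + (size_t)sh.sh_info * sizeof target,
               sizeof target);
        if (!(target.sh_flags & SHF_ALLOC))
            continue;
        if (sh.sh_offset > len || sh.sh_size > len - sh.sh_offset)
            return -1;

        for (size_t off = 0; off + sizeof(Elf64_Rela) <= sh.sh_size;
             off += sizeof(Elf64_Rela)) {
            Elf64_Rela r;
            Elf64_Xword type;

            memcpy(&r, buf + sh.sh_offset + off, sizeof r);
            type = ELF64_R_TYPE(r.r_info);
            if (type == R_X86_64_32 || type == R_X86_64_32S)
                count++;
        }
    }
    return count;
}
```

For a one-off manual check, `readelf -r orte_init.o` prints the same relocation entries with their type names; entries listed under `.rela.debug_*` sections are expected even in PIC objects built with `-g`, which is why the sketch skips relocations whose target section is not allocated.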
> So then I tried taking out the orte calls and using MPI calls instead. Sure, this function makes no sense, but here it is now:
>
> #include "orte_config.h"
> #include <mpi.h>
>
> int test_compile(int x)
> {
>     MPI_Comm_rank(MPI_COMM_WORLD, &x);
>     return x + 1;
> }
>
> And now, when I try to make a shared object, I get relocation errors:

Should be the same issue.

> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: /home/ndebard/local/ompi/lib/libmpi.a(comm_init.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
> /home/ndebard/local/ompi/lib/libmpi.a: could not read symbols: Bad value
>
> So... could the build perhaps be messed up and not really be using 64-bit code? Am I the only one seeing this? It's a trivial test for those of you with access to a 64-bit machine, if you wouldn't mind testing for me.

As I said, I can probably only test this a few days from now.

Cheers,
Ralf
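Ralf's first point, that an object linked into a shared library must itself be position-independent, can be tried without Open MPI at all. The file below is a hypothetical stand-in for testlib.c with the ORTE/MPI call removed; the build commands in its header comment assume GCC on x86-64 Linux. The likely source of the R_X86_64_32 relocation in the original testlib.o is the address of the printf format string, which non-PIC x86-64 code typically encodes as an absolute 32-bit value; with -fPIC the compiler uses RIP-relative addressing instead. The linker message itself also answers the 64-bit question: ld only reports R_X86_64_* relocation types for x86-64 objects, so the problem is position independence, not word size.

```c
/* testlib_pic.c: hypothetical stand-in for testlib.c with the ORTE/MPI call
 * removed so it builds without Open MPI. Assumed build steps (GCC, x86-64
 * Linux):
 *   gcc -fPIC -c testlib_pic.c
 *   gcc -shared -o libtestlib.so testlib_pic.o
 * Without -fPIC (on a compiler that does not default to PIC/PIE), the address
 * of the format string below is typically emitted as an R_X86_64_32
 * relocation, which ld rejects when building a shared object. */
#include <stdio.h>

int test_compile(int x)
{
    int rc = 0;                /* placeholder for the orte_init() result */

    printf("rc = %d\n", rc);   /* format-string address needs a relocation */
    return x + 1;
}
```

This only addresses the first error. As Ralf points out, the link still fails if the Open MPI archives pulled in by mpicc (liborte.a, libmpi.a) were built without PIC, as they are with --disable-shared; that part has to be fixed in how Open MPI and its wrapper compiler build and link, not in testlib.c.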
[O-MPI devel] OMPI compile failing
Compiling I get:

gcc -DHAVE_CONFIG_H -I. -I. -I../../../../include -I../../../../include -I../../../../include -I../../../.. -I../../../.. -I../../../../include -I../../../../opal -I../../../../orte -I../../../../ompi -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -MT btl_gm.lo -MD -MP -MF .deps/btl_gm.Tpo -c btl_gm.c -fPIC -DPIC -o .libs/btl_gm.o
btl_gm.c: In function `mca_btl_gm_prepare_src':
btl_gm.c:237: error: `gm_btl' undeclared (first use in this function)
btl_gm.c:237: error: (Each undeclared identifier is reported only once
btl_gm.c:237: error: for each function it appears in.)
btl_gm.c: In function `mca_btl_gm_prepare_dst':
btl_gm.c:398: warning: ISO C89 forbids mixed declarations and code
btl_gm.c:404: error: structure has no member named `mpoo_retain'
btl_gm.c:381: warning: unused variable `gm_btl'
make[4]: *** [btl_gm.lo] Error 1
make[4]: Leaving directory `/home/ndebard/ompi/ompi/mca/btl/gm'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/ndebard/ompi/ompi/dynamic-mca/btl'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/ndebard/ompi/ompi/dynamic-mca'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/ndebard/ompi/ompi'
make: *** [all-recursive] Error 1
[sparkplug]~/ompi >

I've configured using the option I thought would disable this:

--enable-mca-no-build=ptl-gm

I even tried --enable-mca-no-build=btl-gm. No luck.

--
-- Nathan
Correspondence
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
Re: [O-MPI devel] OMPI compile failing
Nathan - What machine are you on?

Galen - have you tried GM w/ your changes?

Nathan DeBardeleben wrote:
> Compiling I get:
> *snip*
Re: [O-MPI devel] OMPI compile failing
I'm trying this on sparkplug. I have no real desire to use GM, so if it can be disabled then that'd be great.

Tim S. Woodall wrote:
> Nathan - What machine are you on?
> Galen - have you tried GM w/ your changes?
> *snip*
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [O-MPI devel] OMPI compile failing
Please update again (rev 7352). I ran into the same problems yesterday when I compiled on thor, but I didn't commit, as I thought I was the only one still using GM.

BTW, I think the correct option to not compile GM is --without-gm at configure time.

george.

On Sep 13, 2005, at 4:07 PM, Nathan DeBardeleben wrote:
> I'm trying this on sparkplug. I have no real desire to use GM, so if it can be disabled then that'd be great.
> *snip*

"Half of what I say is meaningless; but I say it so that the other half may reach you"
Kahlil Gibran
Re: [O-MPI devel] OMPI compile failing
Looking into it now... looks like a typo or two.

On Sep 13, 2005, at 1:50 PM, Nathan DeBardeleben wrote:
> Compiling I get:
> *snip*
Re: [O-MPI devel] OMPI compile failing
Thanks George. I didn't get a chance to test this after yesterday's merge; I will do so and commit any other needed changes.

On Sep 13, 2005, at 2:18 PM, George Bosilca wrote:
> Please update again (rev 7352).
> *snip*
[O-MPI devel] Startup/shutdown performance
Yo folks

Josh ran some tests for me on Odin earlier today, and the results show a major improvement in our startup/shutdown performance. As you may recall, our times used to grow roughly exponentially with the number of processes; as the attached graph shows, they now grow roughly linearly.

The data also shows that the MPI_INIT penalty is fairly small. This is because the data exchange is "encapsulated" in the initial data sent back at the stage_1 trigger, which avoids any further overhead as the number of processes grows.

The data was taken using the rsh launcher. We should be able to improve our scalability further once we (a) incorporate a tree-based scheme into the rsh launcher and (b) use a tree-based (or better) broadcast mechanism for sending the trigger messages (right now, we send them linearly across the processes).

Anyway, thought you might find this of interest.

Ralph
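Ralph does not say which tree the launcher or trigger broadcast would use, so the following is only a sketch under that assumption: it computes a binomial-tree schedule, one common tree-based broadcast, rooted at rank 0. All function names are hypothetical. With the current linear scheme the root sends N - 1 trigger messages one after another; in the binomial tree every process that already holds the message forwards it in each round, so the set of informed processes doubles per round and only ceil(log2 N) rounds are needed.

```c
#include <stddef.h>

/* Hypothetical helpers sketching a binomial-tree broadcast rooted at rank 0.
 * In round k (k = 0, 1, 2, ...), every rank r < 2^k that already holds the
 * message forwards it to rank r + 2^k. */

/* Rounds needed to reach n ranks: ceil(log2(n)) for n >= 1, compared with
 * n - 1 sequential sends when the root contacts every process itself. */
unsigned bcast_rounds(unsigned n)
{
    unsigned rounds = 0;
    unsigned long long informed = 1;   /* only the root starts informed */

    while (informed < n) {
        informed *= 2;                 /* every informed rank sends once */
        rounds++;
    }
    return rounds;
}

/* Rank that forwards the message to `rank`: clear its highest set bit.
 * Returns -1 for the root. */
long bcast_parent(unsigned rank)
{
    unsigned long long high = 1;

    if (rank == 0)
        return -1;
    while (high * 2 <= rank)           /* highest power of two <= rank */
        high *= 2;
    return (long)(rank - high);
}

/* Store the ranks that `rank` forwards to (r + 2^k for each 2^k > r with
 * r + 2^k < n) in out[], writing at most cap entries; returns how many
 * entries were written. */
size_t bcast_children(unsigned rank, unsigned n, unsigned *out, size_t cap)
{
    size_t count = 0;

    for (unsigned long long step = 1; rank + step < n; step *= 2)
        if (step > rank && count < cap)
            out[count++] = (unsigned)(rank + step);
    return count;
}
```

For example, with 9 processes a linear fan-out from the root takes 8 sequential sends, while `bcast_rounds(9)` gives 4 rounds: rank 0 forwards to ranks 1, 2, 4 and 8, and rank 5 receives the message from rank 1. Whether the gain materializes in practice also depends on per-message latency in the launcher, which this sketch does not model.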