[O-MPI devel] ompi_info Seg Fault, missing component -- linux
Sorry if this is old news, a configuration problem, or whatever. I have been tied up with other things and have not been able to follow ompi very closely. I just built openmpi-1.0a1r7305 for testing, and noticed that ompi_info (and all other ompi tests) print

    mca: base: components_open: component linux open function failed

and eventually terminate with a Seg Fault. Interestingly, the programs do seem to run pretty much correctly otherwise. System is sparc/linux (sparc64 in 32-bit user mode, SB1000). Output from 'ompi_info -a' is attached.

Regards,
--
Ferris McCormick (P44646, MI)
Developer, Gentoo Linux (Sparc, Devrel)

mca: base: components_open: component linux open function failed
Open MPI: 1.0a1r7305
Open MPI SVN revision: r7305
Open RTE: 1.0a1r7305
Open RTE SVN revision: r7305
OPAL: 1.0a1r7305
OPAL SVN revision: r7305
MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: host (MCA v1.0, API v1.0, Component v1.0)
MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0)
MCA ras: tm (MCA v1.0, API v1.0, Component v1.0)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0)
MCA pls: tm (MCA v1.0, API v1.0, Component v1.0)
MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0)
Prefix: /homes/cache
Bindir: /homes/cache/bin
Libdir: /homes/cache/lib
Incdir: /homes/cache/include
Pkglibdir: /homes/cache/lib/openmpi
Sysconfdir: /homes/cache/etc
Configured architecture: sparc64-unknown-linux-gnu
Configured by: fmccor
Configured on: Mon Sep 12 14:24:23 UTC 2005
Configure host: polylepis
Built by: ferris
Built on: Mon Sep 12 14:42:46 UTC 2005
Built host: polylepis
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: no
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 4
C float size: 4
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux
Thanks for the heads up. We are not seeing this on other platforms, so it might be a Sparc-specific issue. Any chance you could compile with debugging symbols and generate a backtrace? Also, could you send the contents of /proc/cpuinfo (long story...)?

Thanks!

Brian

On Sep 12, 2005, at 10:23 AM, Ferris McCormick wrote:
> Sorry if this is old news, configuration problem, or whatever. I have
> been tied up with other things, and have not been able to follow ompi
> very closely. I just built openmpi-1.0a1r7305 for testing, and notice
> that ompi_info (and all other ompi tests) give mca: base:
> components_open: component linux open function failed and eventually
> terminate with a Seg Fault. [...]
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux
On Mon, 2005-09-12 at 11:14 -0500, Brian Barrett wrote:
> Thanks for the heads up. We are not seeing this on other platforms,
> so it might be a Sparc-specific issue. Any chance you could compile
> with debugging symbols and generate a backtrace? Also, could you
> send the contents of /proc/cpuinfo (long story...)?
>
> Thanks!
>
> Brian

Here's /proc/cpuinfo from the SB1000:
=
fmccor@polylepis AGT [93]% cat /proc/cpuinfo
cpu : TI UltraSparc III (Cheetah)
fpu : UltraSparc III integrated FPU
promlib : Version 3 Revision 13
prom: 4.13.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 598.01
Cpu0ClkTck : 35a4e900
Cpu1Bogo: 598.01
Cpu1ClkTck : 35a4e900
MMU Type: Cheetah
State:
CPU0: online
CPU1: online
=

And here's a backtrace from ompi_info:
==
Program received signal SIGSEGV, Segmentation fault.
opal_output_close (output_id=1880710872) at opal_object.h:409
409         for (i = 0; i < cls->cls_depth; i++) {
Current language: auto; currently c
(gdb) bt
#0  opal_output_close (output_id=1880710872) at opal_object.h:4
#1  0x700d8e00 in mca_topo_base_close () at topo_base_close.c:46
#2  0x00016aa4 in close_components () at components.cc:254
#3  0x00018bbc in main (argc=1, argv=0xefa253f4) at ompi_info.cc:251
==

HOWEVER: If I configure with --enable-debug, two things happen:
1. I have to build ompi/mca/rcache/rb by hand because of incorrect CFLAGS;
2. The SegFault disappears.

(The line number in frame #0 above is incorrect; by accident I edited the email as I was writing it and erased too much. I can rebuild with '-g' but not with --enable-debug if necessary.)
Other failing system:
fmccor@lacewing openmpi-1.0a1r7305 [96]% cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 19
prom: 3.19.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 799.53
Cpu0ClkTck : 17d746a8
Cpu1Bogo: 799.53
Cpu1ClkTck : 17d746a8
MMU Type: Spitfire
State:
CPU0: online
CPU1: online
===

Regards,
--
Ferris McCormick (P44646, MI)
Developer, Gentoo Linux (Sparc, Devrel)
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux
On Sep 12, 2005, at 2:05 PM, Ferris McCormick wrote: HOWEVER: If I configure with --enable-debug, two things happen: 1. I have to build ompi/mca/rcache/rb by hand because of incorrect CFLAGS; FWIW, the rcache guys are currently off working in a /tmp branch, and they have fixed this problem over there. The results of their work are expected to be brought over to the trunk "soon". -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux
Ok, I see what's happening, although I'm not sure the two problems are actually related. The first is that the component that provides high-resolution timer support on Linux is disabling itself because:

  1) it doesn't know how to figure out the clock rate of the CPU
  2) there's no assembly for reading a CPU counter on SPARC chips

The only reliable way to get the CPU frequency is reading /proc/cpuinfo, and on Linux each architecture seems to have a different format. So that part's covered with the information provided below. Now I just need to figure out how to get cycle counts out of a SPARC. So much easier on Solaris ;).

Brian

On Sep 12, 2005, at 1:05 PM, Ferris McCormick wrote:
> On Mon, 2005-09-12 at 11:14 -0500, Brian Barrett wrote: [...]
>
> Here's /proc/cpuinfo from the SB1000, and a backtrace from ompi_info. [...]
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux
On Mon, 2005-09-12 at 13:34 -0500, Brian Barrett wrote:
> Ok, I see what's happening, although I'm not sure the two problems
> are actually related. [...] Now I just need to figure out how to get
> cycle counts out of a SPARC. So much easier on Solaris ;).
>
> Brian

Some information that might help: the SB1000 is a (2x900MHz) UltraSparc III; the second system is a (2x400MHz) UltraSparc II. The SB1000 is well over twice as fast as the U2. Here is a (2x450MHz) UltraSparc II (U60 system):

fmccor@antaresia openmpi-1.0a1r7305 [33]% cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 29
prom: 3.29.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 897.84
Cpu0ClkTck : 1ad2f5d5
Cpu2Bogo: 897.84
Cpu2ClkTck : 1ad2f5d5
MMU Type: Spitfire
State:
CPU0: online
CPU2: online

I think what you need to look at is the 'Cpu?ClkTck' values, if the mapping

  900 MHz --> 35a4e900
  450 MHz --> 1ad2f5d5
  400 MHz --> 17d746a8

is useful. If you need more, you can try joining #gentoo-sparc on IRC (freenode) and explaining exactly what you need; there are people there who can probably help. At this point, though, I am giving you more than I know, which can always be misleading.
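[Editorial note: the 'Cpu?ClkTck' values quoted in this thread appear to be simply the CPU clock frequency in Hz, printed in hexadecimal: 0x35a4e900 is exactly 900,000,000. A quick sketch with bash arithmetic, using the three values Ferris lists:]

```shell
# Convert the hex ClkTck values quoted above to Hz.
# Sketch only -- assumes ClkTck really is the clock frequency in Hz.
for tick in 35a4e900 1ad2f5d5 17d746a8; do
    hz=$((16#$tick))                     # bash base-16 arithmetic
    mhz=$(((hz + 500000) / 1000000))     # round to nearest MHz
    echo "ClkTck $tick = $hz Hz (~$mhz MHz)"
done
```

This yields approximately 900 MHz, 450 MHz, and 400 MHz for the three machines, matching the clock rates given above.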
Regards,
--
Ferris McCormick (P44646, MI)
Developer, Gentoo Linux (Sparc, Devrel)
[O-MPI devel] svn merge: lessons learned
Lesson learned the hard way... If you're going to make a branch into /tmp, it is STRONGLY ADVISED to cp an ***UNMODIFIED /trunk*** (i.e., do not have any local edits on the /trunk that you're copying). Then make/apply all your changes in a new checkout of your /tmp tree and go from there. This will make it SIGNIFICANTLY easier to merge your /tmp branch back into the trunk when you're done. Just FYI. -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI devel] 64bit shared library problems
I've been having this problem for a week or so, and I've been asking other people to weigh in if they know what I'm doing wrong. I've gotten nowhere on this, so I figure I'll finally drop it out on the list. First, here's the important info.

The machine:
[sparkplug]~ > cat /etc/issue
Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).
[sparkplug]~ > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 x86_64 x86_64 GNU/Linux

My versions of libtool, autoconf, automake:
[sparkplug]~ > libtool --version
ltmain.sh (GNU libtool) 1.5.20 (1.1220.2.287 2005/08/31 18:54:15)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[sparkplug]~ > autoconf --version
autoconf (GNU Autoconf) 2.59
Written by David J. MacKenzie and Akim Demaille.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[sparkplug]~ > automake --version
automake (GNU automake) 1.8.5
Written by Tom Tromey.
Copyright 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[sparkplug]~ >

My ompi version: 7322. But this has been going on for a few days, like I said, and I've been updating a lot with no progress.
Configured using:
$ ./configure --enable-static --disable-shared --without-threads --prefix=/home/ndebard/local/ompi --with-devel-headers --enable-mca-no-build=ptl-gm

Simple C file, named 'testlib.c', which I will compile into a shared library:

int test_compile(int x) {
    int rc;
    rc = orte_init(true);
    printf("rc = %d\n", rc);
    return x + 1;
}

OK, so let's build this:
[sparkplug]~/ompi-test > mpicc -c testlib.c
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: testlib.o: relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
testlib.o: could not read symbols: Bad value
collect2: ld returned 1 exit status

OK, so relocation problems. Maybe I'll follow the directions and -fPIC my file myself:
[sparkplug]~/ompi-test > mpicc -c testlib.c -fPIC
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: /home/ndebard/local/ompi/lib/liborte.a(orte_init.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
/home/ndebard/local/ompi/lib/liborte.a: could not read symbols: Bad value
collect2: ld returned 1 exit status

I read this as: there's a relocation problem in 'liborte.a'. I un-ar'd liborte.a and checked some of the files with 'file', and it says 64-bit. I haven't yet written a script to check every file in here, but here's orte_init.o:
[sparkplug]~/<1>tmp > file orte_init.o
orte_init.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped

So that at least says it's 64-bit.
And to confirm, my mpicc's 64-bit too:
[sparkplug]~/<1>tmp > which mpicc
/home/ndebard/local/ompi/bin/mpicc
[sparkplug]~/<1>tmp > file /home/ndebard/local/ompi/bin/mpicc
/home/ndebard/local/ompi/bin/mpicc: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), not stripped

Someone suggested I take out the 'disable-shared' from the configure line, so I did. The result was the same. So the upshot is that I cannot build a shared library that uses orte calls on a 64-bit Linux machine.

So then I tried taking out the orte calls and instead using MPI calls. Sure, this function makes no sense, but here it is now:

#include "orte_config.h"
#include <mpi.h>

int test_compile(int x) {
    MPI_Comm_rank(MPI_COMM_WORLD, &x);
    return x + 1;
}

And now, when I try to make a shared object, I get relocation errors:
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: /home/ndebard/local/ompi/lib/libmpi.a(comm_init.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
/home/ndebard/local/ompi/lib/libmpi.a: could not read symbols: Bad value

So... could the build perhaps be messed up and not really be using 64-bit code? Am I the only one seeing this? It's a trivial test for those of you with access to a 64-bit machine, if you wouldn't mind testing for me. Help would be greatly appreciated.

--
-- Nathan
Correspondence
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
Re: [O-MPI devel] 64bit shared library problems
Maybe I'm dense, but I thought you couldn't use -shared when linking to a static library...? If you want to build OMPI as a shared library, then ditch the --enable-static --disable-shared from your configure line (building OMPI as shared is the default, which is how I build 95% of the time).

On Sep 12, 2005, at 5:47 PM, Nathan DeBardeleben wrote:
> I've been having this problem for a week or so and I've been asking
> other people to weigh in if they know what I'm doing wrong. I've
> gotten nowhere on this so I figure I'll finally drop it out on the
> list. [...]