Hi Nathan,
Nathan DeBardeleben writes:
I've been having this problem for a week or so and I've been asking
other people to weigh in if they know what I'm doing wrong. I've gotten
nowhere on this, so I figure I'll finally drop it on the list.
First, here's the important info:
The machine:
[sparkplug]~ > cat /etc/issue
Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).
[sparkplug]~ > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64
x86_64 x86_64 GNU/Linux
My versions of libtool, autoconf, automake:
[sparkplug]~ > libtool --version
ltmain.sh (GNU libtool) 1.5.20 (1.1220.2.287 2005/08/31 18:54:15)
*snip*
My ompi version: 7322 - but this has been going on for a few days like I
said and I've been updating a lot, with no progress.
Configured using:
$ ./configure --enable-static --disable-shared --without-threads
--prefix=/home/ndebard/local/ompi --with-devel-headers
--enable-mca-no-build=ptl-gm
Simple C file which I will compile into a shared library:
int test_compile(int x) {
    int rc;
    rc = orte_init(true);
    printf("rc = %d\n", rc);
    return x + 1;
}
Above file is named 'testlib.c'
OK, so let's build this:
[sparkplug]~/ompi-test > mpicc -c testlib.c
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
testlib.o: relocation R_X86_64_32 can not be used when making a shared
object; recompile with -fPIC
testlib.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
OK, I don't have time to reproduce this at the moment, but I see several
issues: First, testlib.o needs to be compiled PIC (you noticed that
already).
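To illustrate the first point, here is a minimal sketch, assuming a stand-alone file with no Open MPI calls (the file name testlib_pic.c and the printed message are made up for illustration). On x86-64 toolchains that do not generate position-independent code by default, taking the address of a string literal without -fPIC typically yields an absolute relocation such as R_X86_64_32, which ld refuses when making a shared object.

```c
/* testlib_pic.c: hypothetical stand-alone variant of testlib.c
 * (no Open MPI calls, so it can be built with plain gcc).
 *
 * Build as a shared object, compiling PIC first:
 *   gcc -fPIC -c testlib_pic.c
 *   gcc -shared -o libtestlib_pic.so testlib_pic.o
 */
#include <stdio.h>

int test_compile(int x) {
    /* The address of the format string is what needs relocating. */
    printf("x = %d\n", x);
    return x + 1;
}
```

Every object that ends up in the shared library, including anything pulled in from a static archive, has to be compiled the same way.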
OK so relocation problems. Maybe I'll follow the directions and -fPIC
my file myself:
[sparkplug]~/ompi-test > mpicc -c testlib.c -fPIC
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
/home/ndebard/local/ompi/lib/liborte.a(orte_init.o): relocation
R_X86_64_32 can not be used when making a shared object; recompile
with -fPIC
/home/ndebard/local/ompi/lib/liborte.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
This is the second issue: orte_init.o is not compiled PIC (naturally,
since you configured with --disable-shared). But the real error here is
that the link tries to pull the static library into the shared one,
which is wrong. This is either a Libtool or an OpenMPI bug. Please show
what both of the above mpicc calls generate.
OK so I read this as there's a relocation problem in 'liborte.a'. I
un-arred liborte.a and checked some of the files with 'file' and it says
64bit. I haven't yet written a script to check every file in here, but
here's orte_init.o:
[sparkplug]~/<1>tmp > file orte_init.o
orte_init.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV),
not stripped
So that at least says it's 64bit.
And to confirm, my mpicc's 64bit too:
[sparkplug]~/<1>tmp > which mpicc
/home/ndebard/local/ompi/bin/mpicc
[sparkplug]~/<1>tmp > file /home/ndebard/local/ompi/bin/mpicc
/home/ndebard/local/ompi/bin/mpicc: ELF 64-bit LSB executable, AMD
x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked
(uses shared libs), not stripped
Someone suggested I take out the '--disable-shared' from the configure
line, so I did. The result was the same.
Are you sure you really rebuilt the library afterwards (I believe a
"make clean" in between is necessary)? Please show the link line
of liborte.la. (You can do a full build, then delete liborte.la and
type "make" again to capture its output more easily.)
So the result is that I cannot build a shared library on a 64bit linux
machine that uses orte calls.
So then I tried taking out the orte calls and instead use MPI calls.
Sure, this function makes no sense but here it is now:
#include "orte_config.h"
#include <mpi.h>
int test_compile(int x) {
    MPI_Comm_rank(MPI_COMM_WORLD, &x);
    return x + 1;
}
And now, when I try and make a shared object I get relocation errors:
Should be the same issue.
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
/home/ndebard/local/ompi/lib/libmpi.a(comm_init.o): relocation
R_X86_64_32 can not be used when making a shared object; recompile
with -fPIC
/home/ndebard/local/ompi/lib/libmpi.a: could not read symbols: Bad value
So... could perhaps the build be messed up and not be really using 64bit
code?
Am I the only one seeing this? It's a trivial test for those of you
with access to a 64bit machine if you wouldn't mind testing for me.
As I said, I can probably only test this a few days from now.
Cheers,
Ralf