[O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Ferris McCormick
Sorry if this is old news, configuration problem, or whatever.  I have
been tied up with other things, and have not been able to follow ompi
very closely.

I just built openmpi-1.0a1r7305 for testing, and notice that ompi_info
(and all other ompi tests) give
mca: base: components_open: component linux open function failed
and eventually terminate with a Seg Fault.  Interestingly, the programs
do seem to run pretty much correctly otherwise.

System is sparc/linux (sparc64 in 32-bit user mode, SB1000).

Output from 'ompi_info -a' is attached.

Regards,
-- 
Ferris McCormick (P44646, MI) 
Developer, Gentoo Linux (Sparc, Devrel)
mca: base: components_open: component linux open function failed
                Open MPI: 1.0a1r7305
   Open MPI SVN revision: r7305
                Open RTE: 1.0a1r7305
   Open RTE SVN revision: r7305
                    OPAL: 1.0a1r7305
       OPAL SVN revision: r7305
              MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: host (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0)
                  Prefix: /homes/cache
                  Bindir: /homes/cache/bin
                  Libdir: /homes/cache/lib
                  Incdir: /homes/cache/include
               Pkglibdir: /homes/cache/lib/openmpi
              Sysconfdir: /homes/cache/etc
 Configured architecture: sparc64-unknown-linux-gnu
           Configured by: fmccor
           Configured on: Mon Sep 12 14:24:23 UTC 2005
          Configure host: polylepis
                Built by: ferris
                Built on: Mon Sep 12 14:42:46 UTC 2005
              Built host: polylepis
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: no
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
             C char size: 1
             C bool size: 1
            C short size: 2
              C int size: 4
             C long size: 4
            C float size: 4
  

Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Brian Barrett
Thanks for the heads up.  We are not seeing this on other platforms,  
so it might be a Sparc-specific issue.  Any chance you could compile  
with debugging symbols and generate a backtrace?  Also, could you  
send the contents of /proc/cpuinfo (long story...)?


Thanks!

Brian


Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Ferris McCormick
On Mon, 2005-09-12 at 11:14 -0500, Brian Barrett wrote:
> Thanks for the heads up.  We are not seeing this on other platforms,  
> so it might be a Sparc-specific issue.  Any chance you could compile  
> with debugging symbols and generate a backtrace?  Also, could you  
> send the contents of /proc/cpuinfo (long story...)?
> 
> Thanks!
> 
> Brian
> 
Here's /proc/cpuinfo from the SB1000:
=
fmccor@polylepis AGT [93]% cat /proc/cpuinfo
cpu : TI UltraSparc III (Cheetah)
fpu : UltraSparc III integrated FPU
promlib : Version 3 Revision 13
prom: 4.13.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 598.01
Cpu0ClkTck  : 35a4e900
Cpu1Bogo: 598.01
Cpu1ClkTck  : 35a4e900
MMU Type: Cheetah
State:
CPU0:   online
CPU1:   online


And here's a back-trace from ompi_info:
==
Program received signal SIGSEGV, Segmentation fault.
opal_output_close (output_id=1880710872) at opal_object.h:409
409 for (i = 0; i < cls->cls_depth; i++) {
Current language:  auto; currently c
(gdb) bt
#0  opal_output_close (output_id=1880710872) at opal_object.h:4
#1  0x700d8e00 in mca_topo_base_close () at topo_base_close.c:46
#2  0x00016aa4 in close_components () at components.cc:254
#3  0x00018bbc in main (argc=1, argv=0xefa253f4) at ompi_info.cc:251
=
HOWEVER:  If I configure with --enable-debug, two things happen:
1.  I have to build ompi/mca/rcache/rb by hand because of incorrect
CFLAGS;
2.  The SegFault disappears.

(The line# in #0 above is incorrect; by accident I edited the email as I
was writing it and erased too much.  I can rebuild with '-g' but not
with --enable-debug if necessary.)

Other failing system:
fmccor@lacewing openmpi-1.0a1r7305 [96]% cat /proc/cpuinfo
cpu : TI UltraSparc II  (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 19
prom: 3.19.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 799.53
Cpu0ClkTck  : 17d746a8
Cpu1Bogo: 799.53
Cpu1ClkTck  : 17d746a8
MMU Type: Spitfire
State:
CPU0:   online
CPU1:   online
===
Regards,


-- 
Ferris McCormick (P44646, MI) 
Developer, Gentoo Linux (Sparc, Devrel)




Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Jeff Squyres

On Sep 12, 2005, at 2:05 PM, Ferris McCormick wrote:


> HOWEVER:  If I configure with --enable-debug, two things happen:
> 1.  I have to build ompi/mca/rcache/rb by hand because of incorrect
> CFLAGS;


FWIW, the rcache guys are currently off working in a /tmp branch, and 
they have fixed this problem over there.


The results of their work are expected to be brought over to the trunk 
"soon".


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Brian Barrett
Ok, I see what's happening, although I'm not sure the two problems  
are actually related.  The first is that the component to provide  
high resolution timer support on Linux is disabling itself because:


  1) it doesn't know how to figure out the clock rate of the CPU
  2) there's no assembly for reading a CPU counter on SPARC chips

The only reliable way to get CPU frequency is reading /proc/cpuinfo,  
and for Linux, each architecture seems to have a different format.   
So that part's covered with the information provided below.  Now I  
just need to figure out how to get cycle counts out of a SPARC.  So  
much easier on Solaris ;).
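
As a rough illustration of why the timer component needs both pieces, here is a minimal C sketch of how a raw cycle count and a clock rate combine into elapsed time.  The function name cycles_to_usec is made up for this note and is not taken from the Open MPI source; it only assumes that some counter value and a clock rate in Hz are available.

```c
#include <stdint.h>

/* Convert a cycle count to microseconds, given the clock rate in Hz.
 * Hypothetical helper for illustration only; not an Open MPI function.
 * The division is split into whole seconds and a remainder so that the
 * multiplication by 1000000 does not overflow on long intervals. */
uint64_t cycles_to_usec(uint64_t cycles, uint64_t hz)
{
    if (hz == 0) {
        return 0;   /* clock rate unknown: nothing sensible to report */
    }
    return (cycles / hz) * 1000000u + ((cycles % hz) * 1000000u) / hz;
}
```

With a 900 MHz clock, 900 cycles come out as 1 microsecond.  Without a known clock rate the count cannot be turned into time at all, which is presumably why the component disables itself.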


Brian





Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux

2005-09-12 Thread Ferris McCormick
On Mon, 2005-09-12 at 13:34 -0500, Brian Barrett wrote:
> Ok, I see what's happening, although I'm not sure the two problems  
> are actually related.  The first is that the component to provide  
> high resolution timer support on Linux is disabling itself because:
> 
>1) it doesn't know how to figure out the clock rate of the CPU
>2) there's no assembly for reading a CPU counter on SPARC chips
> 
> The only reliable way to get CPU frequency is reading /proc/cpuinfo,  
> and for Linux, each architecture seems to have a different format.   
> So that part's covered with the information provided below.  Now I  
> just need to figure out how to get cycle counts out of a SPARC.  So  
> much easier on Solaris ;).
> 
> Brian
> 
Some information that might help:
The SB1000 is a (2x900MHz) Ultrasparc III, the second system is a
(2x400MHz) Ultrasparc II.  The SB1000 is well over twice as fast as the
U2.

Here is a (2x450MHz) Ultrasparc II (U60 system):

fmccor@antaresia openmpi-1.0a1r7305 [33]% cat /proc/cpuinfo
cpu : TI UltraSparc II  (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 29
prom: 3.29.0
type: sun4u
ncpus probed: 2
ncpus active: 2
Cpu0Bogo: 897.84
Cpu0ClkTck  : 1ad2f5d5
Cpu2Bogo: 897.84
Cpu2ClkTck  : 1ad2f5d5
MMU Type: Spitfire
State:
CPU0:   online
CPU2:   online


I think what you need to look at is the 'Cpu?ClkTck' values, if
900 --> 35a4e900
450 --> 1ad2f5d5
400 --> 17d746a8
is useful.
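
The first pair is exact: 0x35a4e900 is 900,000,000, so the ClkTck field looks like the clock rate in Hz printed as bare hex.  A minimal C sketch of pulling that out of a cpuinfo line, assuming that reading of the field is right (the helper name clktck_line_to_hz is hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/cpuinfo line of the form "Cpu0ClkTck  : 35a4e900" and
 * return the clock rate in Hz, or 0 if the line is not a ClkTck entry.
 * Assumption: the value is the frequency in Hz, written in hex without
 * a "0x" prefix, as the three samples in this thread suggest. */
unsigned long long clktck_line_to_hz(const char *line)
{
    const char *key = strstr(line, "ClkTck");
    const char *colon;
    char *end;
    unsigned long long hz;

    if (key == NULL) {
        return 0;                       /* some other cpuinfo field */
    }
    colon = strchr(key, ':');
    if (colon == NULL) {
        return 0;                       /* malformed line */
    }
    hz = strtoull(colon + 1, &end, 16); /* skips the leading blanks */
    if (end == colon + 1) {
        return 0;                       /* no hex digits found */
    }
    return hz;
}
```

The other two samples decode to 450,033,109 and 399,984,296, which fit the nominal 450 and 400 MHz parts.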

If you need more, you can try joining #gentoo-sparc on IRC freenode and
explaining exactly what you need; there are people there who probably
can help.  At this point, though, I am giving you more than I know,
which can always be misleading.




Regards,
-- 
Ferris McCormick (P44646, MI) 
Developer, Gentoo Linux (Sparc, Devrel)




[O-MPI devel] svn merge: lessons learned

2005-09-12 Thread Jeff Squyres

Lesson learned the hard way...

If you're going to make a branch into /tmp, it is STRONGLY ADVISED to 
cp an ***UNMODIFIED /trunk*** (i.e., do not have any local edits on the 
/trunk that you're copying).  Then make/apply all your changes in a new 
checkout of your /tmp tree and go from there.


This will make it SIGNIFICANTLY easier to merge your /tmp branch back 
into the trunk when you're done.


Just FYI.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



[O-MPI devel] 64bit shared library problems

2005-09-12 Thread Nathan DeBardeleben
I've been having this problem for a week or so and I've been asking 
other people to weigh in if they know what I'm doing wrong.  I've gotten 
nowhere on this so I figure I'll finally drop it out on the list.  
First, here's the important info:

The machine:


[sparkplug]~ > cat /etc/issue

Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).


[sparkplug]~ > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 
x86_64 x86_64 GNU/Linux


My versions of libtool, autoconf, automake:


[sparkplug]~ > libtool --version
ltmain.sh (GNU libtool) 1.5.20 (1.1220.2.287 2005/08/31 18:54:15)

Copyright (C) 2005  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

[sparkplug]~ > autoconf --version
autoconf (GNU Autoconf) 2.59
Written by David J. MacKenzie and Akim Demaille.

Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

[sparkplug]~ > automake --version
automake (GNU automake) 1.8.5
Written by Tom Tromey .

Copyright 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.
[sparkplug]~ > 


My ompi version: 7322 - but this has been going on for a few days like I 
said and I've been updating a lot, with no progress.


Configured using:

$ ./configure --enable-static --disable-shared --without-threads 
--prefix=/home/ndebard/local/ompi --with-devel-headers 
--enable-mca-no-build=ptl-gm


Simple C file which I will compile into a shared library:


int test_compile(int x) {
    int rc;

    rc = orte_init(true);
    printf("rc = %d\n", rc);

    return x + 1;
}


Above file is named 'testlib.c'

OK, so let's build this:


[sparkplug]~/ompi-test > mpicc -c testlib.c
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
testlib.o: relocation R_X86_64_32 can not be used when making a shared
object; recompile with -fPIC
testlib.o: could not read symbols: Bad value
collect2: ld returned 1 exit status


OK so relocation problems.  Maybe I'll follow the directions and -fPIC 
my file myself:



[sparkplug]~/ompi-test > mpicc -c testlib.c -fPIC
[sparkplug]~/ompi-test > mpicc -shared -o libtestlib.so testlib.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
/home/ndebard/local/ompi/lib/liborte.a(orte_init.o): relocation
R_X86_64_32 can not be used when making a shared object; recompile 
with -fPIC

/home/ndebard/local/ompi/lib/liborte.a: could not read symbols: Bad value
collect2: ld returned 1 exit status


OK so I read this as there's a relocation problem in 'liborte.a'.  I 
un-arred liborte.a and checked some of the files with 'file' and it says 
64bit.  I haven't yet written a script to check every file in here, but 
here's orte_init.o:



[sparkplug]~/<1>tmp > file orte_init.o
orte_init.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), 
not stripped


So that at least says it's 64bit.
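
Checking every archive member by hand gets tedious, so here is a small C sketch of the same word-size check that 'file' performs, based on the ELF identification bytes.  The function names are invented for this example.  Note that this only reports 32- versus 64-bit; it says nothing about whether an object was compiled with -fPIC, which is what the linker message is asking for.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 32 or 64 for an ELF image, or 0 if the bytes are not ELF.
 * hdr points at the start of the file; len is how many bytes are valid.
 * Byte 4 of the header is e_ident[EI_CLASS]: 1 = 32-bit, 2 = 64-bit. */
int elf_word_size(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 5 || memcmp(hdr, magic, sizeof magic) != 0) {
        return 0;
    }
    switch (hdr[4]) {
    case 1:  return 32;   /* ELFCLASS32 */
    case 2:  return 64;   /* ELFCLASS64 */
    default: return 0;    /* invalid or unknown class */
    }
}

/* Convenience wrapper: read the first bytes of a file and classify it.
 * Returns 0 if the file cannot be opened or is not ELF. */
int elf_word_size_of_file(const char *path)
{
    unsigned char hdr[5];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return 0;
    }
    n = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    return elf_word_size(hdr, n);
}
```

Calling elf_word_size_of_file() on each object extracted from liborte.a would flag any member that is not 64-bit.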
And to confirm, my mpicc's 64bit too:


[sparkplug]~/<1>tmp > which mpicc
/home/ndebard/local/ompi/bin/mpicc
[sparkplug]~/<1>tmp > file /home/ndebard/local/ompi/bin/mpicc
/home/ndebard/local/ompi/bin/mpicc: ELF 64-bit LSB executable, AMD 
x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked 
(uses shared libs), not stripped


Someone suggested I take out the '--disable-shared' from the configure 
line, so I did.  The result was the same.


So the result is that I cannot build a shared library on a 64bit linux 
machine that uses orte calls.
So then I tried taking out the orte calls and using MPI calls instead.  
Sure, this function makes no sense but here it is now:



#include "orte_config.h"
#include <mpi.h>

int test_compile(int x) {
    MPI_Comm_rank(MPI_COMM_WORLD, &x);

    return x + 1;
}


And now, when I try and make a shared object I get relocation errors:

/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: 
/home/ndebard/local/ompi/lib/libmpi.a(comm_init.o): relocation 
R_X86_64_32 can not be used when making a shared object; recompile 
with -fPIC

/home/ndebard/local/ompi/lib/libmpi.a: could not read symbols: Bad value


So... could perhaps the build be messed up and not be really using 64bit 
code?
Am I the only one seeing this?  It's a trivial test for those of you 
with access to a 64bit machine if you wouldn't mind testing for me.


Help would be greatly appreciated.

--
-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov

Re: [O-MPI devel] 64bit shared library problems

2005-09-12 Thread Jeff Squyres
Maybe I'm dense -- I thought you couldn't use -shared when linking to  
a static library...?


If you want to build OMPI as a shared library, then ditch the  
--enable-static --disable-shared from your configure line (building  
OMPI as shared is the default, which is how I build 95% of the time).



