Re: [OMPI users] [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Siegmar Gross
Hi Gilles,

Thank you very much for the quick tutorial. Unfortunately, I still
can't get a backtrace.

> You might need to configure with --enable-debug and add -g -O0
> to your CFLAGS and LDFLAGS.
> 
> Then, once you attach with gdb, you have to find the thread that is polling:
> thread 1
> bt
> thread 2
> bt
> and so on until you find the right thread.
> If _dbg is a local variable, you need to select the right frame
> before you can change the value:
> get the frame number from bt (generally 1 under Linux)
> f <frame number>
> set _dbg=0
> 
> I hope this helps

"--enable-debug" is one of my default options. Now I used the
following command to configure Open MPI. I always start the
build process in an empty directory, and I always remove
/usr/local/openmpi-1.9.0_64_gcc before I install a new version.

tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \
  | grep openmpi
$ ../openmpi-dev-124-g91e9686/configure
  --prefix=/usr/local/openmpi-1.9.0_64_gcc
  --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin
  --with-jdk-headers=/usr/local/jdk1.8.0/include
  JAVA_HOME=/usr/local/jdk1.8.0
  LDFLAGS=-m64 -g -O0 CC=gcc CXX=g++ FC=gfortran
  CFLAGS=-m64 -D_REENTRANT -g -O0
  CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
  CPPFLAGS=-D_REENTRANT CXXCPPFLAGS=
  --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
  --enable-heterogeneous --enable-mpi-thread-multiple
  --with-threads=posix --with-hwloc=internal --without-verbs
  --with-wrapper-cflags=-std=c11 -m64 --enable-debug
tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113 
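Whether a particular source file really was compiled with -O0 can also be checked from inside the code. This is a minimal sketch assuming GCC or Clang, which predefine the macro __OPTIMIZE__ when optimization is enabled and leave it undefined at -O0. optimization_mode() is a hypothetical helper, not part of Open MPI, and it only describes the translation unit it is compiled into, not other objects or libraries.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports how THIS translation unit was compiled.
 * GCC and Clang define __OPTIMIZE__ when optimization is on (-O1, -O2, ...)
 * and leave it undefined at -O0, so CFLAGS="-g -O0" yields "unoptimized". */
const char *optimization_mode(void)
{
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Temporarily compiling a call to this into the Java bindings and printing the result would show whether the -O0 from CFLAGS actually reached them.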


gdb doesn't produce a backtrace for any thread.

tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
GNU gdb (GDB) 7.6.1
...
(gdb) attach 18876
Attaching to process 18876
[New process 18876]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
0x7eadcb04 in ?? ()
(gdb) info threads
[New LWP 12]
[New LWP 11]
[New LWP 10]
[New LWP 9]
[New LWP 8]
[New LWP 7]
[New LWP 6]
[New LWP 5]
[New LWP 4]
[New LWP 3]
[New LWP 2]
  Id   Target Id  Frame
  12   LWP 2      0x7eadc6b0 in ?? ()
  11   LWP 3      0x7eadcbb8 in ?? ()
  10   LWP 4      0x7eadcbb8 in ?? ()
  9    LWP 5      0x7eadcbb8 in ?? ()
  8    LWP 6      0x7eadcbb8 in ?? ()
  7    LWP 7      0x7eadcbb8 in ?? ()
  6    LWP 8      0x7ead8b0c in ?? ()
  5    LWP 9      0x7eadcbb8 in ?? ()
  4    LWP 10     0x7eadcbb8 in ?? ()
  3    LWP 11     0x7eadcbb8 in ?? ()
  2    LWP 12     0x7eadcbb8 in ?? ()
* 1    LWP 1      0x7eadcb04 in ?? ()
(gdb) thread 1
[Switching to thread 1 (LWP 1)]
#0  0x7eadcb04 in ?? ()
(gdb) bt
#0  0x7eadcb04 in ?? ()
#1  0x7eaca12c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 2
[Switching to thread 2 (LWP 12)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 3
[Switching to thread 3 (LWP 11)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 4
[Switching to thread 4 (LWP 10)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 5
[Switching to thread 5 (LWP 9)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 6
[Switching to thread 6 (LWP 8)]
#0  0x7ead8b0c in ?? ()
(gdb) bt
#0  0x7ead8b0c in ?? ()
#1  0x7eacbcb0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 7
[Switching to thread 7 (LWP 7)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 8
[Switching to thread 8 (LWP 6)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 9
[Switching to thread 9 (LWP 5)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 10
[Switching to thread 10 (LWP 4)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 11
[Switching to thread 11 (LWP 3)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 12
[Switching to thread 12 (LWP 2)]
#0  0x7eadc6b0 in ?? ()
(gdb) 



I also tried to set _dbg in all available frames without success.

(gdb) f 1
#1  0x7eacb46c in ?? ()

2014-10-25 Thread Gilles Gouaillardet
Hi Siegmar,

You might need to configure with --enable-debug and add -g -O0 to your CFLAGS
and LDFLAGS.

Then, once you attach with gdb, you have to find the thread that is polling:
thread 1
bt
thread 2
bt
and so on until you find the right thread.
If _dbg is a local variable, you need to select the right frame before you can
change the value:
get the frame number from bt (generally 1 under Linux)
f <frame number>
set _dbg=0

I hope this helps

Gilles
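The procedure above can be pictured with a small stand-alone hold loop. This is a minimal sketch, not the code in mpi_MPI.c: debug_hold() and its max_ms timeout are hypothetical, added only so the example terminates on its own. Because _dbg is an automatic (local) variable, it never shows up in nm output; with -g it is described only in the debug information for debug_hold()'s stack frame, which is why gdb must be in the right thread and frame before set _dbg=0 can find it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for a "wait for debugger" hold loop.
 * Releasing it from gdb, following the steps above:
 *   thread N    switch to the thread sitting in poll()
 *   bt          frame 0 is inside poll(); debug_hold() is usually frame 1
 *   f 1         select debug_hold()'s frame
 *   set _dbg=0  _dbg is only in scope in that frame
 * max_ms < 0 waits forever, like the original loop; max_ms >= 0 gives up
 * after roughly that many milliseconds (an addition for this sketch).
 * Returns 1 if something cleared _dbg, 0 on timeout. */
int debug_hold(int max_ms)
{
    volatile int _dbg = 1;   /* automatic variable: exists only in this frame */
    int waited_ms = 0;

    while (_dbg) {
        if (max_ms >= 0 && waited_ms >= max_ms)
            return 0;        /* nobody attached in time */
        poll(NULL, 0, 1);    /* sleep about 1 ms instead of busy-spinning */
        waited_ms++;
    }
    return 1;                /* a debugger set _dbg to 0 */
}
```

Calling debug_hold(-1) reproduces the original behaviour of waiting until a debugger intervenes.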


Siegmar Gross wrote:
>Hi Gilles,
>
>I changed _dbg to a static variable so that it is visible in the
>library, but unfortunately gdb still can't find it in the symbol table.
>
>
>tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so | grep -i _dbg
>[271]   |  1249644| 4|OBJT |LOCL |0|18 |_dbg.14258
>tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>GNU gdb (GDB) 7.6.1
>...
>(gdb) attach 13019
>Attaching to process 13019
>[New process 13019]
>Retry #1:
>Retry #2:
>Retry #3:
>Retry #4:
>0x7eadcb04 in ?? ()
>(gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>(gdb) set var _dbg.14258=0
>No symbol "_dbg" in current context.
>(gdb) 
>
>
>Kind regards
>
>Siegmar
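An alternative that avoids frame selection entirely is to make the flag a file-scope, non-static object. The sketch below is hypothetical (attach_hold and wait_for_attach are illustrative names, not Open MPI identifiers) and assumes, as the setenv in this thread suggests, that the hold is triggered by the OMPI_ATTACH environment variable. A file-scope variable gets a global symbol under its plain name, unlike a function-local static, which GCC emits as a local symbol with a numeric suffix (the _dbg.14258 in the nm output above). Provided gdb has loaded the library's symbols, set var attach_hold = 0 should then work from any thread or frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical file-scope, non-static flag: in a shared library it becomes
 * a global symbol named "attach_hold", so a debugger with the library's
 * symbols loaded can clear it without choosing a thread or frame. */
volatile int attach_hold = 0;

/* Hold only when OMPI_ATTACH is set (assumed trigger, see text above). */
void wait_for_attach(void)
{
    if (getenv("OMPI_ATTACH") != NULL)
        attach_hold = 1;
    while (attach_hold)
        poll(NULL, 0, 1);   /* sleep about 1 ms between checks */
}
```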
>
>
>
>
>> unfortunately I didn't get anything useful. It's probably my fault,
>> because I'm still not very familiar with gdb or any other debugger.
>> I did the following things.
>> 
>> 
>> 1st window:
>> ---
>> 
>> tyr java 174 setenv OMPI_ATTACH 1
>> tyr java 175 mpijavac InitFinalizeMain.java 
>> warning: [path] bad path element
>>   "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar":
>>   no such file or directory
>> 1 warning
>> tyr java 176 mpiexec -np 1 java InitFinalizeMain
>> 
>> 
>> 
>> 2nd window:
>> ---
>> 
>> tyr java 379 ps -aef | grep java
>> noaccess  1345     1   0   May 22 ?      113:23 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
>>   fd1026  3661 10753   0 14:09:12 pts/14   0:00 mpiexec -np 1 java InitFinalizeMain
>>   fd1026  3677 13371   0 14:16:55 pts/2    0:00 grep java
>>   fd1026  3663  3661   0 14:09:12 pts/14   0:01 java -cp /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun
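Looking the PID up with ps in a second window works, but the waiting process could also report it itself. format_attach_note() below is a hypothetical helper, not part of Open MPI; the idea is that the hold code would print its result to stderr just before entering the wait loop, so the PID to attach to should appear in the mpiexec output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes a one-line note containing this process's
 * PID into buf. Returns the snprintf() result, i.e. the length the full
 * message needs (it is truncated if len is too small). */
int format_attach_note(char *buf, size_t len)
{
    return snprintf(buf, len, "pid %ld waiting for debugger\n", (long)getpid());
}
```

A caller would typically do `char note[64]; format_attach_note(note, sizeof note); fputs(note, stderr);` right before the hold loop.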
>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) attach 3663
>> Attaching to process 3663
>> [New process 3663]
>> Retry #1:
>> Retry #2:
>> Retry #3:
>> Retry #4:
>> 0x7eadcb04 in ?? ()
>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>> (gdb) set var _dbg=0
>> No symbol "_dbg" in current context.
>> (gdb) set var JNI_OnLoad::_dbg=0
>> No symbol "_dbg" in specified context.
>> (gdb) set JNI_OnLoad::_dbg=0
>> No symbol "_dbg" in specified context.
>> (gdb) info threads
>> [New LWP 12]
>> [New LWP 11]
>> [New LWP 10]
>> [New LWP 9]
>> [New LWP 8]
>> [New LWP 7]
>> [New LWP 6]
>> [New LWP 5]
>> [New LWP 4]
>> [New LWP 3]
>> [New LWP 2]
>>   Id   Target Id  Frame
>>   12   LWP 2      0x7eadc6b0 in ?? ()
>>   11   LWP 3      0x7eadcbb8 in ?? ()
>>   10   LWP 4      0x7eadcbb8 in ?? ()
>>   9    LWP 5      0x7eadcbb8 in ?? ()
>>   8    LWP 6      0x7eadcbb8 in ?? ()
>>   7    LWP 7      0x7eadcbb8 in ?? ()
>>   6    LWP 8      0x7ead8b0c in ?? ()
>>   5    LWP 9      0x7eadcbb8 in ?? ()
>>   4    LWP 10     0x7eadcbb8 in ?? ()
>>   3    LWP 11     0x7eadcbb8 in ?? ()
>>   2    LWP 12     0x7eadcbb8 in ?? ()
>> * 1    LWP 1      0x7eadcb04 in ?? ()
>> (gdb) 
>> 
>> 
>> 
>> It seems that "_dbg" is unknown and unavailable.
>> 
>> tyr java 399 grep _dbg /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/*
>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:    volatile int _dbg = 1;
>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:    while (_dbg) poll(NULL, 0, 1);
>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i _dbg
>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i JNI_OnLoad
>> [1057]  |  139688| 444|FUNC |GLOB |0|11 |JNI_OnLoad
>> tyr java 402 
>> 
>> 
>> 
>> How can I set _dbg to zero to let mpiexec continue? I also tried to
>> set a breakpoint on the function JNI_OnLoad, but it seems that the
>> function isn't called before the SIGSEGV.
>> 
>> 
>> tyr java 177 unsetenv OMPI_ATTACH 
>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) b mpi_MPI.c:JNI_OnLoad
>> No source file named mpi_MPI.c.
>> Make breakpoint pending on future shared library load? (y or [n]) y
>> 
>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending.
>> (gdb) run -np 1 java InitFinalizeMain 
>> Starting program: