Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris
Hi Gilles,

thank you very much for the quick tutorial. Unfortunately I still
can't get a backtrace.

> You might need to configure with --enable-debug and add -g -O0
> to your CFLAGS and LDFLAGS
>
> Then once you attach with gdb, you have to find the thread that is polling :
> thread 1
> bt
> thread 2
> bt
> and so on until you find the good thread
> If _dbg is a local variable, you need to select the right frame
> before you can change the value :
> get the frame number from bt (generally 1 under linux)
> f <frame number>
> set _dbg=0
>
> I hope this helps

"--enable-debug" is one of my default options. This time I used the
following command to configure Open MPI. I always start the build
process in an empty directory and I always remove
/usr/local/openmpi-1.9.0_64_gcc before I install a new version.

tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \
  | grep openmpi
  $ ../openmpi-dev-124-g91e9686/configure
    --prefix=/usr/local/openmpi-1.9.0_64_gcc
    --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
    --with-jdk-bindir=/usr/local/jdk1.8.0/bin
    --with-jdk-headers=/usr/local/jdk1.8.0/include
    JAVA_HOME=/usr/local/jdk1.8.0
    LDFLAGS=-m64 -g -O0
    CC=gcc CXX=g++ FC=gfortran
    CFLAGS=-m64 -D_REENTRANT -g -O0
    CXXFLAGS=-m64 FCFLAGS=-m64
    CPP=cpp CXXCPP=cpp
    CPPFLAGS=-D_REENTRANT CXXCPPFLAGS=
    --enable-mpi-cxx
    --enable-cxx-exceptions
    --enable-mpi-java
    --enable-heterogeneous
    --enable-mpi-thread-multiple
    --with-threads=posix
    --with-hwloc=internal
    --without-verbs
    --with-wrapper-cflags=-std=c11 -m64
    --enable-debug
tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113

"gdb" doesn't allow any backtrace for any thread.

tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
GNU gdb (GDB) 7.6.1
...
(gdb) attach 18876
Attaching to process 18876
[New process 18876]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
0x7eadcb04 in ?? ()
(gdb) info threads
[New LWP 12]
[New LWP 11]
[New LWP 10]
[New LWP 9]
[New LWP 8]
[New LWP 7]
[New LWP 6]
[New LWP 5]
[New LWP 4]
[New LWP 3]
[New LWP 2]
  Id   Target Id   Frame
  12   LWP 2       0x7eadc6b0 in ?? ()
  11   LWP 3       0x7eadcbb8 in ?? ()
  10   LWP 4       0x7eadcbb8 in ?? ()
  9    LWP 5       0x7eadcbb8 in ?? ()
  8    LWP 6       0x7eadcbb8 in ?? ()
  7    LWP 7       0x7eadcbb8 in ?? ()
  6    LWP 8       0x7ead8b0c in ?? ()
  5    LWP 9       0x7eadcbb8 in ?? ()
  4    LWP 10      0x7eadcbb8 in ?? ()
  3    LWP 11      0x7eadcbb8 in ?? ()
  2    LWP 12      0x7eadcbb8 in ?? ()
* 1    LWP 1       0x7eadcb04 in ?? ()
(gdb) thread 1
[Switching to thread 1 (LWP 1)]
#0  0x7eadcb04 in ?? ()
(gdb) bt
#0  0x7eadcb04 in ?? ()
#1  0x7eaca12c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 2
[Switching to thread 2 (LWP 12)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 3
[Switching to thread 3 (LWP 11)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 4
[Switching to thread 4 (LWP 10)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 5
[Switching to thread 5 (LWP 9)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 6
[Switching to thread 6 (LWP 8)]
#0  0x7ead8b0c in ?? ()
(gdb) bt
#0  0x7ead8b0c in ?? ()
#1  0x7eacbcb0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 7
[Switching to thread 7 (LWP 7)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 8
[Switching to thread 8 (LWP 6)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 9
[Switching to thread 9 (LWP 5)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac2638 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 10
[Switching to thread 10 (LWP 4)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 11
[Switching to thread 11 (LWP 3)]
#0  0x7eadcbb8 in ?? ()
(gdb) bt
#0  0x7eadcbb8 in ?? ()
#1  0x7eac25a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 12
[Switching to thread 12 (LWP 2)]
#0  0x7eadc6b0 in ?? ()
(gdb)

I also tried to set _dbg in all available frames without success.

(gdb) f 1
#1  0x7eacb46c in ?? ()
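
One way to sidestep the frame selection is to keep the wait flag at file scope and non-static, so that it gets a global symbol and gdb should be able to resolve it from any thread once the library's symbols are loaded. The following is a minimal sketch of such a hook, not the code in mpi_MPI.c: the names ompi_java_attach_wait and wait_for_debugger are hypothetical, the max_polls bound is an addition so the process cannot wait forever, and gating on OMPI_ATTACH is an assumption based on the setenv used in this thread.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical name. File scope and non-static, so the variable gets a
 * global symbol instead of a frame-local or suffixed local one. */
volatile int ompi_java_attach_wait = 1;

/* Poll until a debugger clears the flag or max_polls polls have elapsed.
 * Returns the number of polls performed; returns 0 immediately when
 * OMPI_ATTACH is not set in the environment (assumed gating). */
int wait_for_debugger(int max_polls)
{
    int polls = 0;

    if (getenv("OMPI_ATTACH") == NULL) {
        return 0;
    }

    while (ompi_java_attach_wait && polls < max_polls) {
        poll(NULL, 0, 1);   /* sleep for roughly one millisecond */
        polls++;
    }
    return polls;
}
```

With a flag like this, the process would be released from gdb with "set var ompi_java_attach_wait = 0" followed by "continue", without selecting a frame first.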
Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris
Hi Siegmar,

You might need to configure with --enable-debug and add -g -O0
to your CFLAGS and LDFLAGS

Then once you attach with gdb, you have to find the thread that is polling :
thread 1
bt
thread 2
bt
and so on until you find the good thread
If _dbg is a local variable, you need to select the right frame
before you can change the value :
get the frame number from bt (generally 1 under linux)
f <frame number>
set _dbg=0

I hope this helps

Gilles

Siegmar Gross wrote:
>Hi Gilles,
>
>I changed _dbg to a static variable, so that it is visible in the
>library, but unfortunately still not in the symbol table.
>
>
>tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so | grep -i _dbg
>[271] | 1249644| 4|OBJT |LOCL |0|18 |_dbg.14258
>tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>GNU gdb (GDB) 7.6.1
>...
>(gdb) attach 13019
>Attaching to process 13019
>[New process 13019]
>Retry #1:
>Retry #2:
>Retry #3:
>Retry #4:
>0x7eadcb04 in ?? ()
>(gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>Reading symbols from
>/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>(gdb) set var _dbg.14258=0
>No symbol "_dbg" in current context.
>(gdb)
>
>
>Kind regards
>
>Siegmar
>
>
>> unfortunately I didn't get anything useful. It's probably my fault,
>> because I'm still not very familiar with gdb or any other debugger.
>> I did the following things.
>>
>>
>> 1st window:
>> ---
>>
>> tyr java 174 setenv OMPI_ATTACH 1
>> tyr java 175 mpijavac InitFinalizeMain.java
>> warning: [path] bad path element
>> "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar":
>> no such file or directory
>> 1 warning
>> tyr java 176 mpiexec -np 1 java InitFinalizeMain
>>
>>
>>
>> 2nd window:
>> ---
>>
>> tyr java 379 ps -aef | grep java
>> noaccess  1345     1   0   May 22 ?        113:23 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
>>   fd1026  3661 10753   0 14:09:12 pts/14      0:00 mpiexec -np 1 java InitFinalizeMain
>>   fd1026  3677 13371   0 14:16:55 pts/2       0:00 grep java
>>   fd1026  3663  3661   0 14:09:12 pts/14      0:01 java -cp /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun
>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) attach 3663
>> Attaching to process 3663
>> [New process 3663]
>> Retry #1:
>> Retry #2:
>> Retry #3:
>> Retry #4:
>> 0x7eadcb04 in ?? ()
>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>> Reading symbols from
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>> (gdb) set var _dbg=0
>> No symbol "_dbg" in current context.
>> (gdb) set var JNI_OnLoad::_dbg=0
>> No symbol "_dbg" in specified context.
>> (gdb) set JNI_OnLoad::_dbg=0
>> No symbol "_dbg" in specified context.
>> (gdb) info threads
>> [New LWP 12]
>> [New LWP 11]
>> [New LWP 10]
>> [New LWP 9]
>> [New LWP 8]
>> [New LWP 7]
>> [New LWP 6]
>> [New LWP 5]
>> [New LWP 4]
>> [New LWP 3]
>> [New LWP 2]
>>   Id   Target Id   Frame
>>   12   LWP 2       0x7eadc6b0 in ?? ()
>>   11   LWP 3       0x7eadcbb8 in ?? ()
>>   10   LWP 4       0x7eadcbb8 in ?? ()
>>   9    LWP 5       0x7eadcbb8 in ?? ()
>>   8    LWP 6       0x7eadcbb8 in ?? ()
>>   7    LWP 7       0x7eadcbb8 in ?? ()
>>   6    LWP 8       0x7ead8b0c in ?? ()
>>   5    LWP 9       0x7eadcbb8 in ?? ()
>>   4    LWP 10      0x7eadcbb8 in ?? ()
>>   3    LWP 11      0x7eadcbb8 in ?? ()
>>   2    LWP 12      0x7eadcbb8 in ?? ()
>> * 1    LWP 1       0x7eadcb04 in ?? ()
>> (gdb)
>>
>>
>>
>> It seems that "_dbg" is unknown and unavailable.
>>
>> tyr java 399 grep _dbg /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/*
>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:    volatile int _dbg = 1;
>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:    while (_dbg) poll(NULL, 0, 1);
>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i _dbg
>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i JNI_OnLoad
>> [1057] | 139688| 444|FUNC |GLOB |0|11 |JNI_OnLoad
>> tyr java 402
>>
>>
>>
>> How can I set _dbg to zero to continue mpiexec? I also tried to
>> set a breakpoint for function JNI_OnLoad, but it seems that the
>> function isn't called before SIGSEGV.
>>
>>
>> tyr java 177 unsetenv OMPI_ATTACH
>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) b mpi_MPI.c:JNI_OnLoad
>> No source file named mpi_MPI.c.
>> Make breakpoint pending on future shared library load? (y or [n]) y
>>
>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending.
>> (gdb) run -np 1 java InitFinalizeMain
>> Starting program:
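
The "ps -aef | grep java" step in the second window can be avoided if the waiting process announces its own PID and host before it starts polling. A minimal sketch, assuming a hypothetical helper named format_attach_banner; nothing like it appears in the mpi_MPI.c lines quoted above.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes "PID <pid> on <host> ready for attach"
 * into buf and returns the value returned by snprintf. */
int format_attach_banner(char *buf, size_t len)
{
    char host[256];

    if (gethostname(host, sizeof(host)) != 0) {
        snprintf(host, sizeof(host), "unknown-host");
    }
    /* gethostname may leave the buffer unterminated on truncation. */
    host[sizeof(host) - 1] = '\0';

    return snprintf(buf, len, "PID %ld on %s ready for attach",
                    (long) getpid(), host);
}
```

The hook would print this banner to stderr immediately before entering the wait loop, and the reported PID can then be passed directly to gdb's attach command.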