I reported the same error a few days ago and have now submitted it as a GitHub issue: https://github.com/open-mpi/ompi/issues/371

On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
>
> > New tarball posted (same location). Now featuring 100% fewer "make check"
> > failures.
>
> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
>
> ../../config/test-driver: line 95: 30173 Segmentation fault (core dumped) "$@" > $log_file 2>&1
> FAIL: opal_lifo
>
> Stack trace implies the culprit is in:
>
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51          old = *addr;
>
> I've attached a script of gdb doing "thread apply all bt full" in
> case that's helpful.
>
> All the best,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> Script started on Mon 02 Feb 2015 12:32:56 EST
>
> [samuel@avoca class]$ gdb /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo core.32444
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ppc64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> [New Thread 32465]
> [New Thread 32464]
> [New Thread 32466]
> [New Thread 32444]
> [New Thread 32469]
> [New Thread 32467]
> [New Thread 32470]
> [New Thread 32463]
> [New Thread 32468]
> Missing separate debuginfo for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> Missing separate debuginfo for
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> Loaded symbols for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld64.so.1
> Core was generated by `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51          old = *addr;
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.ppc64
> (gdb) thread apply all bt full
>
> Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
> #0  0x00000080adb6629c in .__libc_write () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00000fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, p=0xfff7a0edd00)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
>         print_buffer = "[avoca:32444] *** Process received signal ***\n", '\000' <repeats 977 times>
>         tmp = 0xfff7a0ed858 "[avoca:32444] *** Process received signal ***\n"
>         size = 1024
>         ret = 46
>         si_code_str = 0xfff7d75bab8 ""
> #2  <signal handler called>
> No symbol table info available.
> #3  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #4  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #5  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 4002
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750972}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #6  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #7  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 8 (Thread 0xfff7d2ef200 (LWP 32463)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 2049
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750871}
>         stop = {tv_sec = 17589991303296, tv_usec = 24}
>         total = {tv_sec = 17589991305936, tv_usec = 17589991336208}
>         timing = 2.8183218451323255e-315
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 7 (Thread 0xfff78cef200 (LWP 32470)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1883
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 751036}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 6 (Thread 0xfff7aaef200 (LWP 32467)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3250
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750953}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 5 (Thread 0xfff796ef200 (LWP 32469)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1922
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 751004}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 4 (Thread 0x80ad907ef0 (LWP 32444)):
> #0  0x00000080adb5c754 in .pthread_join () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0000000010001ccc in main (argc=1, argv=0xffff9e4ab68)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:163
>         ret = 0x1
>         i = 0
>         threads = {17589991305728, 17589980819968, 17589970334208, 17589959848448, 17589949362688, 17589938876928, 17589928391168, 17589917905408}
>         item = 0x1000511c8d0
>         prev = 0xffff9e4a6c0
>         item2 = 0x1000511b640
>         start = {tv_sec = 1422840607, tv_usec = 750782}
>         stop = {tv_sec = 1422840607, tv_usec = 515534}
>         total = {tv_sec = 0, tv_usec = 42314}
>         lifo = {super = {obj_class = 0xfff7d7733e8, obj_reference_count = 1}, opal_lifo_head = {data = {counter = 0, item = 0x1000511c7e0}}, opal_lifo_ghost = {super = {obj_class = 0xfff7d773228, obj_reference_count = 1}, opal_list_next = 0xffff9e4a6c0, opal_list_prev = 0x0, item_free = 1}}
>         success = false
>         timing = 4.2313999999999998e-08
>         rc = 0
>
> Thread 3 (Thread 0xfff7b4ef200 (LWP 32466)):
> #0  opal_atomic_swap_32 (addr=0x1000511c860, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:52
>         old = 0
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x1000511c840
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1876
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750939}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 2 (Thread 0xfff7c8ef200 (LWP 32464)):
> #0  0x0000000010000f88 in opal_atomic_cmpset_64 (addr=0xffff9e4a6b8, oldval=1099596679232, newval=1099596679136)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/powerpc/atomic.h:194
>         ret = 1099596679232
> #1  0x00000000100010e4 in opal_atomic_cmpset_ptr (addr=0xffff9e4a6b8, oldval=0x1000511c840, newval=0x1000511c7e0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:227
> No locals.
> #2  0x0000000010001438 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:198
>         item = 0x1000511c840
> #3  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3968
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750893}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #4  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 1 (Thread 0xfff7beef200 (LWP 32465)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3734
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750907}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb) quit
> [samuel@avoca class]$ exit
>
> Script done on Mon 02 Feb 2015 12:33:16 EST