On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:

> New tarball posted (same location).  Now featuring 100% fewer "make check" 
> failures.

On our BG/Q front-end node (PPC64, RHEL 6.4) I see:

../../config/test-driver: line 95: 30173 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: opal_lifo

The stack trace implies the culprit is in:

#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51              old = *addr;

I've attached a script(1) log of gdb running "thread apply all bt full" in
case that's helpful.

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

Script started on Mon 02 Feb 2015 12:32:56 EST

[samuel@avoca class]$ gdb /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo core.32444
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
[New Thread 32465]
[New Thread 32464]
[New Thread 32466]
[New Thread 32444]
[New Thread 32469]
[New Thread 32467]
[New Thread 32470]
[New Thread 32463]
[New Thread 32468]
Missing separate debuginfo for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
Loaded symbols for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld64.so.1
Core was generated by `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51	        old = *addr;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.ppc64
(gdb) thread apply all bt full

Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
#0  0x00000080adb6629c in .__libc_write () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, p=0xfff7a0edd00)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
        print_buffer = "[avoca:32444] *** Process received signal ***\n", '\000' <repeats 977 times>
        tmp = 0xfff7a0ed858 "[avoca:32444] *** Process received signal ***\n"
        size = 1024
        ret = 46
        si_code_str = 0xfff7d75bab8 ""
#2  <signal handler called>
No symbol table info available.
#3  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#4  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#5  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 4002
        lifo = 0xffff9e4a6a0
        item = 0x1000511c840
        start = {tv_sec = 1422840607, tv_usec = 750972}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#6  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0xfff7d2ef200 (LWP 32463)):
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 2049
        lifo = 0xffff9e4a6a0
        item = 0x1000511c7e0
        start = {tv_sec = 1422840607, tv_usec = 750871}
        stop = {tv_sec = 17589991303296, tv_usec = 24}
        total = {tv_sec = 17589991305936, tv_usec = 17589991336208}
        timing = 2.8183218451323255e-315
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 7 (Thread 0xfff78cef200 (LWP 32470)):
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 1883
        lifo = 0xffff9e4a6a0
        item = 0x1000511c7e0
        start = {tv_sec = 1422840607, tv_usec = 751036}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 6 (Thread 0xfff7aaef200 (LWP 32467)):
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 3250
        lifo = 0xffff9e4a6a0
        item = 0x1000511c7e0
        start = {tv_sec = 1422840607, tv_usec = 750953}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0xfff796ef200 (LWP 32469)):
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 1922
        lifo = 0xffff9e4a6a0
        item = 0x1000511c7e0
        start = {tv_sec = 1422840607, tv_usec = 751004}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x80ad907ef0 (LWP 32444)):
#0  0x00000080adb5c754 in .pthread_join () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0000000010001ccc in main (argc=1, argv=0xffff9e4ab68) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:163
        ret = 0x1
        i = 0
        threads = {17589991305728, 17589980819968, 17589970334208, 17589959848448, 17589949362688, 17589938876928, 17589928391168, 17589917905408}
        item = 0x1000511c8d0
        prev = 0xffff9e4a6c0
        item2 = 0x1000511b640
        start = {tv_sec = 1422840607, tv_usec = 750782}
        stop = {tv_sec = 1422840607, tv_usec = 515534}
        total = {tv_sec = 0, tv_usec = 42314}
        lifo = {super = {obj_class = 0xfff7d7733e8, obj_reference_count = 1}, opal_lifo_head = {data = {counter = 0, item = 0x1000511c7e0}}, 
          opal_lifo_ghost = {super = {obj_class = 0xfff7d773228, obj_reference_count = 1}, opal_list_next = 0xffff9e4a6c0, opal_list_prev = 0x0, 
            item_free = 1}}
        success = false
        timing = 4.2313999999999998e-08
        rc = 0

Thread 3 (Thread 0xfff7b4ef200 (LWP 32466)):
#0  opal_atomic_swap_32 (addr=0x1000511c860, newval=1) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:52
        old = 0
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x1000511c840
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 1876
        lifo = 0xffff9e4a6a0
        item = 0x1000511c840
        start = {tv_sec = 1422840607, tv_usec = 750939}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0xfff7c8ef200 (LWP 32464)):
#0  0x0000000010000f88 in opal_atomic_cmpset_64 (addr=0xffff9e4a6b8, oldval=1099596679232, newval=1099596679136)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/powerpc/atomic.h:194
        ret = 1099596679232
#1  0x00000000100010e4 in opal_atomic_cmpset_ptr (addr=0xffff9e4a6b8, oldval=0x1000511c840, newval=0x1000511c7e0)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:227
No locals.
#2  0x0000000010001438 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:198
        item = 0x1000511c840
#3  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 3968
        lifo = 0xffff9e4a6a0
        item = 0x1000511c840
        start = {tv_sec = 1422840607, tv_usec = 750893}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#4  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0xfff7beef200 (LWP 32465)):
#0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
    at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
        old = 1
#1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
        item = 0x0
#2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
        i = 3734
        lifo = 0xffff9e4a6a0
        item = 0x1000511c7e0
        start = {tv_sec = 1422840607, tv_usec = 750907}
        stop = {tv_sec = 0, tv_usec = 0}
        total = {tv_sec = 0, tv_usec = 0}
        timing = 0
#3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) quit
[samuel@avoca class]$ exit

Script done on Mon 02 Feb 2015 12:33:16 EST
