I reported the same error a few days ago and have now submitted it as a
GitHub issue: https://github.com/open-mpi/ompi/issues/371
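
One observation on the quoted trace below, offered as a guess rather than a
diagnosis: the faulting frames show opal_atomic_swap_32 called with addr=0x20
while opal_lifo_pop_atomic has item = 0x0, and Thread 3 shows
addr=0x1000511c860 with item = 0x1000511c840, which is also a difference of
0x20. That would be consistent with the swap being applied to the item_free
field of a NULL item. The sketch below checks the arithmetic using mock types;
the struct names and helper functions are hypothetical, the field order is
taken only from the gdb print of opal_lifo_ghost, and the real Open MPI
definitions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Open MPI types. Field names and order are
 * copied from the gdb print of opal_lifo_ghost; this is an assumption, not
 * the real header. */
typedef struct mock_object {
    void   *obj_class;            /* 8 bytes on a 64-bit ABI */
    int32_t obj_reference_count;  /* 4 bytes, followed by 4 bytes of padding */
} mock_object_t;

typedef struct mock_list_item {
    mock_object_t          super;
    struct mock_list_item *opal_list_next;
    struct mock_list_item *opal_list_prev;
    int32_t                item_free;
} mock_list_item_t;

/* Byte offset of item_free inside the mocked list item. */
size_t mock_item_free_offset(void)
{
    return offsetof(mock_list_item_t, item_free);
}

/* Address that would be passed to the swap for a given item pointer value.
 * Uses integer arithmetic only, so nothing is ever dereferenced. */
uintptr_t mock_swap_addr(uintptr_t item)
{
    return item + mock_item_free_offset();
}

/* Print the offset and the address that a NULL item would produce. */
void mock_report(void)
{
    assert(sizeof(void *) == 8);  /* the reasoning assumes 64-bit pointers */
    printf("item_free offset: 0x%zx\n", mock_item_free_offset());
    printf("swap addr for NULL item: 0x%llx\n",
           (unsigned long long)mock_swap_addr(0));
}
```

If the mock layout matches the real one, a 64-bit build gives the same 0x20
offset for both the NULL case and the Thread 3 case, which would point at the
pop path reading a NULL head rather than at the swap primitive itself.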

On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
> 
> > New tarball posted (same location).  Now featuring 100% fewer "make check" 
> > failures.
> 
> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
> 
> ../../config/test-driver: line 95: 30173 Segmentation fault      (core 
> dumped) "$@" > $log_file 2>&1
> FAIL: opal_lifo
> 
> Stack trace implies the culprit is in:
> 
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51              old = *addr;
> 
> I've attached a script of gdb doing "thread apply all bt full" in
> case that's helpful.
> 
> All the best,
> Chris
> -- 
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
> 

> Script started on Mon 02 Feb 2015 12:32:56 EST
> 
> [samuel@avoca class]$ gdb 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo 
> core.32444
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ppc64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> [New Thread 32465]
> [New Thread 32464]
> [New Thread 32466]
> [New Thread 32444]
> [New Thread 32469]
> [New Thread 32467]
> [New Thread 32470]
> [New Thread 32463]
> [New Thread 32468]
> Missing separate debuginfo for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> Missing separate debuginfo for 
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> Loaded symbols for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld64.so.1
> Core was generated by 
> `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51            old = *addr;
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.107.el6_4.5.ppc64
> (gdb) thread apply all bt full
> 
> Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
> #0  0x00000080adb6629c in .__libc_write () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00000fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, 
> p=0xfff7a0edd00)
>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
>         print_buffer = "[avoca:32444] *** Process received signal ***\n", 
> '\000' <repeats 977 times>
>         tmp = 0xfff7a0ed858 "[avoca:32444] *** Process received signal ***\n"
>         size = 1024
>         ret = 46
>         si_code_str = 0xfff7d75bab8 ""
> #2  <signal handler called>
> No symbol table info available.
> #3  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #4  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #5  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 4002
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750972}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #6  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #7  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 8 (Thread 0xfff7d2ef200 (LWP 32463)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 2049
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750871}
>         stop = {tv_sec = 17589991303296, tv_usec = 24}
>         total = {tv_sec = 17589991305936, tv_usec = 17589991336208}
>         timing = 2.8183218451323255e-315
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 7 (Thread 0xfff78cef200 (LWP 32470)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1883
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 751036}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 6 (Thread 0xfff7aaef200 (LWP 32467)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3250
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750953}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 5 (Thread 0xfff796ef200 (LWP 32469)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1922
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 751004}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 4 (Thread 0x80ad907ef0 (LWP 32444)):
> #0  0x00000080adb5c754 in .pthread_join () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0000000010001ccc in main (argc=1, argv=0xffff9e4ab68) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:163
>         ret = 0x1
>         i = 0
>         threads = {17589991305728, 17589980819968, 17589970334208, 
> 17589959848448, 17589949362688, 17589938876928, 17589928391168, 
> 17589917905408}
>         item = 0x1000511c8d0
>         prev = 0xffff9e4a6c0
>         item2 = 0x1000511b640
>         start = {tv_sec = 1422840607, tv_usec = 750782}
>         stop = {tv_sec = 1422840607, tv_usec = 515534}
>         total = {tv_sec = 0, tv_usec = 42314}
>         lifo = {super = {obj_class = 0xfff7d7733e8, obj_reference_count = 1}, 
> opal_lifo_head = {data = {counter = 0, item = 0x1000511c7e0}}, 
>           opal_lifo_ghost = {super = {obj_class = 0xfff7d773228, 
> obj_reference_count = 1}, opal_list_next = 0xffff9e4a6c0, opal_list_prev = 
> 0x0, 
>             item_free = 1}}
>         success = false
>         timing = 4.2313999999999998e-08
>         rc = 0
> 
> Thread 3 (Thread 0xfff7b4ef200 (LWP 32466)):
> #0  opal_atomic_swap_32 (addr=0x1000511c860, newval=1) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:52
>         old = 0
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x1000511c840
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 1876
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750939}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 2 (Thread 0xfff7c8ef200 (LWP 32464)):
> #0  0x0000000010000f88 in opal_atomic_cmpset_64 (addr=0xffff9e4a6b8, 
> oldval=1099596679232, newval=1099596679136)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/powerpc/atomic.h:194
>         ret = 1099596679232
> #1  0x00000000100010e4 in opal_atomic_cmpset_ptr (addr=0xffff9e4a6b8, 
> oldval=0x1000511c840, newval=0x1000511c7e0)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:227
> No locals.
> #2  0x0000000010001438 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:198
>         item = 0x1000511c840
> #3  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3968
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c840
>         start = {tv_sec = 1422840607, tv_usec = 750893}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #4  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> 
> Thread 1 (Thread 0xfff7beef200 (LWP 32465)):
> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>     at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>         old = 1
> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
>         item = 0x0
> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
>         i = 3734
>         lifo = 0xffff9e4a6a0
>         item = 0x1000511c7e0
>         start = {tv_sec = 1422840607, tv_usec = 750907}
>         stop = {tv_sec = 0, tv_usec = 0}
>         total = {tv_sec = 0, tv_usec = 0}
>         timing = 0
> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb) quit
> [samuel@avoca class]$ exit
> 
> Script done on Mon 02 Feb 2015 12:33:16 EST

> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives: 
> http://www.open-mpi.org/community/lists/devel/2015/02/index.php
