Returning to the libltdl question: I think we may have a problem here. If we remove libltdl and default to --disable-dlopen, then the user will, without warning, pull every component that is able to build into libompi. That includes everything they specified, but because of our "build if you can" policy it also includes a lot of components that they didn't specify and may not even realize are present.
As a result, they will not only have a bloated memory footprint, but may also have pulled in GPL libraries (e.g., if Slurm is present) that could affect their legal situation. We may need to reconsider our build policy in light of this.

On Mon, Feb 2, 2015 at 3:35 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Ah -- the point being that this is not an issue related to the libltdl
> work.
>
>
> > On Feb 2, 2015, at 2:51 AM, Adrian Reber <adr...@lisas.de> wrote:
> >
> > I have reported the same error a few days ago and submitted it now as a
> > github issue: https://github.com/open-mpi/ompi/issues/371
> >
> > On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> >> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
> >>
> >>> New tarball posted (same location). Now featuring 100% fewer "make check" failures.
> >>
> >> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
> >>
> >> ../../config/test-driver: line 95: 30173 Segmentation fault (core dumped) "$@" > $log_file 2>&1
> >> FAIL: opal_lifo
> >>
> >> Stack trace implies the culprit is in:
> >>
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >> 51          old = *addr;
> >>
> >> I've attached a script of gdb doing "thread apply all bt full" in
> >> case that's helpful.
> >>
> >> All the best,
> >> Chris
> >> --
> >> Christopher Samuel Senior Systems Administrator
> >> VLSCI - Victorian Life Sciences Computation Initiative
> >> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> >> http://www.vlsci.org.au/ http://twitter.com/vlsci
> >>
> >
> >> Script started on Mon 02 Feb 2015 12:32:56 EST
> >>
> >> [samuel@avoca class]$ gdb /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo core.32444
> >> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> >> Copyright (C) 2010 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> This is free software: you are free to change and redistribute it.
> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >> and "show warranty" for details.
> >> This GDB was configured as "ppc64-redhat-linux-gnu".
> >> For bug reporting instructions, please see:
> >> <http://www.gnu.org/software/gdb/bugs/>...
> >> Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> >> [New Thread 32465]
> >> [New Thread 32464]
> >> [New Thread 32466]
> >> [New Thread 32444]
> >> [New Thread 32469]
> >> [New Thread 32467]
> >> [New Thread 32470]
> >> [New Thread 32463]
> >> [New Thread 32468]
> >> Missing separate debuginfo for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> >> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> >> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> >> Missing separate debuginfo for
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> >> Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> >> Loaded symbols for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> >> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> >> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> >> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> >> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> >> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/libdl.so.2
> >> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> >> [Thread debugging using libthread_db enabled]
> >> Loaded symbols for /lib64/libpthread.so.0
> >> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/librt.so.1
> >> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/libm.so.6
> >> Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/libutil.so.1
> >> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/libc.so.6
> >> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
> >> Loaded symbols for /lib64/ld64.so.1
> >> Core was generated by `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
> >> Program terminated with signal 11, Segmentation fault.
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >> 51          old = *addr;
> >> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.ppc64
> >> (gdb) thread apply all bt full
> >>
> >> Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
> >> #0  0x00000080adb6629c in .__libc_write () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #1  0x00000fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, p=0xfff7a0edd00)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
> >>         print_buffer = "[avoca:32444] *** Process received signal ***\n", '\000' <repeats 977 times>
> >>         tmp = 0xfff7a0ed858 "[avoca:32444] *** Process received signal ***\n"
> >>         size = 1024
> >>         ret = 46
> >>         si_code_str = 0xfff7d75bab8 ""
> >> #2  <signal handler called>
> >> No symbol table info available.
> >> #3  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #4  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #5  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 4002
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c840
> >>         start = {tv_sec = 1422840607, tv_usec = 750972}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #6  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #7  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 8 (Thread 0xfff7d2ef200 (LWP 32463)):
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 2049
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c7e0
> >>         start = {tv_sec = 1422840607, tv_usec = 750871}
> >>         stop = {tv_sec = 17589991303296, tv_usec = 24}
> >>         total = {tv_sec = 17589991305936, tv_usec = 17589991336208}
> >>         timing = 2.8183218451323255e-315
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 7 (Thread 0xfff78cef200 (LWP 32470)):
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 1883
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c7e0
> >>         start = {tv_sec = 1422840607, tv_usec = 751036}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 6 (Thread 0xfff7aaef200 (LWP 32467)):
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 3250
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c7e0
> >>         start = {tv_sec = 1422840607, tv_usec = 750953}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 5 (Thread 0xfff796ef200 (LWP 32469)):
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 1922
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c7e0
> >>         start = {tv_sec = 1422840607, tv_usec = 751004}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 4 (Thread 0x80ad907ef0 (LWP 32444)):
> >> #0  0x00000080adb5c754 in .pthread_join () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #1  0x0000000010001ccc in main (argc=1, argv=0xffff9e4ab68) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:163
> >>         ret = 0x1
> >>         i = 0
> >>         threads = {17589991305728, 17589980819968, 17589970334208, 17589959848448, 17589949362688, 17589938876928, 17589928391168, 17589917905408}
> >>         item = 0x1000511c8d0
> >>         prev = 0xffff9e4a6c0
> >>         item2 = 0x1000511b640
> >>         start = {tv_sec = 1422840607, tv_usec = 750782}
> >>         stop = {tv_sec = 1422840607, tv_usec = 515534}
> >>         total = {tv_sec = 0, tv_usec = 42314}
> >>         lifo = {super = {obj_class = 0xfff7d7733e8, obj_reference_count = 1}, opal_lifo_head = {data = {counter = 0, item = 0x1000511c7e0}},
> >>           opal_lifo_ghost = {super = {obj_class = 0xfff7d773228, obj_reference_count = 1}, opal_list_next = 0xffff9e4a6c0, opal_list_prev = 0x0,
> >>             item_free = 1}}
> >>         success = false
> >>         timing = 4.2313999999999998e-08
> >>         rc = 0
> >>
> >> Thread 3 (Thread 0xfff7b4ef200 (LWP 32466)):
> >> #0  opal_atomic_swap_32 (addr=0x1000511c860, newval=1) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:52
> >>         old = 0
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x1000511c840
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 1876
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c840
> >>         start = {tv_sec = 1422840607, tv_usec = 750939}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 2 (Thread 0xfff7c8ef200 (LWP 32464)):
> >> #0  0x0000000010000f88 in opal_atomic_cmpset_64 (addr=0xffff9e4a6b8, oldval=1099596679232, newval=1099596679136)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/powerpc/atomic.h:194
> >>         ret = 1099596679232
> >> #1  0x00000000100010e4 in opal_atomic_cmpset_ptr (addr=0xffff9e4a6b8, oldval=0x1000511c840, newval=0x1000511c7e0)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:227
> >> No locals.
> >> #2  0x0000000010001438 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:198
> >>         item = 0x1000511c840
> >> #3  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 3968
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c840
> >>         start = {tv_sec = 1422840607, tv_usec = 750893}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #4  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #5  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >>
> >> Thread 1 (Thread 0xfff7beef200 (LWP 32465)):
> >> #0  0x0000000010001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>     at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >>         old = 1
> >> #1  0x0000000010001408 in opal_lifo_pop_atomic (lifo=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/class/opal_lifo.h:193
> >>         item = 0x0
> >> #2  0x0000000010001630 in thread_test (arg=0xffff9e4a6a0) at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/test/class/opal_lifo.c:50
> >>         i = 3734
> >>         lifo = 0xffff9e4a6a0
> >>         item = 0x1000511c7e0
> >>         start = {tv_sec = 1422840607, tv_usec = 750907}
> >>         stop = {tv_sec = 0, tv_usec = 0}
> >>         total = {tv_sec = 0, tv_usec = 0}
> >>         timing = 0
> >> #3  0x00000080adb5c21c in .start_thread () from /lib64/libpthread.so.0
> >> No symbol table info available.
> >> #4  0x00000080ada5a53c in .__clone () from /lib64/libc.so.6
> >> No symbol table info available.
> >> (gdb) quit
> >> [samuel@avoca class]$ exit
> >>
> >> Script done on Mon 02 Feb 2015 12:33:16 EST
> >
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Searchable archives: http://www.open-mpi.org/community/lists/devel/2015/02/index.php
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/02/16873.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/02/16875.php
>