Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Adrian Reber
I reported the same error a few days ago and have now submitted it as a
github issue: https://github.com/open-mpi/ompi/issues/371

On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
> 
> > New tarball posted (same location).  Now featuring 100% fewer "make check" 
> > failures.
> 
> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
> 
> ../../config/test-driver: line 95: 30173 Segmentation fault  (core 
> dumped) "$@" > $log_file 2>&1
> FAIL: opal_lifo
> 
> Stack trace implies the culprit is in:
> 
> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51  old = *addr;
> 
> I've attached a script of gdb doing "thread apply all bt full" in
> case that's helpful.
> 
> All the best,
> Chris
> -- 
>  Christopher Samuel    Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
> 

> Script started on Mon 02 Feb 2015 12:32:56 EST
> 
> [samuel@avoca class]$ gdb 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo 
> core.32444
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ppc64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> [New Thread 32465]
> [New Thread 32464]
> [New Thread 32466]
> [New Thread 32444]
> [New Thread 32469]
> [New Thread 32467]
> [New Thread 32470]
> [New Thread 32463]
> [New Thread 32468]
> Missing separate debuginfo for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> Missing separate debuginfo for 
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> Loaded symbols for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld64.so.1
> Core was generated by 
> `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51  old = *addr;
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.107.el6_4.5.ppc64
> (gdb) thread apply all bt full
> 
> Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
> #0  0x0080adb6629c in .__libc_write () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0fff7d6905b4 in show_stackframe (signo=11, info=0xfff7
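
A note on the faulting address: addr=0x20 is far too small to be a real object, and it has the shape one typically sees when a field is reached through a NULL structure pointer, in which case the address is simply the field's offset. The following is a minimal sketch of that arithmetic only. The structure demo_lifo_t and its layout are hypothetical and are not the real opal_lifo_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real opal_lifo_t. */
typedef struct demo_lifo {
    char    other_fields[0x20]; /* stand-in for whatever precedes the lock */
    int32_t lock;               /* the 32-bit word an atomic swap would touch */
} demo_lifo_t;

/* Address that a swap on `lock` would dereference, given the struct's base. */
uintptr_t demo_lock_address(uintptr_t base)
{
    return base + offsetof(demo_lifo_t, lock);
}
```

With a base of 0 (a NULL pointer) this yields 0x20, the same shape as the address in the trace. Whether that is what actually happened here can only be confirmed from the full backtrace.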

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-02-02 Thread Adrian Reber
https://github.com/open-mpi/ompi/issues/372

On Sat, Jan 31, 2015 at 01:38:54PM +, Jeff Squyres (jsquyres) wrote:
> Adrian --
> 
> Can you file this as a Github issue?  Thanks.
> 
> 
> > On Jan 17, 2015, at 12:58 PM, Adrian Reber  wrote:
> > 
> > This time my bug report is not PSM related:
> > 
> > I was able to reproduce the MTT error from 
> > http://mtt.open-mpi.org/index.php?do_redir=2228
> > on my system with openmpi-dev-720-gf4693c9:
> > 
> > mpi_test_suite: btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 
> > 255' failed.
> > [n050409:06796] *** Process received signal ***
> > [n050409:06796] Signal: Aborted (6)
> > [n050409:06796] Signal code:  (-6)
> > [n050409:06796] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2b036d501710]
> > [n050409:06796] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2b036d741635]
> > [n050409:06796] [ 2] /lib64/libc.so.6(abort+0x175)[0x2b036d742e15]
> > [n050409:06796] [ 3] /lib64/libc.so.6(+0x2b75e)[0x2b036d73a75e]
> > [n050409:06796] [ 4] 
> > /lib64/libc.so.6(__assert_perror_fail+0x0)[0x2b036d73a820]
> > [n050409:06796] [ 5] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_alloc+0x77)[0x2b03730cf6d0]
> > [n050409:06796] [ 6] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_sendi+0x5e5)[0x2b03730d1ae9]
> > [n050409:06796] [ 7] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/openmpi/mca_pml_ob1.so(+0xd407)[0x2b0373961407]
> > [n050409:06796] [ 8] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/openmpi/mca_pml_ob1.so(+0xde45)[0x2b0373961e45]
> > [n050409:06796] [ 9] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x1ce)[0x2b0373962501]
> > [n050409:06796] [10] 
> > /lustre/ws1/ws/adrian-mtt-0/ompi-install/lib/libmpi.so.0(PMPI_Send+0x2b4)[0x2b036d20d1bb]
> > [n050409:06796] [11] mpi_test_suite[0x464424]
> > [n050409:06796] [12] mpi_test_suite[0x470304]
> > [n050409:06796] [13] mpi_test_suite[0x444a72]
> > [n050409:06796] [14] 
> > /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b036d72dd5d]
> > [n050409:06796] [15] mpi_test_suite[0x4051a9]
> > [n050409:06796] *** End of error message ***
> > --
> > mpirun noticed that process rank 0 with PID 0 on node n050409 exited on 
> > signal 6 (Aborted).
> > --
> > 
> > Core was generated by `mpi_test_suite -t p2p'.
> > Program terminated with signal 6, Aborted.
> > (gdb) bt
> > #0  0x2b036d741635 in raise () from /lib64/libc.so.6
> > #1  0x2b036d742d9d in abort () from /lib64/libc.so.6
> > #2  0x2b036d73a75e in __assert_fail_base () from /lib64/libc.so.6
> > #3  0x2b036d73a820 in __assert_fail () from /lib64/libc.so.6
> > #4  0x2b03730cf6d0 in mca_btl_openib_alloc (btl=0x224e740, 
> > ep=0x22b66a0, order=255 '\377', size=73014, flags=3) at btl_openib.c:1200
> > #5  0x2b03730d1ae9 in mca_btl_openib_sendi (btl=0x224e740, 
> > ep=0x22b66a0, convertor=0x7fff2c527bb0, header=0x7fff2c527cd0, 
> > header_size=14, payload_size=73000, order=255 '\377', flags=3, 
> >tag=65 'A', descriptor=0x7fff2c527ce8) at btl_openib.c:1829
> > #6  0x2b0373961407 in mca_bml_base_sendi (bml_btl=0x2198850, 
> > convertor=0x7fff2c527bb0, header=0x7fff2c527cd0, header_size=14, 
> > payload_size=73000, order=255 '\377', flags=3, tag=65 'A', 
> >descriptor=0x7fff2c527ce8) at ../../../../ompi/mca/bml/bml.h:305
> > #7  0x2b0373961e45 in mca_pml_ob1_send_inline (buf=0x2b7b760, count=1, 
> > datatype=0x2b97440, dst=1, tag=37, seqn=3639, dst_proc=0x21c2940, 
> > endpoint=0x22dff00, comm=0x6939e0) at pml_ob1_isend.c:107
> > #8  0x2b0373962501 in mca_pml_ob1_send (buf=0x2b7b760, count=1, 
> > datatype=0x2b97440, dst=1, tag=37, sendmode=MCA_PML_BASE_SEND_STANDARD, 
> > comm=0x6939e0) at pml_ob1_isend.c:214
> > #9  0x2b036d20d1bb in PMPI_Send (buf=0x2b7b760, count=1, 
> > type=0x2b97440, dest=1, tag=37, comm=0x6939e0) at psend.c:78
> > #10 0x00464424 in tst_p2p_simple_ring_xsend_run 
> > (env=0x7fff2c528530) at p2p/tst_p2p_simple_ring_xsend.c:97
> > #11 0x00470304 in tst_test_run_func (env=0x7fff2c528530) at 
> > tst_tests.c:1463
> > #12 0x00444a72 in main (argc=3, argv=0x7fff2c5287f8) at 
> > mpi_test_suite.c:639
> > 
> > This is with --enable-debug. Without --enable-debug I get a
> > segmentation fault, but not always. Using fewer cores it works most
> > of the time. With 32 cores on 4 nodes it happens almost
> > all the time. If it does not crash using fewer cores I get messages like:
> > 
> > [n050409][[36216,1],1][btl_openib_xrc.c:58:mca_btl_openib_xrc_check_api] 
> > XRC error: bad XRC API (require XRC from OFED pre 3.12).
> > 
> > Adrian
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this p
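
For readers following the backtrace: frame #5 shows header_size=14 and payload_size=73000, and frame #4 receives size=73014 (their sum) with order=255. One way an assertion like `qp != 255` can fire is if 255 is a "no queue pair" sentinel that is returned when no configured queue pair is large enough for the request. The sketch below illustrates that pattern only. The names DEMO_NO_ORDER and demo_size_to_qp, and the queue-pair sizes used with them, are hypothetical and are not taken from btl_openib.c.

```c
#include <stddef.h>

#define DEMO_NO_ORDER 255  /* hypothetical sentinel: "no queue pair fits" */

/* Return the index of the first queue pair whose buffer size can hold
 * `size` bytes, or DEMO_NO_ORDER if none can. */
int demo_size_to_qp(const size_t *qp_sizes, int num_qps, size_t size)
{
    for (int qp = 0; qp < num_qps; qp++) {
        if (qp_sizes[qp] >= size) {
            return qp;
        }
    }
    return DEMO_NO_ORDER;
}
```

Under these assumed sizes, a 73014-byte request against queue pairs of at most 65536 bytes falls through the loop and returns the sentinel, which a subsequent assert would reject.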

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Ah -- the point being that this is not an issue related to the libltdl work.


> On Feb 2, 2015, at 2:51 AM, Adrian Reber  wrote:
> 
> I have reported the same error a few days ago and submitted it now as a
> github issue: https://github.com/open-mpi/ompi/issues/371

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Chris Samuel
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:

> Ah -- the point being that this is not an issue related to the libltdl work.

Sorry - I saw the request to test the tarball and tried it out, missed the 
significance of the subject. :-/

-- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Ralph Castain
Returning to the libltdl question: I think we may have a problem here. If
we remove libltdl and default to disable-dlopen, then the user will -
without warning - slurp all components that are able to build into libompi.
This includes everything they specified, BUT because of our "build if you
can" policy, it also includes a lot of stuff that they didn't specify and
may not even realize is present.

As a result, they not only will have a bloated memory footprint, but they
also may very well have slurped in GPL libraries (e.g., if Slurm is
present) that could potentially impact their legal situation. We may need
to reconsider our build policy in light of this situation.
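
To make the concern concrete: with dlopen support disabled, each component that configure managed to build is linked into the library and listed in a compiled-in table, so it is present whether or not the user asked for it. Here is a minimal sketch of that shape. The descriptor type, the `requested` flag and the table entries are made up for illustration and are not Open MPI's actual component structures.

```c
#include <stddef.h>

/* Hypothetical component descriptor, for illustration only. */
typedef struct demo_component {
    const char *name;
    int requested;   /* 1 if the user explicitly asked for it at configure time */
} demo_component_t;

/* Everything that "could build" ends up here when plugins are compiled in. */
static const demo_component_t demo_static_components[] = {
    { "btl_self",   1 },
    { "btl_openib", 1 },
    { "ras_slurm",  0 },  /* illustrative: built only because Slurm was found */
    { "pmix_s1",    0 },  /* illustrative: likewise never asked for */
};

/* Count the compiled-in components the user never asked for. */
size_t demo_count_unrequested(void)
{
    size_t n = 0;
    size_t total = sizeof(demo_static_components) / sizeof(demo_static_components[0]);
    for (size_t i = 0; i < total; i++) {
        if (!demo_static_components[i].requested) {
            n++;
        }
    }
    return n;
}
```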


On Mon, Feb 2, 2015 at 3:35 AM, Jeff Squyres (jsquyres) 
wrote:

> Ah -- the point being that this is not an issue related to the libltdl
> work.

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Ralph Castain
Hi Chris

Just out of curiosity: I see you are reporting about a build on the
headnode of a BG cluster. We've never ported OMPI to BG - are you using it
on such a system? Or were you just test building the code on a convenient
server?

Ralph


On Mon, Feb 2, 2015 at 3:52 AM, Chris Samuel  wrote:

> On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:
>
> > Ah -- the point being that this is not an issue related to the libltdl
> work.
>
> Sorry - I saw the request to test the tarball and tried it out, missed the
> significance of the subject. :-/
>
> --
>  Christopher Samuel    Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Uuuurggghhh.

More below.


> On Feb 2, 2015, at 1:04 PM, Ralph Castain  wrote:
> 
> Returning to the libltdl question: I think we may have a problem here. If we 
> remove libltdl and default to disable-dlopen, then the user will - without 
> warning - slurp all components that are able to build into libompi. This 
> includes everything they specified, BUT because of our "build if you can" 
> policy, it also includes a lot of stuff that they didn't specify and may not 
> even realize is present.

Yes, this is true -- the size of libmpi.so (etc.) will actually go up.



It would be an interesting experiment to see if the process size actually 
increases.  When you dlopen() a DSO, it's loaded into distinct pages -- even 
components that are fairly small (e.g., mca_btl_self.so is 63726 bytes on my 
system) are automatically inflated to be multiples of 4K.  When all the 
components are packed into libmpi.so (etc), the end result is actually smaller.
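
As a rough illustration of the rounding involved (a sketch only: it assumes a 4096-byte page and treats the file size as a single mapping, whereas a real DSO is mapped as several segments, each rounded separately):

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed; use sysconf(_SC_PAGESIZE) on a real system */

/* Round a byte count up to the next multiple of the page size. */
size_t demo_round_up_to_page(size_t bytes)
{
    return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
}
```

For the 63726-byte mca_btl_self.so mentioned above, this gives 65536 bytes (16 pages), i.e. 1810 bytes of padding under these assumptions.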

That being said, when built as DSOs, OMPI can (and likely does) dlclose 
components that you don't use at run time.  You obviously can't do that when 
all the components are in libmpi.so (etc.).  Meaning: there are forces pulling 
both ways here -- I wonder whether users will typically grow or shrink their 
process sizes...?
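
For reference, the DSO model boils down to the standard dlopen()/dlsym()/dlclose() calls. Below is a minimal sketch, assuming a POSIX system (link with -ldl where the C library requires it). The function name demo_probe_component and the idea of probing for a single symbol are illustrative; this is not how Open MPI's MCA actually selects components.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open a plugin, check that it exports `symbol`, and unload it again.
 * Returns 0 if the symbol was found, -1 otherwise. */
int demo_probe_component(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return -1;              /* not loadable: missing file, unresolved deps, ... */
    }
    int found = (dlsym(handle, symbol) != NULL) ? 0 : -1;
    dlclose(handle);            /* an unused plugin can be unloaded again */
    return found;
}
```

The dlclose() call is the step that has no equivalent once the same code is linked directly into libmpi.so.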

The answer may be an obvious "your process will grow", but it may not be.  If 
someone has some spare cycles (hah!), this would be an interesting experiment.  
:-)



We've had these discussions before; the conclusion was to ensure that 
we provide "--disable" and "--without" options for those people who know 
exactly what they want, and don't want anything else.

So Ralph -- I hear the cautionary warning that you're raising.  Are the 
--disable/--without options no longer viable?

> As a result, they not only will have a bloated memory footprint, but they 
> also may very well have slurped in GPL libraries (e.g., if Slurm is present) 
> that could potentially impact their legal situation. We may need to 
> reconsider our build policy in light of this situation.

IANAL and all that.

If you're distributing binaries, my understanding is that this doesn't change 
your legal situation.  I.e., if you're a) building an OMPI component that links 
against GPL libraries, and then b) distributing those binaries, it doesn't 
matter if you built the component as a DSO or as part of (for example) 
libmpi.so.

-

All that being said, yes, removing our default model of plugins is a *big* 
change.  There are many subtle issues involved (including those that Ralph 
brought up in this mail).

If we want to keep this model (plugins by default), the only way I can think of 
to do that is to manually embed libltdl ourselves.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] HELP in OpenMPI - for PH.D research

2015-02-02 Thread Jeff Squyres (jsquyres)
On Jan 25, 2015, at 1:06 PM, Cyrille DIBAMOU MBEUYO  wrote:
> 
> Good afternoon development team,
> 
> I have a small problem with OpenMPI in my Ph.D. research.
> 
> My problem is this:
> 
> While saving the context.PID of a process running on a node with BLCR
> through OpenMPI to the checkpoint folder, I also want to get and save the
> average CPU and memory utilisation of this process to a
> file, and use this information later.

I was hoping Adrian would answer here, since this is a CR question.  :-)

The current code does not do this, as you have discovered -- the only way to 
save it would be to modify the code to do this.  Are you comfortable doing that?

If so, what version of OMPI are you using?

> Or is there another method to get this information?

Do you want this information on an ongoing basis, or just when you checkpoint / 
restart?
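
If modifying the code is an option, the raw numbers are available from the operating system. Here is a minimal sketch using POSIX getrusage(), assuming it is called from inside the process being checkpointed. The function demo_write_usage and its output format are made up for illustration and are not part of Open MPI's checkpoint/restart code. Note that ru_maxrss is a peak value (kilobytes on Linux), not an average, so an average would need periodic sampling.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Append this process's CPU time and peak resident set size to `path`.
 * Returns 0 on success, -1 on failure. */
int demo_write_usage(const char *path)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    FILE *fp = fopen(path, "a");
    if (fp == NULL) {
        return -1;
    }
    double user_s = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    double sys_s  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    fprintf(fp, "user_cpu_s=%.6f sys_cpu_s=%.6f max_rss=%ld\n",
            user_s, sys_s, ru.ru_maxrss);
    return fclose(fp) == 0 ? 0 : -1;
}
```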

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Re-adding devel, since Paul sent me the logs off-list.

(per Ralph's comment, we may or may not stick with this don't-build-libltdl 
philosophy, but I'd still like to run this issue down)

Howard: see Paul's notes below.  It's on the Hopper system at NERSC.

Do you have any Cray insight here?  (see below for the exact issue)


> On Feb 1, 2015, at 3:52 AM, Paul Hargrove  wrote:
> 
> Jeff (off-list),
> 
> Original make was with V=1, so I skipped the "make clean" before "record/send 
> the output of make w/ V=1".
> All the requested files should be in the attached .tar.bz2.
> 
> What I see from configure is the following, which is explicit about "without 
> search path":
> configure:63392: result: looking for library without search path
> configure:63394: checking for lt_dlopen in -lltdl
> configure:63419: pgcc -o conftest -g   conftest.c -lltdl  -lrt -lutil  >&5
> configure:63419: $? = 0
> configure:63428: result: yes
> 
> The "make V=1" shows "-lltdl" passed to libtool in the line before the one I 
> quoted previously.
> Libtool then *instead* passes "/usr/lib/libltdl.so" to the link command.
> So, I've included the generated config.lt, which appears to place /usr/lib 
> ahead of /usr/lib64 in its search path(s).
> 
> Let me know what else you may need.
> This is on NERSC's Hopper, where Howard and Nathan both have accounts (though 
> I did see the message about Nathan taking some time off).

Here's the full output from the logs that Paul sent to me -- you can see that 
Makefile passes "-lltdl" and then libtool converts it to /usr/lib/libltdl.so:

/bin/sh ../libtool  --tag=CC   --mode=link pgcc 
-DOPAL_CONFIGURE_HOST="\"hopper09\"" -g  -version-info 0:0:0  -o libopen-pal.la 
-rpath 
/scratch/scratchdirs/hargrove/OMPI/openmpi-libltdl-linux-x86_64-pgi-14.7/INST/lib
  class/opal_bitmap.lo class/opal_free_list.lo class/opal_hash_table.lo 
class/opal_hotel.lo class/opal_tree.lo class/opal_list.lo class/opal_object.lo 
class/opal_graph.lo class/opal_lifo.lo class/opal_fifo.lo 
class/opal_pointer_array.lo class/opal_value_array.lo class/opal_ring_buffer.lo 
class/opal_rb_tree.lo class/ompi_free_list.lo memoryhooks/memory.lo 
runtime/opal_progress.lo runtime/opal_finalize.lo runtime/opal_init.lo 
runtime/opal_params.lo runtime/opal_cr.lo runtime/opal_info_support.lo 
runtime/opal_progress_threads.lo threads/condition.lo threads/mutex.lo 
threads/thread.lo dss/dss_internal_functions.lo dss/dss_compare.lo 
dss/dss_copy.lo dss/dss_dump.lo dss/dss_load_unload.lo dss/dss_lookup.lo 
dss/dss_pack.lo dss/dss_peek.lo dss/dss_print.lo dss/dss_register.lo 
dss/dss_unpack.lo dss/dss_open_close.lo asm/libasm.la datatype/libdatatype.la 
mca/base/libmca_base.la util/libopalutil.la  mca/allocator/libmca_allocator.la  
mca/backtrace/libmca_backtrace.la 
mca/backtrace/execinfo/libmca_backtrace_execinfo.la  mca/btl/libmca_btl.la  
mca/compress/libmca_compress.la  mca/crs/libmca_crs.la  
mca/dstore/libmca_dstore.la  mca/event/libmca_event.la 
mca/event/libevent2022/libmca_event_libevent2022.la  mca/hwloc/libmca_hwloc.la 
mca/hwloc/hwloc191/libmca_hwloc_hwloc191.la  mca/if/libmca_if.la 
mca/if/posix_ipv4/libmca_if_posix_ipv4.la 
mca/if/linux_ipv6/libmca_if_linux_ipv6.la  
mca/installdirs/libmca_installdirs.la 
mca/installdirs/config/libmca_installdirs_config.la 
mca/installdirs/env/libmca_installdirs_env.la  
mca/memchecker/libmca_memchecker.la  mca/memcpy/libmca_memcpy.la  
mca/memory/libmca_memory.la mca/memory/linux/libmca_memory_linux.la  
mca/mpool/libmca_mpool.la  mca/pmix/libmca_pmix.la  mca/pstat/libmca_pstat.la  
mca/rcache/libmca_rcache.la  mca/sec/libmca_sec.la  mca/shmem/libmca_shmem.la  
mca/timer/libmca_timer.la mca/timer/linux/libmca_timer_linux.la  -lrt -lutil  
-lltdl   -lrt -lutil  -lltdl  
libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o 
class/.libs/opal_free_list.o class/.libs/opal_hash_table.o 
class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o 
class/.libs/opal_object.o class/.libs/opal_graph.o class/.libs/opal_lifo.o 
class/.libs/opal_fifo.o class/.libs/opal_pointer_array.o 
class/.libs/opal_value_array.o class/.libs/opal_ring_buffer.o 
class/.libs/opal_rb_tree.o class/.libs/ompi_free_list.o 
memoryhooks/.libs/memory.o runtime/.libs/opal_progress.o 
runtime/.libs/opal_finalize.o runtime/.libs/opal_init.o 
runtime/.libs/opal_params.o runtime/.libs/opal_cr.o 
runtime/.libs/opal_info_support.o runtime/.libs/opal_progress_threads.o 
threads/.libs/condition.o threads/.libs/mutex.o threads/.libs/thread.o 
dss/.libs/dss_internal_functions.o dss/.libs/dss_compare.o dss/.libs/dss_copy.o 
dss/.libs/dss_dump.o dss/.libs/dss_load_unload.o dss/.libs/dss_lookup.o 
dss/.libs/dss_pack.o dss/.libs/dss_peek.o dss/.libs/dss_print.o 
dss/.libs/dss_register.o dss/.libs/dss_unpack.o dss/.libs/dss_open_close.o  
-Wl,--whole-archive,asm/.libs/libasm.a,datatype/.libs/libdatatype.a,mca/base/.libs/libmca_base.a,util/.libs/libopalutil.a,mca/allocator/.libs/libmca_allocator.a,mca/backtrace/.libs/libmca_backtra

[OMPI devel] confusing output when no c++ compiler

2015-02-02 Thread Paul Hargrove
The output below occurred while testing Jeff's no-embedded-libltdl tarball, but I
am assuming it is quite likely that the same is true on the trunk.

The "issue" is that I am told by configure that "C and C++ compilers are
not link compatible".
However, it appears I just don't have a C++ compiler at all!!

I am not sure (and too lazy/busy to re-read README) if Open MPI currently
requires a C++ compiler, but that certainly appears to be the case.  If
that is *not* the intent, then this issue is bigger than a misleading error
message.

I have configured this Linux/x86-64 system with only --prefix=... and
--enable-debug.

-Paul


*** C++ compiler and preprocessor
checking for g++... no
checking for c++... no
checking for gpp... no
checking for aCC... no
checking for CC... no
checking for cxx... no
checking for cc++... no
checking for cl.exe... no
checking for FCC... no
checking for KCC... no
checking for RCC... no
checking for xlC_r... no
checking for xlC... no
checking whether we are using the GNU C++ compiler... no
checking whether g++ accepts -g... no
checking dependency style of g++... none
checking how to run the C++ preprocessor... /lib/cpp
checking for the C++ compiler vendor... unknown
configure: WARNING: -g has been added to CXXFLAGS (--enable-debug)
checking if C and C++ are link compatible... no
**
* It appears that your C++ compiler is unable to link against object
* files created by your C compiler.  This generally indicates either
* a conflict between the options specified in CFLAGS and CXXFLAGS
* or a problem with the local compiler installation.  More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.
**
configure: error: C and C++ compilers are not link compatible.  Can not
continue.

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff and Howard,

Just a couple minor points:

1.  In case one has lost track, the reason the behavior described by Jeff
is erroneous is that /usr/lib contains 32-bit libs (and target is 64-bit).
Therefore libtool should have replaced -lltdl with /usr/lib64/libltdl.so
(if at all).

2a.  Jeff does raise a good point that the problem might be Cray-specific.
It is worth noting that I was performing a build for the login node (not
the compute nodes), using the PGI-14.7.0 compiler.  Configure options are
--prefix=...  --enable-debug CC=pgcc CXX=pgCC FC=pgf90

2b.  I am retrying now with all of Cray's environment modules unloaded
except the one for the PGI compiler.  Nathan had suggested something like
this to me in the past, but I've never had issues with the default
environment.  I will report the result when available.
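
A side note on point 1: one way to confirm which libltdl.so matches the
target is to look at the EI_CLASS byte of the ELF header (byte 4: 1 means
32-bit, 2 means 64-bit).  The sketch below is only an illustration; the
function names are made up for this note and are not part of Open MPI or
libtool.  It assumes the path resolves to a real ELF file rather than a
linker script, in which case it simply reports -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return 32 or 64 for a valid ELF identification block, -1 otherwise.
 * Hypothetical helper, named for this example only. */
int elf_class_from_ident(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (len < 5) {
        return -1;                      /* too short to hold EI_CLASS */
    }
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L'  || ident[3] != 'F') {
        return -1;                      /* not an ELF file */
    }
    switch (ident[4]) {                 /* EI_CLASS */
    case 1:  return 32;                 /* ELFCLASS32 */
    case 2:  return 64;                 /* ELFCLASS64 */
    default: return -1;
    }
}

/* Read the first bytes of a file and classify it; -1 if unreadable. */
int elf_class_of_file(const char *path)
{
    unsigned char ident[5];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return -1;
    }
    n = fread(ident, 1, sizeof ident, fp);
    fclose(fp);
    return elf_class_from_ident(ident, n);
}
```

If the diagnosis above is right, elf_class_of_file() would be expected to
report 32 for /usr/lib/libltdl.so and 64 for /usr/lib64/libltdl.so on the
system in question.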

-Paul

On Mon, Feb 2, 2015 at 1:18 PM, Jeff Squyres (jsquyres) 
wrote:

> Re-adding devel, since Paul sent me the logs off-list.
>
> (per Ralph's comment, we may or may not stick with this
> don't-build-libltdl philosophy, but I'd still like to run this issue down)
>
> Howard: see Paul's notes below.  It's on the hopper system at Nersc.
>
> Do you have any Cray insight here?  (see below for the exact issue)
>
>
> > On Feb 1, 2015, at 3:52 AM, Paul Hargrove  wrote:
> >
> > Jeff (off-list),
> >
> > Original make was with V=1, so I skipped the "make clean" before
> "record/send the output of make w/ V=1".
> > All the requested files should be in the attached .tar.bz2.
> >
> > What I see from configure is the following, which is explicit about
> "without search path":
> > configure:63392: result: looking for library without search path
> > configure:63394: checking for lt_dlopen in -lltdl
> > configure:63419: pgcc -o conftest -g   conftest.c -lltdl  -lrt -lutil >&5
> > configure:63419: $? = 0
> > configure:63428: result: yes
> >
> > The "make V=1" shows "-lltdl" passed to libtool in the line before the
> one I quoted previously.
> > Libtool then *instead* passes "/usr/lib/libltdl.so" to the link command.
> > So, I've included the generated config.lt, which appears to place
> /usr/lib ahead of /usr/lib64 in its search path(s).
> >
> > Let me know what else you may need.
> > This is on NERSC's Hopper, where Howard and Nathan both have accounts
> (though I did see the message about Nathan taking some time off).
>
> Here's the full output from the logs that Paul sent to me -- you can see
> that Makefile passes "-lltdl" and then libtool converts it to
> /usr/lib/libltdl.so:
>
> /bin/sh ../libtool  --tag=CC   --mode=link pgcc
> -DOPAL_CONFIGURE_HOST="\"hopper09\"" -g  -version-info 0:0:0  -o
> libopen-pal.la -rpath
> /scratch/scratchdirs/hargrove/OMPI/openmpi-libltdl-linux-x86_64-pgi-14.7/INST/lib
> class/opal_bitmap.lo class/opal_free_list.lo class/opal_hash_table.lo
> class/opal_hotel.lo class/opal_tree.lo class/opal_list.lo
> class/opal_object.lo class/opal_graph.lo class/opal_lifo.lo
> class/opal_fifo.lo class/opal_pointer_array.lo class/opal_value_array.lo
> class/opal_ring_buffer.lo class/opal_rb_tree.lo class/ompi_free_list.lo
> memoryhooks/memory.lo runtime/opal_progress.lo runtime/opal_finalize.lo
> runtime/opal_init.lo runtime/opal_params.lo runtime/opal_cr.lo
> runtime/opal_info_support.lo runtime/opal_progress_threads.lo
> threads/condition.lo threads/mutex.lo threads/thread.lo
> dss/dss_internal_functions.lo dss/dss_compare.lo dss/dss_copy.lo
> dss/dss_dump.lo dss/dss_load_unload.lo dss/dss_lookup.lo dss/dss_pack.lo
> dss/dss_peek.lo dss/dss_print.lo dss/dss_register.lo dss/dss_unpack.lo
> dss/dss_open_close.lo asm/libasm.la datatype/libdatatype.la mca/base/
> libmca_base.la util/libopalutil.la  mca/allocator/libmca_allocator.la
> mca/backtrace/libmca_backtrace.la mca/backtrace/execinfo/
> libmca_backtrace_execinfo.la  mca/btl/libmca_btl.la  mca/compress/
> libmca_compress.la  mca/crs/libmca_crs.la  mca/dstore/libmca_dstore.la
> mca/event/libmca_event.la mca/event/libevent2022/
> libmca_event_libevent2022.la  mca/hwloc/libmca_hwloc.la
> mca/hwloc/hwloc191/libmca_hwloc_hwloc191.la  mca/if/libmca_if.la
> mca/if/posix_ipv4/libmca_if_posix_ipv4.la mca/if/linux_ipv6/
> libmca_if_linux_ipv6.la  mca/installdirs/libmca_installdirs.la
> mca/installdirs/config/libmca_installdirs_config.la mca/installdirs/env/
> libmca_installdirs_env.la  mca/memchecker/libmca_memchecker.la
> mca/memcpy/libmca_memcpy.la  mca/memory/libmca_memory.la mca/memory/linux/
> libmca_memory_linux.la  mca/mpool/libmca_mpool.la  mca/pmix/libmca_pmix.la
> mca/pstat/libmca_pstat.la  mca/rcache/libmca_rcache.la  mca/sec/
> libmca_sec.la  mca/shmem/libmca_shmem.la  mca/timer/libmca_timer.la
> mca/timer/linux/libmca_timer_linux.la  -lrt -lutil  -lltdl   -lrt -lutil
> -lltdl
> libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o
> class/.libs/opal_free_list.o class/.libs/opal_hash_table.o
> class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o
> class/.libs/opal_object.o class/.libs/opal_graph.o class/.lib

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Ralph and I just chatted about this on the phone.

IANAL, but after talking through the license stuff, we think there will be new 
license issues caused by --disable-dlopen behavior.

It feels like a lot of unexpected issues are coming up from (more-or-less)
causing (most?) people to build with --disable-dlopen:

- (probably?) larger libraries and process memory footprint
- wonky behavior on Cray/NERSC Hopper system (but perhaps Howard will solve 
that one?)
- after talking to Howard and Rolf today, might well need to (re)add 
--with-libltdl=DIR support (for libltdl installed in non-standard locations)
- difference in behavior between git clone builds (require libltdl by default) 
and production builds (build libltdl support or not)
- it seems that there are valid use cases where people want to add plugins to 
existing Open MPI installations

It might well be worth investigating manually embedding libltdl ourselves 
(i.e., git committing libltdl vs. having autogen copy it in).  The 
bootstrapping will be a bit different; Dave raised the point last week that 
it's not guaranteed that this will work -- would need to be investigated.




> On Feb 2, 2015, at 2:25 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Uuuurggghhh.
> 
> More below.
> 
> 
>> On Feb 2, 2015, at 1:04 PM, Ralph Castain  wrote:
>> 
>> Returning to the libltdl question: I think we may have a problem here. If we 
>> remove libltdl and default to disable-dlopen, then the user will - without 
>> warning - slurp all components that are able to build into libompi. This 
>> includes everything they specified, BUT because of our "build if you can" 
>> policy, it also includes a lot of stuff that they didn't specify and may not 
>> even realize is present.
> 
> Yes, this is true -- the size of libmpi.so (etc.) will actually go up.
> 
> 
> 
> It would be an interesting experiment to see if the process size actually 
> increases.  When you dlopen() a DSO, it's loaded into distinct pages -- even 
> components that are fairly small (e.g., mca_btl_self.so is 63726 bytes on my 
> system) are automatically inflated to be multiples of 4K.  When all the 
> components are packed into libmpi.so (etc), the end result is actually 
> smaller.
> 
> That being said, when built as DSOs, OMPI can (and likely does) dlclose 
> components that you don't use at run time.  You obviously can't do that when 
> all the components are in libmpi.so (etc.).  Meaning: there are forces pulling
> both ways here -- I wonder whether users will typically grow or shrink their 
> process sizes...?
> 
> The answer may be an obvious "your process will grow", but it may not be.  If 
> someone has some spare cycles (hah!), this would be an interesting 
> experiment.  :-)
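
For reference, the page-rounding arithmetic above can be sketched as
follows.  This is a simplification: it treats a DSO as one contiguous
mapping of its file size and assumes a 4096-byte page, whereas a real
loader maps each loadable segment separately, so the true per-component
overhead is likely somewhat larger.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of page_size (page_size > 0). */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return ((size + page_size - 1) / page_size) * page_size;
}

/* Bytes of padding added when size is rounded up to whole pages. */
size_t page_padding(size_t size, size_t page_size)
{
    return round_up_to_page(size, page_size) - size;
}
```

With the numbers quoted above, round_up_to_page(63726, 4096) gives 65536
(16 pages), i.e. 1810 bytes of padding for that one component.  Summed over
many small components this is the "inflation" that packing everything into
libmpi.so would avoid; whether that outweighs the ability to dlclose unused
components is exactly the open question.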
> 
> 
> 
> We've had these discussions before; the conclusion was to ensure 
> that we provide "--disable" and "--without" options for those people who know 
> exactly what they want, and don't want anything else.
> 
> So Ralph -- I hear the cautionary warning that you're raising.  Are the 
> --disable/--without options no longer viable?
> 
>> As a result, they not only will have a bloated memory footprint, but they 
>> also may very well have slurped in GPL libraries (e.g., if Slurm is present) 
>> that could potentially impact their legal situation. We may need to 
>> reconsider our build policy in light of this situation.
> 
> IANAL and all that.
> 
> If you're distributing binaries, my understanding is that this doesn't change 
> your legal situation.  I.e., if you're a) building an OMPI component that 
> links against GPL libraries, and then b) distributing those binaries, it 
> doesn't matter if you built the component as a DSO or as part of (for 
> example) libmpi.so.
> 
> -
> 
> All that being said, yes, removing our default model of plugins is a *big* 
> change.  There are many subtle issues involved (including those that Ralph 
> brought up in this mail).
> 
> If we want to keep this model (plugins by default), the only way I can think 
> of to do that is to manually embed libltdl ourselves.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
On Feb 2, 2015, at 5:24 PM, Jeff Squyres (jsquyres)  wrote:
> 
> IANAL, but after talking through the license stuff, we think there will be 
> new license issues caused by --disable-dlopen behavior.

ARRGH -- that should have been:

...we think there will be ***NO*** new license issues caused by 
--disable-dlopen behavior.

Sorry for any confusion!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Christopher Samuel
On 03/02/15 05:09, Ralph Castain wrote:

> Just out of curiosity: I see you are reporting about a build on the
> headnode of a BG cluster. We've never ported OMPI to BG - are you using
> it on such a system? Or were you just test building the code on a
> convenient server?

Just a convenient server with a not-so-mainstream architecture (and an
older RHEL release through necessity).  Sorry to get your hopes up! :-)

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

Looks like you didn't hit all the un-guarded references to lt_dladvise.
Specifically you missed a struct decl:

/[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
error: unknown type name 'lt_dladvise'
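
To illustrate what "guarded" means here, a minimal sketch follows.  It is
not the actual lt_interface.c code: HAVE_LT_DLADVISE stands in for whatever
macro configure defines when the "checking for lt_dladvise" test succeeds,
and the struct and function names are invented for this example.  The point
is only that the member declaration needs the same #if as the calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configure result; defaults to "not available", as on the
 * libltdl 1.5.22 systems reported in this thread. */
#ifndef HAVE_LT_DLADVISE
#define HAVE_LT_DLADVISE 0
#endif

#if HAVE_LT_DLADVISE
#include <ltdl.h>
#endif

/* Hypothetical wrapper state, named for this example only. */
struct dl_interface {
    int initialized;
#if HAVE_LT_DLADVISE
    lt_dladvise advise;     /* declared only when the type exists */
#endif
};

/* Set up the wrapper; returns 0 on success, -1 on failure. */
int dl_interface_open(struct dl_interface *iface)
{
    assert(iface != NULL);
#if HAVE_LT_DLADVISE
    if (lt_dladvise_init(&iface->advise) != 0) {
        return -1;
    }
    if (lt_dladvise_ext(&iface->advise) != 0) {
        lt_dladvise_destroy(&iface->advise);
        return -1;
    }
#endif
    iface->initialized = 1;
    return 0;
}
```

Built without -DHAVE_LT_DLADVISE=1, the struct has no lt_dladvise member
and ltdl.h is never needed, so an old libltdl cannot trigger the "unknown
type name" error above.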

-Paul


On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres)  wrote:

> Looks like the lt_interface.c code didn't properly use the lt_dladvise
> #if. How did that ever work, I wonder?
>
> Fixed now.  On to your second finding...
>
>
> > On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> >
> > Not meeting with the greatest of success.
> > This is a report of just the first (of at least 2) failure modes I am
> seeing.
> >
> > On a Scientific Linux 5.5 (RHEL-5.5 clone like CentOS) I get a build
> failure described below.
> > At least Solaris-11 and a few other linux systems (including RHAS-4.4)
> are also failing in what appears to be the same manner.
> > I am sure there are more, but I am aborting this round of testing at
> this point.
> >
> > I again await a new tarball with a less broken-by-default behavior.
> >
> > -Paul
> >
> >
> > The configure output includes
> > checking ltdl.h usability... yes
> > checking ltdl.h presence... yes
> > checking for ltdl.h... yes
> > looking for library without search path
> > checking for lt_dlopen in -lltdl... yes
> > checking for lt_dladvise_init... no
> > configure: WARNING: *
> > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > configure: WARNING: This could mean that your libltdl version
> > configure: WARNING: is old.  If you could upgrade, that would be great.
> > configure: WARNING: *
> > checking for lt_dladvise... no
> >
> > However, it looks like opal/util/lt_interface.c is still attempting to
> call lt_dladvise:
> > PGC-S-0040-Illegal use of symbol, lt_dladvise
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> 25)
> > PGC-W-0156-Type not specified, 'int' assumed
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> 25)
> > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> >
> > The output of "libtool --version" says "1.5.22" and we have
> libltdl.so.3.1.4.
> > However, the rpm database is not readable, preventing me from checking a
> package version associated with the libltdl.
> >
> > The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> > $ pkg info libltdl | grep Version
> >Version: 1.5.22
> >
> >
> > -Paul
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > New tarball posted (same location).  Now featuring 100% fewer "make
> check" failures.
> >
> > http://www.open-mpi.org/~jsquyres/unofficial/
> >
> >
> > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > >
> > > Shame on me for not running "make check".
> > >
> > > Fixing...
> > >
> > >
> > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove 
> wrote:
> > >>
> > >> Jeff,
> > >>
> > >> I ran on just one (mac OSX 10.8) system first as a "smoke test".
> > >> It encountered the failure shown below on "make check" at which point
> I decided not to test 60+ platforms.
> > >> Please advise how I should proceed (best guess is wait for a new
> tarball).
> > >>
> > >> -Paul
> > >>
> > >> Making check in test
> > >> Making check in support
> > >> make  libsupport.a
> > >>  CC   components.o
> > >>
> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
> fatal error: 'opal/libltdl/ltdl.h' file not found
> > >> #include "opal/libltdl/ltdl.h"
> > >> ^
> > >>
> > >>
> > >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > >> On Jan 30, 2015, at 2:46 PM, Paul Hargrove 
> wrote:
> > >>>
> > >>> If I had new enough autotools to autogen on this old system then I
> wouldn't have asked about libltdl from libtool-1.4.  So, please *do*
> generate a tarball and I will test (on *all* of my systems).
> > >>
> > >> Sweet, thank you.  I just posted a tarball here:
> > >>
> > >>http://www.open-mpi.org/~jsquyres/unofficial/
> > >>
> > >> --
> > >> Jeff Squyres
> > >> jsquy...@cisco.com
> > >> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> > >>
> > >> ___
> > >> devel mailing list
> > >> de...@open-mpi.org
> > >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/01/16854.php
> > >>
> > >>
> > >>
> > >> --
> > >> Paul H. Hargrove  phhargr...@lbl.gov
> > >> Computer Languages & Systems Software (CLaSS) Group
> > >> Computer Science Department   Tel: +1-510

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 1:58 PM, Paul Hargrove  wrote:

> 2b.  I am retrying now with all of Cray's environment modules unloaded
> except the one for the PGI compiler.  Nathan had suggested something like
> this to me in the past, but I've never had issues with the default
> environment.  I will report the result when available.



The result is unchanged after unloading all the Cray environment modules.

However, I did notice that configure found (for instance) ALPS support
despite my unloading all the Cray environment modules and included a
message recognizing the system as CLE4.   So, it is possible that unloading
the modules was insufficient to avoid the Cray-specific aspects of the
system.

HOWEVER - switching from PGI to GNU compilers made the problem go away.
So, I suspect it may be an issue with the installation/configuration of the
PGI compilers.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
I had fixed it in my local tree but not yet pushed to my github branch; I was 
waiting to see what happened w.r.t. your failure on the NERSC machine.

I pushed the fix up to my branch now; do you want a new tarball?


> On Feb 2, 2015, at 5:56 PM, Paul Hargrove  wrote:
> 
> Jeff,
> 
> Looks like you didn't hit all the un-guarded references to lt_dladvise.
> Specifically you missed a struct decl:
> 
> /[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
>  error: unknown type name 'lt_dladvise'
> 
> -Paul
> 
> 
> On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres)  
> wrote:
> Looks like the lt_interface.c code didn't properly use the lt_dladvise #if. 
> How did that ever work, I wonder?
> 
> Fixed now.  On to your second finding...
> 
> 
> > On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> >
> > Not meeting with the greatest of success.
> > This is a report of just the first (of at least 2) failure modes I am 
> > seeing.
> >
> > On a Scientific Linux 5.5 (RHEL-5.5 clone like CentOS) I get a build 
> > failure described below.
> > At least Solaris-11 and a few other linux systems (including RHAS-4.4) are 
> > also failing in what appears to be the same manner.
> > I am sure there are more, but I am aborting this round of testing at this 
> > point.
> >
> > I again await a new tarball with a less broken-by-default behavior.
> >
> > -Paul
> >
> >
> > The configure output includes
> > checking ltdl.h usability... yes
> > checking ltdl.h presence... yes
> > checking for ltdl.h... yes
> > looking for library without search path
> > checking for lt_dlopen in -lltdl... yes
> > checking for lt_dladvise_init... no
> > configure: WARNING: *
> > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > configure: WARNING: This could mean that your libltdl version
> > configure: WARNING: is old.  If you could upgrade, that would be great.
> > configure: WARNING: *
> > checking for lt_dladvise... no
> >
> > However, it looks like opal/util/lt_interface.c is still attempting to 
> > call lt_dladvise:
> > PGC-S-0040-Illegal use of symbol, lt_dladvise 
> > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> >  25)
> > PGC-W-0156-Type not specified, 'int' assumed 
> > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> >  25)
> > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> >
> > The output of "libtool --version" says "1.5.22" and we have libltdl.so.3.1.4.
> > However, the rpm database is not readable, preventing me from checking a 
> > package version associated with the libltdl.
> >
> > The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> > $ pkg info libltdl | grep Version
> >Version: 1.5.22
> >
> >
> > -Paul
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) 
> >  wrote:
> > New tarball posted (same location).  Now featuring 100% fewer "make check" 
> > failures.
> >
> > http://www.open-mpi.org/~jsquyres/unofficial/
> >
> >
> > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres)  
> > > wrote:
> > >
> > > Shame on me for not running "make check".
> > >
> > > Fixing...
> > >
> > >
> > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove  wrote:
> > >>
> > >> Jeff,
> > >>
> > >> I ran on just one (mac OSX 10.8) system first as a "smoke test".
> > >> It encountered the failure shown below on "make check" at which point I 
> > >> decided not to test 60+ platforms.
> > >> Please advise how I should proceed (best guess is wait for a new 
> > >> tarball).
> > >>
> > >> -Paul
> > >>
> > >> Making check in test
> > >> Making check in support
> > >> make  libsupport.a
> > >>  CC   components.o
> > >> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
> > >>  fatal error: 'opal/libltdl/ltdl.h' file not found
> > >> #include "opal/libltdl/ltdl.h"
> > >> ^
> > >>
> > >>
> > >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) 
> > >>  wrote:
> > >> On Jan 30, 2015, at 2:46 PM, Paul Hargrove  wrote:
> > >>>
> > >>> If I had new enough autotools to autogen on this old system then I 
> > >>> wouldn't have asked about libltdl from libtool-1.4.  So, please *do* 
> > >>> generate a tarball and I will test (on *all* of my systems).
> > >>
> > >> Sweet, thank you.  I just posted a tarball here:
> > >>
> > >>http://www.open-mpi.org/~jsquyres/unofficial/
> > >>
> > >> --
> > >> Jeff Squyres
> > >> jsquy...@cisco.com
> > >> For corporate legal information go to: 
> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
> > >>
> > >> ___
> > >> devel mailing list
> > >> de...@open-mpi.org
> > >> Subscription: http://www.open-mpi.org/mailman/listinfo.cg

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

If you are still chasing the goal of getting this branch to "just work",
then I am willing to keep testing.  Let me know when a new tarball is ready
and I'll give it a run on all of my systems.

-Paul

On Mon, Feb 2, 2015 at 4:15 PM, Jeff Squyres (jsquyres) 
wrote:

> I had fixed it in my local tree but not yet pushed to my github branch; I
> was waiting to see what happened w.r.t. your failure on the NERSC machine.
>
> I pushed the fix up to my branch now; do you want a new tarball?
>
>
> > On Feb 2, 2015, at 5:56 PM, Paul Hargrove  wrote:
> >
> > Jeff,
> >
> > Looks like you didn't hit all the un-guarded references to lt_dladvise.
> > Specifically you missed a struct decl:
> >
> >
> /[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
> error: unknown type name 'lt_dladvise'
> >
> > -Paul
> >
> >
> > On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Looks like the lt_interface.c code didn't properly use the lt_dladvise
> #if. How did that ever work, I wonder?
> >
> > Fixed now.  On to your second finding...
> >
> >
> > > On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> > >
> > > Not meeting with the greatest of success.
> > > This is a report of just the first (of at least 2) failure modes I am
> seeing.
> > >
> > > On a Scientific Linux 5.5 (RHEL-5.5 clone like CentOS) I get a build
> failure described below.
> > > At least Solaris-11 and a few other linux systems (including RHAS-4.4)
> are also failing in what appears to be the same manner.
> > > I am sure there are more, but I am aborting this round of testing at
> this point.
> > >
> > > I again await a new tarball with a less broken-by-default behavior.
> > >
> > > -Paul
> > >
> > >
> > > The configure output includes
> > > checking ltdl.h usability... yes
> > > checking ltdl.h presence... yes
> > > checking for ltdl.h... yes
> > > looking for library without search path
> > > checking for lt_dlopen in -lltdl... yes
> > > checking for lt_dladvise_init... no
> > > configure: WARNING: *
> > > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > > configure: WARNING: This could mean that your libltdl version
> > > configure: WARNING: is old.  If you could upgrade, that would be great.
> > > configure: WARNING: *
> > > checking for lt_dladvise... no
> > >
> > > However, it looks like opal/util/lt_interface.c is still attempting
> to call lt_dladvise:
> > > PGC-S-0040-Illegal use of symbol, lt_dladvise
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> 25)
> > > PGC-W-0156-Type not specified, 'int' assumed
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> 25)
> > > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> > >
> > > The output of "libtool --version" says "1.5.22" and we have
> libltdl.so.3.1.4.
> > > However, the rpm database is not readable, preventing me from checking
> a package version associated with the libltdl.
> > >
> > > The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> > > $ pkg info libltdl | grep Version
> > >Version: 1.5.22
> > >
> > >
> > > -Paul
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > > New tarball posted (same location).  Now featuring 100% fewer "make
> check" failures.
> > >
> > > http://www.open-mpi.org/~jsquyres/unofficial/
> > >
> > >
> > > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > > >
> > > > Shame on me for not running "make check".
> > > >
> > > > Fixing...
> > > >
> > > >
> > > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove 
> wrote:
> > > >>
> > > >> Jeff,
> > > >>
> > > >> I ran on just one (mac OSX 10.8) system first as a "smoke test".
> > > >> It encountered the failure shown below on "make check" at which
> point I decided not to test 60+ platforms.
> > > >> Please advise how I should proceed (best guess is wait for a new
> tarball).
> > > >>
> > > >> -Paul
> > > >>
> > > >> Making check in test
> > > >> Making check in support
> > > >> make  libsupport.a
> > > >>  CC   components.o
> > > >>
> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
> fatal error: 'opal/libltdl/ltdl.h' file not found
> > > >> #include "opal/libltdl/ltdl.h"
> > > >> ^
> > > >>
> > > >>
> > > >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > > >> On Jan 30, 2015, at 2:46 PM, Paul Hargrove 
> wrote:
> > > >>>
> > > >>> If I had new enough autotools to autogen on this old system then I
> wouldn't have asked about libltdl from libtool-1.4.  So, please *do*
> generate a tarball and I will test (on *all* of 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Paul --

If you've got the cycles and it's easy, release the hounds on the tarball that 
I just uploaded to:

http://www.open-mpi.org/~jsquyres/unofficial/

Thanks!


> On Feb 2, 2015, at 7:19 PM, Paul Hargrove  wrote:
> 
> Jeff,
> 
> If you are still chasing the goal of getting this branch to "just work", then 
> I am willing to keep testing.  Let me know when a new tarball is ready and 
> I'll give it a run on all of my systems.
> 
> -Paul
> 
> On Mon, Feb 2, 2015 at 4:15 PM, Jeff Squyres (jsquyres)  
> wrote:
> I had fixed it in my local tree but not yet pushed to my github branch; I was 
> waiting to see what happened w.r.t. your failure on the NERSC machine.
> 
> I pushed the fix up to my branch now; do you want a new tarball?
> 
> 
> > On Feb 2, 2015, at 5:56 PM, Paul Hargrove  wrote:
> >
> > Jeff,
> >
> > Looks like you didn't hit all the un-guarded references to lt_dladvise.
> > Specifically you missed a struct decl:
> >
> > /[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
> >  error: unknown type name 'lt_dladvise'
> >
> > -Paul
> >
> >
> > On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres) 
> >  wrote:
> > Looks like the lt_interface.c code didn't properly use the lt_dladvise #if. 
> > How did that ever work, I wonder?
> >
> > Fixed now.  On to your second finding...
> >
> >
> > > On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> > >
> > > Not meeting with the greatest of success.
> > > This is a report of just the first (of at least 2) failure modes I am 
> > > seeing.
> > >
> > > On a Scientific Linux 5.5 (RHEL-5.5 clone like CentOS) I get a build 
> > > failure described below.
> > > At least Solaris-11 and a few other linux systems (including RHAS-4.4) 
> > > are also failing in what appears to be the same manner.
> > > I am sure there are more, but I am aborting this round of testing at this 
> > > point.
> > >
> > > I again await a new tarball with a less broken-by-default behavior.
> > >
> > > -Paul
> > >
> > >
> > > The configure output includes
> > > checking ltdl.h usability... yes
> > > checking ltdl.h presence... yes
> > > checking for ltdl.h... yes
> > > looking for library without search path
> > > checking for lt_dlopen in -lltdl... yes
> > > checking for lt_dladvise_init... no
> > > configure: WARNING: *
> > > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > > configure: WARNING: This could mean that your libltdl version
> > > configure: WARNING: is old.  If you could upgrade, that would be great.
> > > configure: WARNING: *
> > > checking for lt_dladvise... no
> > >
> > > However, it looks like opal/util/lt_interface.c is still attempting to 
> > > call lt_dladvise:
> > > PGC-S-0040-Illegal use of symbol, lt_dladvise 
> > > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> > >  25)
> > > PGC-W-0156-Type not specified, 'int' assumed 
> > > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
> > >  25)
> > > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> > >
> > > The output of "libtool --version" says "1.5.22" and we have libltdl.so.3.1.4.
> > > However, the rpm database is not readable, preventing me from checking a 
> > > package version associated with the libltdl.
> > >
> > > The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> > > $ pkg info libltdl | grep Version
> > >Version: 1.5.22
> > >
> > >
> > > -Paul
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) 
> > >  wrote:
> > > New tarball posted (same location).  Now featuring 100% fewer "make 
> > > check" failures.
> > >
> > > http://www.open-mpi.org/~jsquyres/unofficial/
> > >
> > >
> > > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres) 
> > > >  wrote:
> > > >
> > > > Shame on me for not running "make check".
> > > >
> > > > Fixing...
> > > >
> > > >
> > > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove  wrote:
> > > >>
> > > >> Jeff,
> > > >>
> > > >> I ran on just one (mac OSX 10.8) system first as a "smoke test".
> > > >> It encountered the failure shown below on "make check" at which point I 
> > > >> decided not to test 60+ platforms.
> > > >> Please advise how I should proceed (best guess is wait for a new 
> > > >> tarball).
> > > >>
> > > >> -Paul
> > > >>
> > > >> Making check in test
> > > >> Making check in support
> > > >> make  libsupport.a
> > > >>  CC   components.o
> > > >> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
> > > >>  fatal error: 'opal/libltdl/ltdl.h' file not found
> > > >> #include "opal/libltdl/ltdl.h"
> > > >> ^
> > > >>
> > > >>
> > > >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) 
> > > >>  wrote:
> 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

Having already pointed my script at your tarball's URL, typing
"./test-ompi" releases about 60 "hounds".  I get an email for each system
as its tests complete, and gmail filters tag only the ones where one or
more configurations failed.  So, the overhead for me is pretty small as
long as the number of failures is kept low.

I'll report what I find, but at this point I am expecting only the Cray+PGI
issue we know about.  I am under the impression that you've fixed
everything else I had reported.

-Paul

On Mon, Feb 2, 2015 at 5:05 PM, Jeff Squyres (jsquyres) 
wrote:

> Paul --
>
> If you've got the cycles and it's easy, release the hounds on the tarball
> that I just uploaded to:
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>
> Thanks!
>
>
> > On Feb 2, 2015, at 7:19 PM, Paul Hargrove  wrote:
> >
> > Jeff,
> >
> > If you are still chasing the goal of getting this branch to "just work",
> > then I am willing to keep testing.  Let me know when a new tarball is ready
> > and I'll give it a run on all of my systems.
> >
> > -Paul
> >
> > On Mon, Feb 2, 2015 at 4:15 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > I had fixed it in my local tree but not yet pushed to my github branch;
> > I was waiting to see what happened w.r.t. your failure on the NERSC machine.
> >
> > I pushed the fix up to my branch now; do you want a new tarball?
> >
> >
> > > On Feb 2, 2015, at 5:56 PM, Paul Hargrove  wrote:
> > >
> > > Jeff,
> > >
> > > Looks like you didn't hit all the un-guarded references to lt_dladvise.
> > > Specifically you missed a struct decl:
> > >
> > >
> > > /[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
> > > error: unknown type name 'lt_dladvise'
> > >
> > > -Paul
> > >
> > >
> > > On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > > Looks like the lt_interface.c code didn't properly use the lt_dladvise
> > > #if. How did that ever work, I wonder?
> > >
> > > Fixed now.  On to your second finding...
> > >
> > >
> > > > On Jan 30, 2015, at 7:42 PM, Paul Hargrove wrote:
> > > >
> > > > Not meeting with the greatest of success.
> > > > This is a report of just the first (of at least 2) failure modes I
> > > > am seeing.
> > > >
> > > > On a Scientific Linux 5.5 system (a RHEL-5.5 clone like CentOS) I get
> > > > the build failure described below.
> > > > At least Solaris-11 and a few other Linux systems (including
> > > > RHAS-4.4) are also failing in what appears to be the same manner.
> > > > I am sure there are more, but I am aborting this round of testing at
> > > > this point.
> > > >
> > > > I again await a new tarball with a less broken-by-default behavior.
> > > >
> > > > -Paul
> > > >
> > > >
> > > > The configure output includes
> > > > checking ltdl.h usability... yes
> > > > checking ltdl.h presence... yes
> > > > checking for ltdl.h... yes
> > > > looking for library without search path
> > > > checking for lt_dlopen in -lltdl... yes
> > > > checking for lt_dladvise_init... no
> > > > configure: WARNING: *
> > > > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > > > configure: WARNING: This could mean that your libltdl version
> > > > configure: WARNING: is old.  If you could upgrade, that would be great.
> > > > configure: WARNING: *
> > > > checking for lt_dladvise... no
> > > >
> > > > However, it looks like opal/util/lt_interface.c is still attempting
> > > > to call lt_dladvise:
> > > > PGC-S-0040-Illegal use of symbol, lt_dladvise
> > > > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c: 25)
> > > > PGC-W-0156-Type not specified, 'int' assumed
> > > > (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c: 25)
> > > > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> > > >
> > > > The output of "libtool --version" says "1.5.22" and we have
> > > > libltdl.so.3.1.4.
> > > > However, the rpm database is not readable, preventing me from
> > > > checking a package version associated with the libltdl.
> > > >
> > > > The failing Solaris-11/x86-64 system says 1.5.22 without any
> > > > ambiguity:
> > > > $ pkg info libltdl | grep Version
> > > >Version: 1.5.22
> > > >
> > > >
> > > > -Paul
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > > > New tarball posted (same location).  Now featuring 100% fewer "make
> > > > check" failures.
> > > >
> > > > http://www.open-mpi.org/~jsquyres/unofficial/
> > > >
> > > >
> > > > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > > > >
> > > > > Shame on me for not running "make check".
> > > > >
> > > > > Fixing...
> > > > >
> > > > >
> > > > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove wrote:
> > > > >>
> > > > >> J

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 4:13 PM, Paul Hargrove  wrote:

> HOWEVER - switching from PGI to GNU compilers made the problem go away.
> So, I suspect it may be an issue with the installation/configuration of
> the PGI compilers.
>


I've reproduced the problem on a non-Cray system with four different
installations of the PGI compilers.
The system has PGI 10.x and 11.x installed by the sys admins.
It also has my private installs of 9.x and 12.x, which I know were
installed with just the defaults.

I'll report my test results more completely later, but all 4 PGI-based
builds I have results for so far have failed with libtool replacing
"-lltdl" in the link command line with "/usr/lib/libltdl.so" rather than the
correct "/usr/lib64/libltdl.so".

So, this is a PGI compiler issue, not a Cray one.
I will know later whether "PGI" needs to be replaced with "non-GNU".

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] Build failure on OpenBSD (deja vu)

2015-02-02 Thread Paul Hargrove
The following comes from testing Jeff's no-embedded-libltdl work, but I
suspect the same is true on tru^H^H^Hmaster.

The output below, from "make V=1" shows a link failure from trying to use
arc4random_addrandom(), which was removed on OpenBSD in late 2013.

The part that bugs me is that I thought Ralph had fixed this in v1.8
already!
See https://svn.open-mpi.org/trac/ompi/ticket/4829

FYI: The warnings are just the standard OpenBSD paranoia and don't
indicate any real problems.
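
As an illustration only, a call like this is normally wrapped in a
configure-time check.  The sketch below assumes a hypothetical macro
HAVE_ARC4RANDOM_ADDRANDOM and a hypothetical helper add_entropy(); neither
name is taken from the Open MPI tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HAVE_ARC4RANDOM_ADDRANDOM is a hypothetical macro that a configure
 * check would define only on systems that still provide the function. */

/* Feed extra seed bytes to the arc4random generator where that is still
 * supported.  Returns 1 if the bytes were added, 0 if the call was
 * compiled out because the platform no longer has it. */
int add_entropy(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
#if defined(HAVE_ARC4RANDOM_ADDRANDOM)
    /* BSD prototype: void arc4random_addrandom(unsigned char *, int); */
    arc4random_addrandom((unsigned char *)buf, (int)len);
    return 1;
#else
    /* Function is absent (e.g. recent OpenBSD): emit no reference to it. */
    (void)len;
    return 0;
#endif
}
```

Built without the macro, the helper compiles to a no-op and the object file
carries no reference to the removed symbol.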

-Paul

/bin/sh ../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g
-finline-functions -fno-strict-aliasing -pthread   -o opal_wrapper
opal_wrapper.o ../../../opal/libopen-pal.la -lm -lutil   -lm -lutil
libtool: link: gcc -std=gnu99 -g -finline-functions -fno-strict-aliasing
-pthread -o .libs/opal_wrapper opal_wrapper.o  -L../../../opal/.libs
-lopen-pal -lpthread -lm -lutil -pthread
-Wl,-rpath,/home/phargrov/OMPI/openmpi-libltdl-openbsd5-amd64/INST/lib
../../../opal/.libs/libopen-pal.so.0.0: warning: vsprintf() is often
misused, please use vsnprintf()
../../../opal/.libs/libopen-pal.so.0.0: warning: strcpy() is almost always
misused, please use strlcpy()
../../../opal/.libs/libopen-pal.so.0.0: warning: random() isn't random;
consider using arc4random()
../../../opal/.libs/libopen-pal.so.0.0: warning: strcat() is almost always
misused, please use strlcat()
../../../opal/.libs/libopen-pal.so.0.0: warning: sprintf() is often
misused, please use snprintf()
../../../opal/.libs/libopen-pal.so.0.0: undefined reference to
`arc4random_addrandom'
collect2: ld returned 1 exit status






Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 5:22 PM, Paul Hargrove  wrote:

> So, the overhead for me is pretty small as long as the number of failures
> is kept low.


I jinxed it!!!

I have, I believe, about 7 different failures now on various systems.
All of those appear UNRELATED to the libltdl changes.

I went ahead and reported the OpenBSD/arc4random issue, since it appears to
be a regression of something I reported against v1.8 six months ago.
However, for the other issues I've encountered I am going to re-run against
a trunk tarball before reporting (to avoid wasting my time and yours).

-Paul



[OMPI devel] When libltdl is not your friend

2015-02-02 Thread Paul Hargrove
Below is one example of what happens when you assume that you can trust the
libltdl installed at an otherwise very well maintained national center.  I
think this is another "vote" for continuing to embed (a working) libltdl.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs1
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_loadleveler:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_loadleveler.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama:
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama.so:
failed to map segment from shared object: Cannot allocate memory (ignored)
[cvrsvc03:25777] mca: base: component_find: unable to open
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist:
/global/homes/h/hargrove/GSCRATCH/OMPI/ope

Re: [OMPI devel] When libltdl is not your friend

2015-02-02 Thread Howard Pritchard
Hi Paul,

Thanks for checking in depth into this.  Just to help in determining how to
proceed, which national center is this?

Howard


2015-02-02 19:35 GMT-07:00 Paul Hargrove :

> Below is one example of what happens when you assume that you can trust
> the libltdl installed at an otherwise very well maintained national center.  I
> think this is another "vote" for continuing to embed (a working) libltdl.
>
> -Paul
>
> [...]

Re: [OMPI devel] When libltdl is not your friend

2015-02-02 Thread Paul Hargrove
Howard,

This was seen on NERSC's Carver.

-Paul

On Mon, Feb 2, 2015 at 6:49 PM, Howard Pritchard 
wrote:

> Hi Paul,
>
> Thanks for checking in depth into this.  Just to help in determining how
> to proceed, which national center is this?
>
> Howard
>
>
> 2015-02-02 19:35 GMT-07:00 Paul Hargrove :
>
>> Below is one example of what happens when you assume that you can trust
>> the libltdl installed at an otherwise very well maintained national center.  I
>> think this is another "vote" for continuing to embed (a working) libltdl.
>>
>> -Paul
>>
>> [...]

[OMPI devel] Master build failure on Mac OS 10.8 with --enable-static/--disable-shared

2015-02-02 Thread Paul Hargrove
I have a Mac OSX 10.8 system, where cc is clang.
I have no problems with a default build from the current master tarball.
However, a static-only build leads to a link failure on opal_wrapper.

Configured with
  --prefix=... --enable-debug CC=cc CXX=c++ --enable-static --disable-shared

Failing portion of "make V=1":

/bin/sh ../../../libtool  --tag=CC   --mode=link cc  -g -finline-functions
-fno-strict-aliasing   -export-dynamic   -o opal_wrapper opal_wrapper.o
../../../opal/libopen-pal.la
libtool: link: cc -g -finline-functions -fno-strict-aliasing -o
opal_wrapper opal_wrapper.o  ../../../opal/.libs/libopen-pal.a -lm
Undefined symbols for architecture x86_64:
  "_opal_pmix", referenced from:
  _opal_get_proc_hostname in libopen-pal.a(proc.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see
invocation)

-Paul




[OMPI devel] Master failure building oshmem java examples

2015-02-02 Thread Paul Hargrove
On a system on which 1.8.4rc5 passed all my tests, I see the following
running "make" in the examples directory:

[...]
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `Hello_oshmem.java', needed by
`Hello_oshmem.class'.
make[2]: Target `Hello_oshmem.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `Ring_oshmem.java', needed by
`Ring_oshmem.class'.
make[2]: Target `Ring_oshmem.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `oshmem_circular_shift.java', needed by
`oshmem_circular_shift.class'.
make[2]: Target `oshmem_circular_shift.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `oshmem_max_reduction.java', needed by
`oshmem_max_reduction.class'.
make[2]: Target `oshmem_max_reduction.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `oshmem_strided_puts.java', needed by
`oshmem_strided_puts.class'.
make[2]: Target `oshmem_strided_puts.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: *** No rule to make target `oshmem_symmetric_data.java', needed by
`oshmem_symmetric_data.class'.
make[2]: Target `oshmem_symmetric_data.class' not remade because of errors.
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[1]: *** [oshmem] Error 2
make[1]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make: *** [all] Error 2

-Paul



Re: [OMPI devel] When libltdl is not your friend

2015-02-02 Thread Paul Hargrove
It looks like I was too quick to blame libltdl.
A build of the current 'master' tarball on the same system with identical
configure arguments fails as seen below.

While the failure is not identical, it is also an out-of-memory error.
I am currently assuming that an rlimit has been lowered on this system
since the last time I tested there (1.8.4rc5, I believe).

-Paul

--
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  cvrsvc03
  System call: mmap(2)
  Error:   Cannot allocate memory (errno 12)
--
[cvrsvc03:19412] create_and_attach: unable to create shared memory BTL
coordinating structure :: size 134217728
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[30315,1],0]) is on host: cvrsvc03
  Process 2 ([[30315,1],1]) is on host: cvrsvc03
  BTLs attempted: self

Your MPI job is now going to abort; sorry.

On Mon, Feb 2, 2015 at 7:01 PM, Paul Hargrove  wrote:

> Howard,
>
> This was seen on NERSC's Carver.
>
> -Paul
>
> On Mon, Feb 2, 2015 at 6:49 PM, Howard Pritchard 
> wrote:
>
>> Hi Paul,
>>
>> Thanks for checking in depth into this.  Just to help in determining how
>> to proceed, which national center is this?
>>
>> Howard
>>
>>
>> 2015-02-02 19:35 GMT-07:00 Paul Hargrove :
>>
>>> Below is one example of what happens when you assume that you can trust
>>> the libltdl installed at an otherwise very well maintained national center.  I
>>> think this is another "vote" for continuing to embed (a working) libltdl.
>>>
>>> -Paul
>>>
>>> [...]

[OMPI devel] Master assert failure on Linux/PPC64

2015-02-02 Thread Paul Hargrove
On a Linux/PPC64 system I see the failure below from a build of the current
master tarball.
This build was configured with
  --prefix=... --enable-debug \
  CFLAGS=-m64 --with-wrapper-cflags=-m64 \
  CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \
  FCFLAGS=-m64 --with-wrapper-fcflags=-m64

I am not sure if putting "-m64" in both the *FLAGS and wrapper flags is
required, but am confident the error is unrelated.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[pcp-k-422:08534] mca: base: components_open: component coll / libnbc open
function failed
ring_c:
/home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118:
libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) ==
((opal_object_t *)
(&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
[pcp-k-422:08534] *** Process received signal ***
[pcp-k-422:08534] Signal: Aborted (6)
[pcp-k-422:08534] Signal code:  (-6)
[pcp-k-422:08534] [ 0] [0x3fff8bd90478]
[pcp-k-422:08534] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff8b9fc510]
[pcp-k-422:08534] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff8ba01be4]
[pcp-k-422:08534] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff8b9f22ac]
[pcp-k-422:08534] [ 4]
/lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff8b9f239c]
[pcp-k-422:08534] [ 5]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff8a190088]
[pcp-k-422:08534] [ 6]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff8b758308]
[pcp-k-422:08534] [ 7]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff8b757c5c]
[pcp-k-422:08534] [ 8]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff8b757778]
[pcp-k-422:08534] [ 9]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff8b76a620]
[pcp-k-422:08534] [10]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff8bc33d14]
[pcp-k-422:08534] [11]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff8bc821bc]
[pcp-k-422:08534] [12] examples/ring_c[0x1a20]
[pcp-k-422:08534] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff8b9e2b6c]
[pcp-k-422:08534] [14]
/lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff8b9e2d98]
[pcp-k-422:08534] *** End of error message ***
[pcp-k-422:08535] mca: base: components_open: component coll / libnbc open
function failed
ring_c:
/home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118:
libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) ==
((opal_object_t *)
(&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
[pcp-k-422:08535] *** Process received signal ***
[pcp-k-422:08535] Signal: Aborted (6)
[pcp-k-422:08535] Signal code:  (-6)
[pcp-k-422:08535] [ 0] [0x3fff99e30478]
[pcp-k-422:08535] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff99a9c510]
[pcp-k-422:08535] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff99aa1be4]
[pcp-k-422:08535] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff99a922ac]
[pcp-k-422:08535] [ 4]
/lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff99a9239c]
[pcp-k-422:08535] [ 5]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff98230088]
[pcp-k-422:08535] [ 6]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff997f8308]
[pcp-k-422:08535] [ 7]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff997f7c5c]
[pcp-k-422:08535] [ 8]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff997f7778]
[pcp-k-422:08535] [ 9]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff9980a620]
[pcp-k-422:08535] [10]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff99cd3d14]
[pcp-k-422:08535] [11]
/home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff99d221bc]
[pcp-k-422:08535] [12] examples/ring_c[0x1a20]
[pcp-k-422:08535] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff99a82b6c]
[pcp-k-422:08535] [14]
/lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff99a82d98]
[pcp-k-422:08535] *** End of error message ***
--
mpirun noticed that process rank 1 with PID 0 on node pcp-k-422 exited on
signal 6 (Aborted).
--
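
For readers unfamiliar with this kind of assertion: the check compares the
object's obj_magic_id with the fixed constant shown in the message. One
plausible reading (an assumption, the log does not confirm it) is that the
component's close function ran on an object that the failed open step never
constructed. The sketch below illustrates that pattern only. Every demo_*
name is hypothetical and none of this is Open MPI's actual code; it returns
false where a debug build would abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same constant as in the assertion text quoted in the log. */
#define DEMO_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical stand-in for an object header (not opal_object_t). */
typedef struct {
    uint64_t magic_id;  /* written only by demo_construct() */
    int refcount;
} demo_object_t;

static void demo_construct(demo_object_t *obj)
{
    assert(obj != NULL);
    obj->magic_id = DEMO_MAGIC_ID;
    obj->refcount = 1;
}

/* Reports false, instead of aborting, when the object was never
 * constructed or has already been destructed. */
static bool demo_destruct(demo_object_t *obj)
{
    assert(obj != NULL);
    if (obj->magic_id != DEMO_MAGIC_ID) {
        return false;
    }
    obj->magic_id = 0;
    obj->refcount = 0;
    return true;
}

/* Hypothetical component whose open step may fail before it
 * constructs its request list. */
typedef struct {
    demo_object_t active_requests;
} demo_component_t;

static int demo_open(demo_component_t *c, bool fail_early)
{
    if (fail_early) {
        return -1;  /* active_requests is left unconstructed */
    }
    demo_construct(&c->active_requests);
    return 0;
}

static bool demo_close(demo_component_t *c)
{
    return demo_destruct(&c->active_requests);
}
```

Calling demo_close() after demo_open(&c, true) on a zero-initialized
component returns false, which corresponds to the point where a debug build
would hit the assertion. After a successful open, the first close returns
true and a second close returns false.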



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National

[OMPI devel] Master build failure w/ Solaris Studio 12.3 on Linux/x86-64

2015-02-02 Thread Paul Hargrove
On a Linux/x86-64 system I am using the Solaris Studio 12.3 compilers.
I have configured the current master tarball as follows:
  --prefix=... --enable-debug \
  CC=cc CXX=CC FC=f90 \
  CXXFLAGS='-L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
-library=stlport4' \
  --with-wrapper-cxxflags='-L/lib/x86_64-linux-gnu
-L/usr/lib/x86_64-linux-gnu -library=stlport4' \
  --enable-mpi-cxx --enable-mpi-java

When building Open MPI I see (from "make V=1"):

libtool: compile:  cc -DHAVE_CONFIG_H -I.
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/common/libfabric
-I../../../../opal/include -I../../../../ompi/include
-I../../../../oshmem/include
-I../../../../opal/mca/common/libfabric/libfabric
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/include
-D_GNU_SOURCE
-DSYSCONFDIR=\"/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/INST/etc\"
-DRDMADIR=\"/tmp\"
-DEXTDIR=\"/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/INST/lib/openmpi\"
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63
-I../../../..
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/orte/include
-I../../../../orte/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/ompi/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/oshmem/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/hwloc/hwloc191/hwloc/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/BLD/opal/mca/hwloc/hwloc191/hwloc/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/event/libevent2022/libevent/include
-I/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/BLD/opal/mca/event/libevent2022/libevent/include
-g -mt -c
/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/src/fabric.c
 -KPIC -DPIC -o libfabric/src/.libs/libmca_common_libfabric_la-fabric.o
Bad seg kind in yFinishObjectCode()
cc: acomp failed for
/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u3/openmpi-dev-803-g5919b63/opal/mca/common/libfabric/libfabric/src/fabric.c

I have no clue where yFinishObjectCode() comes from.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] Build failure on OpenBSD (deja vu)

2015-02-02 Thread Ralph Castain
I see what happened - we upgraded libevent not that long ago, and I tried
to catch all the OMPI-committed changes to it. However, I appear to have
missed this one.

I'll fix it now. Sorry about that...
Ralph


On Mon, Feb 2, 2015 at 6:11 PM, Paul Hargrove  wrote:

> The following comes from testing Jeff's no-embedded-libltdl work, but I
> suspect the same is true on tru^H^H^Hmaster.
>
> The output below, from "make V=1" shows a link failure from trying to use
> arc4random_addrandom(), which was removed on OpenBSD in late 2013.
>
> The part that bugs me is that I thought Ralph had fixed this in v1.8
> already!
> See https://svn.open-mpi.org/trac/ompi/ticket/4829
>
> FYI: The warnings are just the standard OpenBSD paranoia and don't
> indicate any real problems.
>
> -Paul
>
> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g
> -finline-functions -fno-strict-aliasing -pthread   -o opal_wrapper
> opal_wrapper.o ../../../opal/libopen-pal.la -lm -lutil   -lm -lutil
> libtool: link: gcc -std=gnu99 -g -finline-functions -fno-strict-aliasing
> -pthread -o .libs/opal_wrapper opal_wrapper.o  -L../../../opal/.libs
> -lopen-pal -lpthread -lm -lutil -pthread
> -Wl,-rpath,/home/phargrov/OMPI/openmpi-libltdl-openbsd5-amd64/INST/lib
> ../../../opal/.libs/libopen-pal.so.0.0: warning: vsprintf() is often
> misused, please use vsnprintf()
> ../../../opal/.libs/libopen-pal.so.0.0: warning: strcpy() is almost always
> misused, please use strlcpy()
> ../../../opal/.libs/libopen-pal.so.0.0: warning: random() isn't random;
> consider using arc4random()
> ../../../opal/.libs/libopen-pal.so.0.0: warning: strcat() is almost always
> misused, please use strlcat()
> ../../../opal/.libs/libopen-pal.so.0.0: warning: sprintf() is often
> misused, please use snprintf()
> ../../../opal/.libs/libopen-pal.so.0.0: undefined reference to
> `arc4random_addrandom'
> collect2: ld returned 1 exit status
>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/16894.php
>
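
A link failure like the one quoted above is commonly avoided by calling
arc4random_addrandom() only when a configure-time check found it. The sketch
below shows that guard in isolation. HAVE_ARC4RANDOM_ADDRANDOM and
demo_rng_add_bytes() are hypothetical names chosen for illustration; this is
not the actual change made in the embedded libevent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Feeds extra seed bytes to the system RNG when the platform still
 * provides arc4random_addrandom(). Returns 1 if the bytes were passed
 * on, 0 if the call is unavailable and the request was ignored.
 *
 * HAVE_ARC4RANDOM_ADDRANDOM is a hypothetical macro that a configure
 * check would define; it is not defined anywhere in this sketch. */
static int demo_rng_add_bytes(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
#if defined(HAVE_ARC4RANDOM_ADDRANDOM)
    arc4random_addrandom((unsigned char *)buf, (int)len);
    return 1;
#else
    /* The function was removed (as on recent OpenBSD), so there is
     * nothing to call and no symbol to resolve at link time. */
    (void)len;
    return 0;
#endif
}
```

With the macro undefined, the symbol is never referenced, so the library
links cleanly and the helper simply reports that no bytes were added.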


Re: [OMPI devel] Master failure building oshmem java examples

2015-02-02 Thread Ralph Castain
Sigh...someone forgot to add those examples to the tarball. Fixing now.


On Mon, Feb 2, 2015 at 7:15 PM, Paul Hargrove  wrote:

> On a system on which 1.8.4rc5 passed all my tests, I see the following
> running "make" in the examples directory:
>
> [...]
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `Hello_oshmem.java', needed by
> `Hello_oshmem.class'.
> make[2]: Target `Hello_oshmem.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `Ring_oshmem.java', needed by
> `Ring_oshmem.class'.
> make[2]: Target `Ring_oshmem.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `oshmem_circular_shift.java', needed
> by `oshmem_circular_shift.class'.
> make[2]: Target `oshmem_circular_shift.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `oshmem_max_reduction.java', needed by
> `oshmem_max_reduction.class'.
> make[2]: Target `oshmem_max_reduction.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `oshmem_strided_puts.java', needed by
> `oshmem_strided_puts.class'.
> make[2]: Target `oshmem_strided_puts.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: Entering directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[2]: *** No rule to make target `oshmem_symmetric_data.java', needed
> by `oshmem_symmetric_data.class'.
> make[2]: Target `oshmem_symmetric_data.class' not remade because of errors.
> make[2]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make[1]: *** [oshmem] Error 2
> make[1]: Leaving directory
> `/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
> make: *** [all] Error 2
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/16900.php
>