Yohann,

Things run fine with those additional flags. In fact, adding just "--mca pml cm" is sufficient to eliminate the SEGV.
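
For the archives, the invocations in question look roughly like the following (the process count and test program are placeholders, not the exact commands I ran):

    # excluding the OFI MTL entirely (my earlier workaround):
    mpirun --mca mtl ^ofi -np 2 ./my_mpi_test

    # forcing the cm PML alone; this is enough to avoid the SEGV:
    mpirun --mca pml cm -np 2 ./my_mpi_test

    # your fuller diagnostic, pinning the OFI MTL and the sockets provider:
    mpirun --mca mtl ofi --mca mtl_ofi_provider sockets --mca pml cm -np 2 ./my_mpi_test
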
-Paul

On Wed, Jul 22, 2015 at 8:49 AM, Burette, Yohann <yohann.bure...@intel.com> wrote:

> Hi Paul,
>
> Thank you for doing all this testing!
>
> About 1), it's hard for me to see whether it's a problem with mtl:ofi or with how OMPI selects the components to use.
>
> Could you please run your test again with "--mca mtl ofi --mca mtl_ofi_provider sockets --mca pml cm"?
>
> The idea is that if it still fails, then we have a problem with either mtl:ofi or the OFI/sockets provider. If it works, then there is an issue with how OMPI selects what component to use.
>
> I just tried 1.10.0rc2 with the latest libfabric (master) and it seems to work fine.
>
> Yohann
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of* Paul Hargrove
> *Sent:* Wednesday, July 22, 2015 1:05 AM
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI devel] 1.10.0rc2
>
> 1.10.0rc2 looks mostly good to me, but I still found some issues.
>
> 1) New to this round of testing, I have built mtl:ofi with gcc, pgi, icc, clang, open64 and studio compilers.
> I have only the sockets provider in libfabric (v1.0.0 and 1.1.0rc2).
> However, unless I pass "-mca mtl ^ofi" to mpirun I get a SEGV from a callback invoked in opal_progress().
> Gdb did not give a function name for the callback, but the PC looks valid.
> (A verbose-selection sketch appears below, after item 4.)
>
> 2) Of the several compilers I tried, only pgi-13.10 failed to compile mtl:ofi:
>
> /bin/sh ../../../../libtool --tag=CC --mode=compile pgcc -DHAVE_CONFIG_H -I. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen -I/usr/common/ftg/libfabric/1.1.0rc2p1/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2 -I../../../.. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include -g -c -o mtl_ofi_component.lo /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
>
> libtool: compile: pgcc -DHAVE_CONFIG_H -I. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen -I/usr/common/ftg/libfabric/1.1.0rc2p1/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2 -I../../../.. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include -g -c /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c -fpic -DPIC -o .libs/mtl_ofi_component.o
>
> PGC-S-0060-opal_convertor_clone is not a member of this struct or union (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c: 51)
>
> pgcc-Fatal-/global/scratch2/sd/hargrove/pgi-13.10/linux86-64/13.10/bin/pgc TERMINATED by signal 11
>
> Since this ends with a SEGV in the compiler, I don't think this is an issue with the C code, just a plain compiler bug.
> At least pgi-9.0-4 and pgi-10.9 compiled the code just fine.
>
> 3) As I noted in a separate email, there are some newly uncovered issues in the embedded hwloc w/ pgi and -m32.
> However, I had not tested such configurations previously, and all indications are that these issues have existed for a while.
> Brice is on vacation, so there will not be an official hwloc fix for this issue until next week at the earliest.
> [The upside is that I now have coverage for eight additional x86 configurations (true x86 or x86-64 w/ -m32).]
>
> 4) I noticed a couple of warnings somebody might want to investigate:
>
> openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:2323:59: warning: format specifies type 'int' but the argument has type 'struct ibv_qp *' [-Wformat]
>
> "openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c", line 2471: warning: improper pointer/integer combination: arg #3
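>
> For whoever picks up 1): a run with the selection frameworks made verbose will report which PML and MTL are actually chosen. This is a sketch only, assuming the usual <framework>_base_verbose MCA parameters; the verbosity level, process count and test program are placeholders:
>
>     mpirun --mca pml_base_verbose 100 --mca mtl_base_verbose 100 -np 2 ./my_mpi_test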
>
> Also worth noting:
>
> The ConnectX and ConnectIB XRC detection logic appears to be working as expected on multiple systems.
>
> I also have learned that pgi-9.0-4 is not a conforming C99 compiler when passed -m32, which is not Open MPI's fault.
>
> And as before...
> + I am currently without any SPARC platforms
> + Several qemu-emulated ARM and MIPS tests will complete by morning (though I have some ARM successes already)
>
> -Paul
>
> On Tue, Jul 21, 2015 at 12:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Hey folks
>
> 1.10.0rc2 is now out for review - excepting the library version numbers, this should be the final version. Please take a quick gander and let me know of any problems.
>
> http://www.open-mpi.org/software/ompi/v1.10/
>
> Ralph

--
Paul H. Hargrove                              phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory         Fax: +1-510-486-6900