Jeff, I was wrong about this: all the MTLs except for Portals4 register with opal_progress in their component init.
I don't see how this is a problem, though, as base select only invokes component init on the selected MTL.

Howard

----------
sent from my smart phone, so no good typing.

On Jul 24, 2015 8:19 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> Yohann --
>
> Can you have a look?
>
>
> > On Jul 24, 2015, at 10:15 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
> >
> > Looks like the ofi MTL is being naughty: it's the only MTL which registers with opal_progress in its component init method.
> >
> > ----------
> > sent from my smart phone, so no good typing.
> >
> > Howard
> >
> > On Jul 23, 2015 7:03 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
> > It looks like one of the MTL components is registering a progress callback with the opal_progress thread, and then unloading when de-selected. Registering with opal_progress should only be done once the MTL has been selected and will run.
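A minimal sketch of the lifecycle Ralph describes, assuming the opal_progress_register() / opal_progress_unregister() interface declared in opal/runtime/opal_progress.h; the example_mtl_* names are placeholders for illustration, not the actual ofi MTL symbols:

    /* Sketch only: register the progress callback after selection,
     * unregister it before the component can be closed/unloaded. */
    #include "opal/runtime/opal_progress.h"

    /* Hypothetical MTL progress callback; a real one would poll the
     * network and return the number of completed events. */
    static int example_mtl_progress(void)
    {
        return 0;
    }

    /* Registering in the component's init is the bug: init runs for every
     * opened MTL, including ones that lose selection and are later
     * unloaded, leaving opal_progress() with a dangling callback. */

    /* Register only once this MTL has actually been selected... */
    static int example_mtl_enable(void)
    {
        return opal_progress_register(example_mtl_progress);
    }

    /* ...and unregister during finalize, before unload. */
    static int example_mtl_finalize(void)
    {
        return opal_progress_unregister(example_mtl_progress);
    }

With registration tied to selection and unregistration to finalize, a component that loses the selection and is unloaded never leaves a stale function pointer behind for opal_progress() to call.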
> >> On Jul 23, 2015, at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>
> >> Yohann,
> >>
> >> With PR 409 as it stands right now (commit 6daef310) I see no change to the behavior.
> >> I still get a SEGV below opal_progress() unless I use either
> >>     -mca mtl ^ofi
> >> OR
> >>     -mca pml cm
> >>
> >> A backtrace from gdb appears below.
> >>
> >> -Paul
> >>
> >> (gdb) where
> >> #0  0x00007f5bc7b59867 in ?? () from /lib64/libgcc_s.so.1
> >> #1  0x00007f5bc7b5a119 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> >> #2  0x00007f5bcc9b08f6 in __backtrace (array=<value optimized out>, size=32)
> >>     at ../sysdeps/ia64/backtrace.c:110
> >> #3  0x00007f5bcc3483e1 in opal_backtrace_print (file=0x7f5bccc40880,
> >>     prefix=0x7fff6181d1f0 "[pcp-f-5:05049] ", strip=2)
> >>     at /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/mca/backtrace/execinfo/backtrace_execinfo.c:47
> >> #4  0x00007f5bcc3456a9 in show_stackframe (signo=11, info=0x7fff6181d770, p=0x7fff6181d640)
> >>     at /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/util/stacktrace.c:336
> >> #5  <signal handler called>
> >> #6  0x00007f5bc7717c58 in ?? ()
> >> #7  0x00007f5bcc2f567a in opal_progress ()
> >>     at /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/runtime/opal_progress.c:187
> >> #8  0x00007f5bccebbcb9 in ompi_mpi_init (argc=1, argv=0x7fff6181dd78, requested=0, provided=0x7fff6181dbf8)
> >>     at /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/ompi/runtime/ompi_mpi_init.c:645
> >> #9  0x00007f5bccefbe77 in PMPI_Init (argc=0x7fff6181dc5c, argv=0x7fff6181dc50) at pinit.c:84
> >> #10 0x000000000040088e in main (argc=1, argv=0x7fff6181dd78) at ring_c.c:19
> >>
> >> (gdb) up 6
> >> #6  0x00007f5bc7717c58 in ?? ()
> >> (gdb) disass
> >> No function contains program counter for selected frame.
> >>
> >> On Thu, Jul 23, 2015 at 8:13 AM, Burette, Yohann <yohann.bure...@intel.com> wrote:
> >> Paul,
> >>
> >> While looking at the issue, we noticed that we were missing some code that deals with MTL priorities.
> >>
> >> PR 409 (https://github.com/open-mpi/ompi-release/pull/409) is attempting to fix that.
> >>
> >> Hopefully, this will also fix the error you encountered.
> >>
> >> Thanks again,
> >> Yohann
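For readers following along, "MTL priorities" refers to the usual MCA pattern where each eligible component reports a priority and the framework keeps the highest bidder. The sketch below is a generic, hypothetical illustration of that pattern only; the struct, field, and function names are invented for this sketch and are not the actual mca_base_select() code or the PR 409 change.

    /* Hypothetical illustration of priority-based component selection.
     * None of these names come from the Open MPI source tree. */
    #include <stddef.h>

    struct example_component {
        const char *name;
        /* Initializes the component; on success returns a module handle
         * and reports how strongly the component wants to be used. */
        void *(*init)(int *priority);
    };

    static void *example_select(struct example_component *comps, size_t n)
    {
        void *best_module = NULL;
        int best_priority = -1;

        for (size_t i = 0; i < n; ++i) {
            int priority = 0;
            void *module = comps[i].init(&priority);
            if (module != NULL && priority > best_priority) {
                best_priority = priority;
                best_module = module;
            }
        }
        /* The losing components are closed (and possibly unloaded) after
         * selection, which is why they must not leave callbacks registered
         * with opal_progress() from their init step. */
        return best_module;
    }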
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
> >> Sent: Wednesday, July 22, 2015 12:07 PM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] 1.10.0rc2
> >>
> >> Yohann,
> >>
> >> Things run fine with those additional flags.
> >> In fact, adding just "--mca pml cm" is sufficient to eliminate the SEGV.
> >>
> >> -Paul
> >>
> >> On Wed, Jul 22, 2015 at 8:49 AM, Burette, Yohann <yohann.bure...@intel.com> wrote:
> >> Hi Paul,
> >>
> >> Thank you for doing all this testing!
> >>
> >> About 1), it's hard for me to see whether it's a problem with mtl:ofi or with how OMPI selects the components to use.
> >>
> >> Could you please run your test again with "--mca mtl ofi --mca mtl_ofi_provider sockets --mca pml cm"?
> >>
> >> The idea is that if it still fails, then we have a problem with either mtl:ofi or the OFI/sockets provider. If it works, then there is an issue with how OMPI selects what component to use.
> >>
> >> I just tried 1.10.0rc2 with the latest libfabric (master) and it seems to work fine.
> >>
> >> Yohann
> >>
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
> >> Sent: Wednesday, July 22, 2015 1:05 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] 1.10.0rc2
> >>
> >> 1.10.0rc2 looks mostly good to me, but I still found some issues.
> >>
> >> 1) New to this round of testing, I have built mtl:ofi with gcc, pgi, icc, clang, open64 and studio compilers.
> >> I have only the sockets provider in libfabric (v1.0.0 and 1.1.0rc2).
> >> However, unless I pass "-mca mtl ^ofi" to mpirun I get a SEGV from a callback invoked in opal_progress().
> >> Gdb did not give a function name for the callback, but the PC looks valid.
> >>
> >> 2) Of the several compilers I tried, only pgi-13.10 failed to compile mtl:ofi:
> >>
> >> /bin/sh ../../../../libtool --tag=CC --mode=compile pgcc -DHAVE_CONFIG_H -I. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen -I/usr/common/ftg/libfabric/1.1.0rc2p1/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2 -I../../../.. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include -g -c -o mtl_ofi_component.lo /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
> >>
> >> libtool: compile: pgcc -DHAVE_CONFIG_H -I. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen -I/usr/common/ftg/libfabric/1.1.0rc2p1/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2 -I../../../.. -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include -g -c /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c -fpic -DPIC -o .libs/mtl_ofi_component.o
> >>
> >> PGC-S-0060-opal_convertor_clone is not a member of this struct or union (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c: 51)
> >> pgcc-Fatal-/global/scratch2/sd/hargrove/pgi-13.10/linux86-64/13.10/bin/pgc TERMINATED by signal 11
> >>
> >> Since this ends with a SEGV in the compiler, I don't think this is an issue with the C code, just a plain compiler bug.
> >> At least pgi-9.0-4 and pgi-10.9 compiled the code just fine.
> >>
> >> 3) As I noted in a separate email, there are some newly uncovered issues in the embedded hwloc w/ pgi and -m32.
> >> However, I had not tested such configurations previously, and all indications are that these issues have existed for a while.
> >> Brice is on vacation, so there will not be an official hwloc fix for this issue until next week at the earliest.
> >> [The upside is that I now have coverage for eight additional x86 configurations (true x86 or x86-64 w/ -m32).]
> >>
> >> 4) I noticed a couple of warnings somebody might want to investigate:
> >> openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:2323:59: warning: format specifies type 'int' but the argument has type 'struct ibv_qp *' [-Wformat]
> >> "openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c", line 2471: warning: improper pointer/integer combination: arg #3
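On the first of those warnings (a struct ibv_qp * passed through a "%d" conversion): the usual fix for this class of -Wformat complaint is to print the pointer with "%p" and an explicit cast, or to print an integer field of the struct instead. A generic sketch follows, using a hypothetical variable rather than the actual udcm code at line 2323:

    /* Generic -Wformat fix sketch; 'qp' and log_qp() are hypothetical and
     * not taken from btl_openib_connect_udcm.c. */
    #include <stdio.h>

    struct ibv_qp;   /* opaque here; the real type comes from <infiniband/verbs.h> */

    static void log_qp(struct ibv_qp *qp)
    {
        /* Mismatched: "%d" expects an int, but qp is a pointer. */
        /* printf("created qp %d\n", qp); */

        /* Matched: print the pointer value with "%p" and a (void *) cast. */
        printf("created qp %p\n", (void *) qp);
    }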
> >>
> >> Also worth noting:
> >>
> >> The ConnectX and ConnectIB XRC detection logic appears to be working as expected on multiple systems.
> >>
> >> I also have learned that pgi-9.0-4 is not a conforming C99 compiler when passed -m32, which is not Open MPI's fault.
> >>
> >> And as before...
> >> + I am currently without any SPARC platforms
> >> + Several qemu-emulated ARM and MIPS tests will complete by morning (though I have some ARM successes already)
> >>
> >> -Paul
> >>
> >> On Tue, Jul 21, 2015 at 12:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >>
> >> Hey folks
> >>
> >> 1.10.0rc2 is now out for review - excepting the library version numbers, this should be the final version. Please take a quick gander and let me know of any problems.
> >>
> >> http://www.open-mpi.org/software/ompi/v1.10/
> >>
> >> Ralph
> >>
> >> --
> >> Paul H. Hargrove                              phhargr...@lbl.gov
> >> Computer Languages & Systems Software (CLaSS) Group
> >> Computer Science Department                   Tel: +1-510-495-2352
> >> Lawrence Berkeley National Laboratory         Fax: +1-510-486-6900
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/