I think Ralph answered this question: if you register a progress function but 
then get your component unloaded without un-registering the progress 
function... kaboom.
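
For the archives, the safe pattern looks roughly like the sketch below: register the
progress callback only from the code path that runs for the *selected* MTL, and
unregister it before the component's DSO can be unloaded. The example_mtl_* names
are illustrative only, not the actual OFI MTL code:

    #include "opal/runtime/opal_progress.h"
    #include "ompi/constants.h"

    /* Progress callback: poll the endpoint, return # of completed events. */
    static int example_mtl_progress(void)
    {
        return 0;
    }

    /* Runs only for the MTL that won selection -- register here, NOT in the
     * component init/open function, which also runs for components that are
     * later de-selected and unloaded. */
    static int example_mtl_module_init(void)
    {
        opal_progress_register(example_mtl_progress);
        return OMPI_SUCCESS;
    }

    /* Runs before the component is closed/unloaded.  Without the unregister,
     * opal_progress() keeps calling a pointer into unmapped memory, which is
     * the SEGV under opal_progress() in Paul's backtrace. */
    static int example_mtl_finalize(void)
    {
        opal_progress_unregister(example_mtl_progress);
        return OMPI_SUCCESS;
    }

Until that is fixed, the workarounds Paul already found ("-mca mtl ^ofi" or
"-mca pml cm") sidestep the crash.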


> On Jul 24, 2015, at 10:37 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
> 
> Jeff
> 
> I was wrong about this.  All of the MTLs except portals4 register with 
> opal_progress in their component init.
> 
> I don't see how this is a problem, though, since the base select only invokes 
> component init on the selected MTL. 
> 
> Howard
> 
> ----------
> 
> Sent from my smart phone, so no good typing.
> 
> Howard
> 
> On Jul 24, 2015 8:19 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> Yohann --
> 
> Can you have a look?
> 
> 
> > On Jul 24, 2015, at 10:15 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
> >
> > Looks like the OFI MTL is being naughty.  It's the only MTL which registers 
> > with opal_progress in its component init method.
> >
> > ----------
> >
> > Sent from my smart phone, so no good typing.
> >
> > Howard
> >
> > On Jul 23, 2015 7:03 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
> > It looks like one of the MTL components is registering a progress call with 
> > the opal_progress thread, and then unloading when de-selected. Registering 
> > with opal_progress should only be done once the MTL has been selected and 
> > will run.
> >
> >
> >> On Jul 23, 2015, at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>
> >> Yohann,
> >>
> >> With PR409 as it stands right now (commit 6daef310) I see no change to the 
> >> behavior.
> >> I still get a SEGV below opal_progress() unless I use either
> >>    -mca mtl ^ofi
> >> OR
> >>    -mca pml cm
> >>
> >> A backtrace from gdb appears below.
> >>
> >> -Paul
> >>
> >> (gdb) where
> >> #0  0x00007f5bc7b59867 in ?? () from /lib64/libgcc_s.so.1
> >> #1  0x00007f5bc7b5a119 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> >> #2  0x00007f5bcc9b08f6 in __backtrace (array=<value optimized out>, 
> >> size=32)
> >>     at ../sysdeps/ia64/backtrace.c:110
> >> #3  0x00007f5bcc3483e1 in opal_backtrace_print (file=0x7f5bccc40880,
> >>     prefix=0x7fff6181d1f0 "[pcp-f-5:05049] ", strip=2)
> >>     at 
> >> /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/mca/backtrace/execinfo/backtrace_execinfo.c:47
> >> #4  0x00007f5bcc3456a9 in show_stackframe (signo=11, info=0x7fff6181d770, 
> >> p=0x7fff6181d640)
> >>     at 
> >> /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/util/stacktrace.c:336
> >> #5  <signal handler called>
> >> #6  0x00007f5bc7717c58 in ?? ()
> >> #7  0x00007f5bcc2f567a in opal_progress ()
> >>     at 
> >> /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/opal/runtime/opal_progress.c:187
> >> #8  0x00007f5bccebbcb9 in ompi_mpi_init (argc=1, argv=0x7fff6181dd78, 
> >> requested=0, provided=0x7fff6181dbf8)
> >>     at 
> >> /scratch/phargrov/OMPI/openmpi-1.10.0rc2-linux-x86_64-sl6x/openmpi-1.10.0rc2/ompi/runtime/ompi_mpi_init.c:645
> >> #9  0x00007f5bccefbe77 in PMPI_Init (argc=0x7fff6181dc5c, 
> >> argv=0x7fff6181dc50) at pinit.c:84
> >> #10 0x000000000040088e in main (argc=1, argv=0x7fff6181dd78) at ring_c.c:19
> >>
> >> (gdb) up 6
> >> #6  0x00007f5bc7717c58 in ?? ()
> >> (gdb) disass
> >> No function contains program counter for selected frame.
> >>
> >> On Thu, Jul 23, 2015 at 8:13 AM, Burette, Yohann 
> >> <yohann.bure...@intel.com> wrote:
> >> Paul,
> >>
> >>
> >>
> >> While looking at the issue, we noticed that we were missing some code that 
> >> deals with MTL priorities.
> >>
> >>
> >>
> >> PR 409 (https://github.com/open-mpi/ompi-release/pull/409) is attempting 
> >> to fix that.
> >>
> >>
> >>
> >> Hopefully, this will also fix the error you encountered.
> >>
> >>
> >>
> >> Thanks again,
> >>
> >> Yohann
> >>
> >>
> >>
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
> >> Sent: Wednesday, July 22, 2015 12:07 PM
> >>
> >>
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] 1.10.0rc2
> >>
> >>
> >>
> >> Yohann,
> >>
> >>
> >>
> >> Things run fine with those additional flags.
> >>
> >> In fact, adding just "--mca pml cm" is sufficient to eliminate the SEGV.
> >>
> >>
> >>
> >> -Paul
> >>
> >>
> >>
> >> On Wed, Jul 22, 2015 at 8:49 AM, Burette, Yohann 
> >> <yohann.bure...@intel.com> wrote:
> >>
> >> Hi Paul,
> >>
> >>
> >>
> >> Thank you for doing all this testing!
> >>
> >>
> >>
> >> About 1), it’s hard for me to see whether it’s a problem with mtl:ofi or 
> >> with how OMPI selects the components to use.
> >>
> >> Could you please run your test again with “--mca mtl ofi --mca 
> >> mtl_ofi_provider sockets --mca pml cm”?
> >>
> >> The idea is that if it still fails, then we have a problem with either 
> >> mtl:ofi or the OFI/sockets provider. If it works, then there is an issue 
> >> with how OMPI selects what component to use.
> >>
> >>
> >>
> >> I just tried 1.10.0rc2 with the latest libfabric (master) and it seems to 
> >> work fine.
> >>
> >>
> >>
> >> Yohann
> >>
> >>
> >>
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
> >> Sent: Wednesday, July 22, 2015 1:05 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] 1.10.0rc2
> >>
> >>
> >>
> >> 1.10.0rc2 looks mostly good to me, but I still found some issues.
> >>
> >>
> >>
> >>
> >>
> >> 1) New to this round of testing, I have built mtl:ofi with gcc, pgi, icc, 
> >> clang, open64 and studio compilers.
> >>
> >> I have only the sockets provider in libfabric (v1.0.0 and 1.1.0rc2).
> >>
> >> However, unless I pass "-mca mtl ^ofi" to mpirun I get a SEGV from a 
> >> callback invoked in opal_progress().
> >>
> >> GDB did not give a function name for the callback, but the PC looks valid.
> >>
> >>
> >>
> >>
> >>
> >> 2) Of the several compilers I tried, only pgi-13.10 failed to compile 
> >> mtl:ofi:
> >>
> >>
> >>
> >>         /bin/sh ../../../../libtool  --tag=CC   --mode=compile pgcc 
> >> -DHAVE_CONFIG_H -I. 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi
> >>  -I../../../../opal/include -I../../../../orte/include 
> >> -I../../../../ompi/include -I../../../../oshmem/include 
> >> -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen 
> >> -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen  
> >> -I/usr/common/ftg/libfabric/1.1.0rc2p1/include 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2
> >>  -I../../../.. 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include
> >>    
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include
> >>   -g  -c -o mtl_ofi_component.lo 
> >> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
> >>
> >> libtool: compile:  pgcc -DHAVE_CONFIG_H -I. 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi
> >>  -I../../../../opal/include -I../../../../orte/include 
> >> -I../../../../ompi/include -I../../../../oshmem/include 
> >> -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen 
> >> -I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen 
> >> -I/usr/common/ftg/libfabric/1.1.0rc2p1/include 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2
> >>  -I../../../.. 
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include
> >>  
> >> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include
> >>  -g -c 
> >> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
> >>   -fpic -DPIC -o .libs/mtl_ofi_component.o
> >>
> >> PGC-S-0060-opal_convertor_clone is not a member of this struct or union 
> >> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c:
> >>  51)
> >>
> >> pgcc-Fatal-/global/scratch2/sd/hargrove/pgi-13.10/linux86-64/13.10/bin/pgc 
> >> TERMINATED by signal 11
> >>
> >>
> >>
> >> Since this ends with a SEGV in the compiler, I don't think this is an 
> >> issue with the C code, just a plain compiler bug.
> >>
> >> At least pgi-9.0-4 and pgi-10.9 compiled the code just fine.
> >>
> >>
> >>
> >>
> >>
> >> 3) As I noted in a separate email, there are some newly uncovered issues 
> >> in the embedded hwloc w/ pgi and -m32.
> >>
> >> However, I had not tested such configurations previously, and all 
> >> indications are that these issues have existed for a while.
> >>
> >> Brice is on vacation, so there will not be an official hwloc fix for this 
> >> issue until next week at the earliest.
> >>
> >> [The upside is that I now have coverage for eight additional x86 
> >> configurations (true x86 or x86-64 w/ -m32).]
> >>
> >>
> >>
> >>
> >>
> >> 4) I noticed a couple of warnings somebody might want to investigate:
> >>
> >>   
> >> openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:2323:59:
> >>  warning: format specifies type 'int' but the argument has type 'struct 
> >> ibv_qp *' [-Wformat]
> >>
> >>   
> >> openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c", 
> >> line 2471: warning: improper pointer/integer combination: arg #3
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Also worth noting:
> >>
> >>
> >>
> >> The ConnectX and ConnectIB XRC detection logic appears to be working as 
> >> expected on multiple systems.
> >>
> >>
> >>
> >> I also have learned that pgi-9.0-4 is not a conforming C99 compiler when 
> >> passed -m32, which is not Open MPI's fault.
> >>
> >>
> >>
> >>
> >>
> >> And as before...
> >>
> >> + I am currently without any SPARC platforms
> >>
> >> + Several qemu-emulated ARM and MIPS tests will complete by morning 
> >> (though I have some ARM successes already)
> >>
> >>
> >>
> >>
> >>
> >> -Paul
> >>
> >>
> >>
> >> On Tue, Jul 21, 2015 at 12:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >>
> >> Hey folks
> >>
> >>
> >>
> >> 1.10.0rc2 is now out for review - excepting the library version numbers, 
> >> this should be the final version. Please take a quick gander and let me 
> >> know of any problems.
> >>
> >>
> >>
> >> http://www.open-mpi.org/software/ompi/v1.10/
> >>
> >>
> >>
> >> Ralph
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2015/07/17670.php
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Paul H. Hargrove                          phhargr...@lbl.gov
> >>
> >> Computer Languages & Systems Software (CLaSS) Group
> >>
> >> Computer Science Department               Tel: +1-510-495-2352
> >>
> >> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2015/07/17681.php
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Paul H. Hargrove                          phhargr...@lbl.gov
> >>
> >> Computer Languages & Systems Software (CLaSS) Group
> >>
> >> Computer Science Department               Tel: +1-510-495-2352
> >>
> >> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2015/07/17687.php
> >>
> >>
> >>
> >> --
> >> Paul H. Hargrove                          phhargr...@lbl.gov
> >> Computer Languages & Systems Software (CLaSS) Group
> >> Computer Science Department               Tel: +1-510-495-2352
> >> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2015/07/17688.php
> >
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/07/17689.php
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/07/17690.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17691.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17694.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
