Thanks Paul, I will give it a try

Cheers,

Gilles

On Wednesday, August 26, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Gilles,
>
> Is the conflict over "SIG32"?
> If so, I believe setenv PSM_RCVTHREAD=0 in the environment will disable
> InfiniPath's use of that signal.
>
> -Paul
>
> On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
>> i run on a centos 7 vm, and with the OFED that comes with centos
>> (I will send full details tomorrow)
>> there is no psm hardware, just infinipath libs
>>
>> a first trivial workaround in ompi would be to
>> putenv("OMPI_MCA_mtl_psm_priority=0")
>> in the java binding before invoking ompi_mpi_init,
>> but that cannot work because libinfinipath is dlopen'ed and its signal
>> handler is set
>> also, I guess putenv("OMPI_MCA_mtl=^psm") would not work if ompi was
>> configure'd with --disable-dlopen
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wednesday, August 26, 2015, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Gilles: what version of PSM were you using? and with which cards?
>>>
>>> On Aug 25, 2015, at 9:32 AM, Nathaniel Graham <nrgraha...@gmail.com> wrote:
>>>
>>> What if we modify the mpirun script to include the --mca mtl ^psm tag if
>>> java is in the run string?
>>>
>>> -Nathan
>>>
>>> On Tue, Aug 25, 2015 at 9:47 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>>> I'll update the java FAQ.
>>>>
>>>> 2015-08-25 8:36 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>>
>>>>> On Aug 25, 2015, at 10:00 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>> >
>>>>> > I think rather than trying workarounds of dubious robustness inside open mpi we
>>>>> >
>>>>> > - document the issue on either the somewhat aged open mpi website faq or add it to a wiki page on github
>>>>>
>>>>> It should probably be documented in the README and the FAQ.
>>>>>
>>>>> I'd be against adding user documentation to the wiki -- this would be
>>>>> a 3rd place for users to look for information.
>>>>>
>>>>> > - file a bug against intel psm
>>>>>
>>>>> I'd like to hear what they have to say first... :-)
>>>>>
>>>>> >
>>>>> > ----------
>>>>> >
>>>>> > sent from my smart phonr so no good type.
>>>>> >
>>>>> > Howard
>>>>> >
>>>>> > On Aug 25, 2015 6:02 AM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
>>>>> > i do not know if this can be runtime detected ...
>>>>> > note we should report this to intel folks and ask them to advise.
>>>>> > ideally, they would provide a way to make sure libinfinipath.so does not conflict with the jvm signal handlers.
>>>>> >
>>>>> > my idea is to dlopen libinfinipath only if java bindings are not used.
>>>>> >
>>>>> > On Tuesday, August 25, 2015, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>> > Is it possible to run-time detect this situation? E.g., probe the signal handler, or somesuch.
>>>>> >
>>>>> > Rationale: I'd rather have something run-time disabled than not built.
>>>>> >
>>>>> > Would dlopen'ing libinfinipath actually change its signal handler behavior?
>>>>> >
>>>>> >
>>>>> > > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>> > >
>>>>> > > Folks,
>>>>> > >
>>>>> > > some time ago, some crashes were reported when using java bindings.
>>>>> > > one of them was caused by mca_mtl_psm.so.
>>>>> > > the root cause is that the libinfinipath.so initializer sets its own signal handler, which
>>>>> > > conflicts with the signal handler set by the jvm.
>>>>> > > the only workaround is to disable the psm mtl
>>>>> > > (e.g. mpirun --mca mtl ^psm ...)
>>>>> > > since mpirun --mca mtl_psm_priority 0 ... does not work
>>>>> > > (libinfinipath.so is loaded, so the initializer is run and the signal handlers are set)
>>>>> > > so the psm mtl cannot be disabled by the Java MPI_Init()
>>>>> > >
>>>>> > > one option is to document this
>>>>> > > another option is not to build the psm mtl if java bindings are built
>>>>> > > and another option is to revamp mca_mtl_psm.so so it does not link with libinfinipath.so
>>>>> > > (use an intermediate component, or dlopen libinfinipath)
>>>>> > >
>>>>> > > any thoughts ?
>>>>> > >
>>>>> > > Cheers,
>>>>> > >
>>>>> > > Gilles
>>>>> > > _______________________________________________
>>>>> > > devel mailing list
>>>>> > > de...@open-mpi.org
>>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> > > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17838.php
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Jeff Squyres
>>>>> > jsquy...@cisco.com
>>>>> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>> >
>>>>> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17840.php
>>>>> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17841.php
>>>>> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17845.php
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>>
>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17847.php
>>>>
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17849.php
>>>
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17851.php
>>
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17857.php
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
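
For reference, the "probe the signal handler" idea raised in the quoted thread can be sketched in C with the POSIX sigaction() call, which reports the current disposition of a signal when it is given a NULL new action. This is only an illustration and not Open MPI or PSM code: handler_installed(), noop_handler() and run_probe_example() are hypothetical names, and SIGUSR1 is a stand-in signal chosen for the demo. Whether the signal actually involved in the PSM/JVM conflict can be queried this way is an assumption the thread does not settle.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a handler function is installed for
 * signo, 0 if the disposition is SIG_DFL or SIG_IGN, -1 if the query fails. */
int handler_installed(int signo)
{
    struct sigaction current;

    /* A NULL new action means "query only": nothing is changed. */
    if (sigaction(signo, NULL, &current) != 0) {
        return -1;
    }
    /* Assumption for this sketch: SA_SIGINFO implies a real handler. */
    if (current.sa_flags & SA_SIGINFO) {
        return 1;
    }
    return (current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN) ? 1 : 0;
}

/* Stand-in for a handler that some library or runtime might install. */
void noop_handler(int signo)
{
    (void)signo;
}

/* Installs noop_handler on SIGUSR1, probes it, then restores the previous
 * disposition. Returns the probe result, or -1 if installation failed. */
int run_probe_example(void)
{
    struct sigaction act, saved;
    int seen;

    memset(&act, 0, sizeof(act));
    act.sa_handler = noop_handler;
    sigemptyset(&act.sa_mask);

    if (sigaction(SIGUSR1, &act, &saved) != 0) {
        return -1;
    }
    seen = handler_installed(SIGUSR1);
    sigaction(SIGUSR1, &saved, NULL);   /* put the old disposition back */
    return seen;
}
```

A probe like this only tells you that some handler is present, not who installed it, so on its own it could not distinguish a JVM handler from one set by a library initializer.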