Gilles,

Is the conflict over "SIG32"? If so, I believe setting PSM_RCVTHREAD=0 in the environment will disable InfiniPath's use of that signal.
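As a rough illustration only (not tested against PSM), here is a minimal C sketch of two ideas from this thread: setting PSM_RCVTHREAD=0 before the library is loaded, and probing whether a handler is already installed for a given signal. The helper names disable_psm_rcvthread and handler_installed are hypothetical, and which signal number to probe is an assumption. The C library may reserve some real-time signal numbers for its own use and refuse the query, which is why the probe can return -1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: export PSM_RCVTHREAD=0 into this process's
 * environment. The variable name comes from the message above; that the
 * library reads it after this point is an assumption.
 * Returns 0 on success, -1 on failure (as setenv() does). */
static int disable_psm_rcvthread(void)
{
    return setenv("PSM_RCVTHREAD", "0", 1 /* overwrite any existing value */);
}

/* Hypothetical helper: report whether a handler is currently installed
 * for signo.
 * Returns 1 if a handler is installed, 0 if the disposition is SIG_DFL
 * or SIG_IGN, and -1 if the query itself fails (for example because the
 * signal number is invalid or reserved by the C library). */
static int handler_installed(int signo)
{
    struct sigaction current;

    /* Passing NULL as the new action makes sigaction() a pure query. */
    if (sigaction(signo, NULL, &current) != 0) {
        return -1;
    }

    /* SA_SIGINFO means a three-argument handler was registered. */
    if (current.sa_flags & SA_SIGINFO) {
        return 1;
    }

    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}
```

Calling handler_installed() on the signal of interest before and after loading libinfinipath would show whether the load changed the disposition. Whether PSM honors PSM_RCVTHREAD when it is set via setenv() inside the process, rather than in the launching shell, is not verified here.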
-Paul

On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> i run on a centos 7 vm, and with the OFED that comes with centos
> (I will send full details tomorrow)
> there is no psm hardware, just infinipath libs
>
> a first trivial workaround in ompi would be to
> putenv("OMPI_MCA_mtl_psm_priority=0")
> in the java binding before invoking ompi_mpi_init,
> but that cannot work because libinfinipath is dlopen'ed and its signal
> handler is set
> also, I guess putenv("OMPI_MCA_mtl=^psm") would not work if ompi was
> configure'd with --disable-dlopen
>
> Cheers,
>
> Gilles
>
>
> On Wednesday, August 26, 2015, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Gilles: what version of PSM were you using? and with which cards?
>>
>>
>> On Aug 25, 2015, at 9:32 AM, Nathaniel Graham <nrgraha...@gmail.com> wrote:
>>
>> What if we modify the mpirun script to include the --mca mtl ^psm tag if
>> java is in the run string?
>>
>> -Nathan
>>
>> On Tue, Aug 25, 2015 at 9:47 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>
>>> I'll update the java FAQ.
>>>
>>> 2015-08-25 8:36 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>
>>>> On Aug 25, 2015, at 10:00 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> >
>>>> > I think rather than trying workarounds of dubious robustness inside open mpi we
>>>> >
>>>> > - document the issue on either the somewhat aged open mpi website faq or add it to a wiki page on github
>>>>
>>>> It should probably be documented in the README and the FAQ.
>>>>
>>>> I'd be against adding user documentation to the wiki -- this would be a
>>>> 3rd place for users to look for information.
>>>>
>>>> > - file a bug against intel psm
>>>>
>>>> I'd like to hear what they have to say first... :-)
>>>>
>>>> >
>>>> > ----------
>>>> >
>>>> > sent from my smart phonr so no good type.
>>>> >
>>>> > Howard
>>>> >
>>>> > On Aug 25, 2015 6:02 AM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
>>>> > i do not know if this can be runtime detected ...
>>>> > note we should report this to intel folks and ask them to advise.
>>>> > ideally, they would provide a way to make sure libinfinipath.so does not conflict with the jvm signal handlers.
>>>> >
>>>> > my idea is to dlopen libinfinipath only if java bindings are not used.
>>>> >
>>>> > On Tuesday, August 25, 2015, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>> > Is it possible to run-time detect this situation? E.g., probe the signal handler, or somesuch.
>>>> >
>>>> > Rationale: I'd rather have something run-time disabled than not built.
>>>> >
>>>> > Would dlopen'ing libinfinipath actually change its signal handler behavior?
>>>> >
>>>> >
>>>> > > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>> > >
>>>> > > Folks,
>>>> > >
>>>> > > some time ago, some crashes were reported when using java bindings.
>>>> > > one of them was caused by mca_mtl_psm.so.
>>>> > > the root cause is that the libinfinipath.so initializer sets its own signal handler, which
>>>> > > conflicts with the signal handler set by the jvm.
>>>> > > the only workaround is to disable the psm mtl
>>>> > > (e.g. mpirun --mca mtl ^psm ...)
>>>> > > since mpirun --mca mtl_psm_priority 0 ... does not work
>>>> > > (libinfinipath.so is loaded, so the initializer is run and the signal handlers are set)
>>>> > > so the psm mtl cannot be disabled by the Java MPI_Init()
>>>> > >
>>>> > > one option is to document this
>>>> > > another option is not to build the psm mtl if java bindings are built
>>>> > > and another option is to revamp mca_mtl_psm.so so it does not link with libinfinipath.so
>>>> > > (use an intermediate component, or dlopen libinfinipath)
>>>> > >
>>>> > > any thoughts ?
>>>> > >
>>>> > > Cheers,
>>>> > >
>>>> > > Gilles
>>>> > > _______________________________________________
>>>> > > devel mailing list
>>>> > > de...@open-mpi.org
>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> > > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17838.php
>>>> >
>>>> >
>>>> > --
>>>> > Jeff Squyres
>>>> > jsquy...@cisco.com
>>>> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900