Gilles,

Is the conflict over "SIG32"?
If so, I believe setting PSM_RCVTHREAD=0 in the environment will disable
InfiniPath's use of that signal.
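
A minimal C sketch of that idea, in case it helps. It assumes PSM reads
PSM_RCVTHREAD from the environment when it initializes, so the variable would
have to be set before libinfinipath is loaded. The helper names
disable_psm_rcvthread() and handler_installed() are hypothetical and not part
of Open MPI or PSM. The second helper only reports whether some handler is
installed for a signal; the C library may refuse to report on signals it
reserves internally, in which case it returns -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ask PSM not to start its receive thread.
 * Assumption: PSM consults PSM_RCVTHREAD at initialization, so this
 * must run before the library is loaded. Returns 0 on success. */
static int disable_psm_rcvthread(void)
{
    return setenv("PSM_RCVTHREAD", "0", 1 /* overwrite */);
}

/* Hypothetical helper: query the current disposition of signo without
 * changing it. Returns 1 if a handler is installed, 0 if the signal is
 * at SIG_DFL or SIG_IGN, and -1 if the disposition cannot be queried. */
static int handler_installed(int signo)
{
    struct sigaction cur;

    memset(&cur, 0, sizeof(cur));
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;                 /* invalid or reserved signal */
    if (cur.sa_flags & SA_SIGINFO)
        return 1;                  /* three-argument handler in place */
    return cur.sa_handler != SIG_DFL && cur.sa_handler != SIG_IGN;
}
```

Comparing handler_installed() for a given signal before and after the library
is loaded would show whether the load changed that signal's disposition.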

-Paul

On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> i run on a centos 7 vm, and with the OFED that comes with centos
> (I will send full details tomorrow)
> there is no psm hardware, just infinipath libs
>
> a first trivial workaround in ompi would be to
> putenv("OMPI_MCA_mtl_psm_priority=0")
> in the java binding before invoking ompi_mpi_init,
> but that cannot work because libinfinipath is dlopen'ed and its signal
> handler is set
> also, I guess putenv("OMPI_MCA_mtl=^psm") would not work if ompi was
> configure'd with --disable-dlopen
>
> Cheers,
>
> Gilles
>
>
> On Wednesday, August 26, 2015, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Gilles: what version of PSM were you using? and with which cards?
>>
>>
>> On Aug 25, 2015, at 9:32 AM, Nathaniel Graham <nrgraha...@gmail.com>
>> wrote:
>>
>> What if we modify the mpirun script to include the --mca mtl ^psm tag if
>> java is in the run string?
>>
>> -Nathan
>>
>> On Tue, Aug 25, 2015 at 9:47 AM, Howard Pritchard <hpprit...@gmail.com>
>> wrote:
>>
>>> I'll update the java FAQ.
>>>
>>> 2015-08-25 8:36 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>
>>>> On Aug 25, 2015, at 10:00 AM, Howard Pritchard <hpprit...@gmail.com>
>>>> wrote:
>>>> >
>>>> > I think rather than trying workarounds of dubious robustness inside
>>>> open mpi we
>>>> >
>>>> > - document the issue on either the somewhat aged open mpi website faq
>>>> or add it to a wiki page on github
>>>>
>>>> It should probably be documented in the README and the FAQ.
>>>>
>>>> I'd be against adding user documentation to the wiki -- this would be a
>>>> 3rd place for users to look for information.
>>>>
>>>> > - file a bug against intel psm
>>>>
>>>> I'd like to hear what they have to say first... :-)
>>>>
>>>> >
>>>> > ----------
>>>> >
>>>> > sent from my smart phonr so no good type.
>>>> >
>>>> > Howard
>>>> >
>>>> > On Aug 25, 2015 6:02 AM, "Gilles Gouaillardet" <
>>>> gilles.gouaillar...@gmail.com> wrote:
>>>> > i do not know if this can be runtime detected ...
>>>> > note we should report this to intel folks and ask them to advise.
>>>> > ideally, they would provide a way to make sure libinfinipath.so does
>>>> not conflict with the jvm signal handlers.
>>>> >
>>>> > my idea is to dlopen libinfinipath only if java bindings are not used.
>>>> >
>>>> > On Tuesday, August 25, 2015, Jeff Squyres (jsquyres) <
>>>> jsquy...@cisco.com> wrote:
>>>> > Is it possible to run-time detect this situation?  E.g., probe the
>>>> signal handler, or somesuch.
>>>> >
>>>> > Rationale: I'd rather have something run-time disabled than not built.
>>>> >
>>>> > Would dlopen'ing libinfinipath actually change its signal
>>>> handler behavior?
>>>> >
>>>> >
>>>> > > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet <gil...@rist.or.jp>
>>>> wrote:
>>>> > >
>>>> > > Folks,
>>>> > >
>>>> > > some time ago, some crashes were reported when using java bindings.
>>>> > > one of them was caused by mca_mtl_psm.so.
>>>> > > the root cause is that the libinfinipath.so initializer sets its own signal
>>>> handler, which
>>>> > > conflicts with the signal handler set by the jvm.
>>>> > > the only workaround is to disable the psm mtl
>>>> > > (e.g. mpirun --mca mtl ^psm ...)
>>>> > > since mpirun --mca mtl_psm_priority 0 ... does not work
>>>> > > (libinfinipath.so is loaded, so the initializer is run and the
>>>> signal handlers are set)
>>>> > > so the psm mtl cannot be disabled by the Java MPI_Init()
>>>> > >
>>>> > > one option is to document this
>>>> > > another option is not to build the psm mtl if java bindings are
>>>> built
>>>> > > and another option is to revamp mca_mtl_psm.so so it does not link
>>>> with libinfinipath.so
>>>> > > (use an intermediate component, or dlopen libinfinipath)
>>>> > >
>>>> > > any thoughts ?
>>>> > >
>>>> > > Cheers,
>>>> > >
>>>> > > Gilles
>>>> > > _______________________________________________
>>>> > > devel mailing list
>>>> > > de...@open-mpi.org
>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> > > Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/08/17838.php
>>>> >
>>>> >
>>>> > --
>>>> > Jeff Squyres
>>>> > jsquy...@cisco.com
>>>> > For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> >
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
