Thanks Paul,

I will give it a try

Cheers,

Gilles
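
P.S. For the record, a minimal sketch in plain C (no Open MPI or PSM calls) of two ideas from this thread: setting PSM_RCVTHREAD=0 before anything loads libinfinipath, and probing signal dispositions with sigaction() before and after a library load, which is one way to approach the run-time detection question raised below. The helper names, the list of watched signals and the stand-in handler are assumptions made for illustration only. Which signals libinfinipath actually touches is not established in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Signals to watch: an illustrative guess, not the set PSM really uses. */
#define NWATCHED 4
static const int watched[NWATCHED] = { SIGSEGV, SIGBUS, SIGUSR1, SIGUSR2 };

/* Hypothetical helper: set PSM_RCVTHREAD=0 unless the user already set
 * the variable (the overwrite flag passed to setenv() is 0). */
int disable_psm_rcvthread(void)
{
    return setenv("PSM_RCVTHREAD", "0", 0);
}

/* Record the current disposition of every watched signal. */
int snapshot_handlers(struct sigaction saved[NWATCHED])
{
    assert(saved != NULL);
    for (int i = 0; i < NWATCHED; i++) {
        /* A NULL second argument queries without changing anything. */
        if (sigaction(watched[i], NULL, &saved[i]) != 0)
            return -1;
    }
    return 0;
}

/* Return 1 if the two dispositions name different handlers. */
static int handler_differs(const struct sigaction *a,
                           const struct sigaction *b)
{
    if ((a->sa_flags & SA_SIGINFO) != (b->sa_flags & SA_SIGINFO))
        return 1;
    if (a->sa_flags & SA_SIGINFO)
        return a->sa_sigaction != b->sa_sigaction;
    return a->sa_handler != b->sa_handler;
}

/* Count the watched signals whose handler changed since the snapshot. */
int count_changed(const struct sigaction saved[NWATCHED])
{
    int changed = 0;
    assert(saved != NULL);
    for (int i = 0; i < NWATCHED; i++) {
        struct sigaction now;
        if (sigaction(watched[i], NULL, &now) == 0 &&
            handler_differs(&saved[i], &now))
            changed++;
    }
    return changed;
}

/* Put the recorded dispositions back in place. */
void restore_handlers(const struct sigaction saved[NWATCHED])
{
    for (int i = 0; i < NWATCHED; i++)
        sigaction(watched[i], &saved[i], NULL);
}

/* Stand-in for what a library initializer might do (hypothetical). */
static void stand_in_handler(int signo) { (void)signo; }

int install_stand_in_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = stand_in_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

In a real test, the dlopen() of libinfinipath.so (or the call to MPI_Init) would take the place of install_stand_in_handler(), and a nonzero result from count_changed() would indicate that the load altered one of the watched handlers.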

On Wednesday, August 26, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Gilles,
>
> Is the conflict over "SIG32"?
> If so, I believe setting PSM_RCVTHREAD=0 in the environment will disable
> InfiniPath's use of that signal.
>
> -Paul
>
> On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> I run on a CentOS 7 VM, with the OFED that comes with CentOS
>> (I will send full details tomorrow).
>> There is no PSM hardware, just the infinipath libs.
>>
>> A first trivial workaround in ompi would be to call
>> putenv("OMPI_MCA_mtl_psm_priority=0")
>> in the java binding before invoking ompi_mpi_init,
>> but that cannot work because libinfinipath is still dlopen'ed and its signal
>> handler is set.
>> Also, I guess putenv("OMPI_MCA_mtl=^psm") would not work if ompi was
>> configure'd with --disable-dlopen.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Wednesday, August 26, 2015, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Gilles: what version of PSM were you using? and with which cards?
>>>
>>>
>>> On Aug 25, 2015, at 9:32 AM, Nathaniel Graham <nrgraha...@gmail.com>
>>> wrote:
>>>
>>> What if we modify the mpirun script to include the --mca mtl ^psm tag if
>>> java is in the run string?
>>>
>>> -Nathan
>>>
>>> On Tue, Aug 25, 2015 at 9:47 AM, Howard Pritchard <hpprit...@gmail.com>
>>> wrote:
>>>
>>>> I'll update the java FAQ.
>>>>
>>>> 2015-08-25 8:36 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>>
>>>>> On Aug 25, 2015, at 10:00 AM, Howard Pritchard <hpprit...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > I think rather than trying workarounds of dubious robustness inside
>>>>> Open MPI, we should
>>>>> >
>>>>> > - document the issue on either the somewhat aged Open MPI website
>>>>> FAQ or add it to a wiki page on GitHub
>>>>>
>>>>> It should probably be documented in the README and the FAQ.
>>>>>
>>>>> I'd be against adding user documentation to the wiki -- this would be
>>>>> a 3rd place for users to look for information.
>>>>>
>>>>> > - file a bug against Intel PSM
>>>>>
>>>>> I'd like to hear what they have to say first... :-)
>>>>>
>>>>> >
>>>>> > ----------
>>>>> >
>>>>> > sent from my smart phonr so no good type.
>>>>> >
>>>>> > Howard
>>>>> >
>>>>> > On Aug 25, 2015 6:02 AM, "Gilles Gouaillardet" <
>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>> > i do not know if this can be runtime detected ...
>>>>> > note we should report this to intel folks and ask them to advise.
>>>>> > ideally, they would provide a way to make sure libinfinipath.so does
>>>>> not conflict with the jvm signal handlers.
>>>>> >
>>>>> > my idea is to dlopen libinfinipath only if java bindings are not
>>>>> used.
>>>>> >
>>>>> > On Tuesday, August 25, 2015, Jeff Squyres (jsquyres) <
>>>>> jsquy...@cisco.com> wrote:
>>>>> > Is it possible to run-time detect this situation?  E.g., probe the
>>>>> signal handler, or somesuch.
>>>>> >
>>>>> > Rationale: I'd rather have something run-time disabled than not
>>>>> built.
>>>>> >
>>>>> > Would dlopen'ing libinfinipath actually change its signal
>>>>> handler behavior?
>>>>> >
>>>>> >
>>>>> > > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet <
>>>>> gil...@rist.or.jp> wrote:
>>>>> > >
>>>>> > > Folks,
>>>>> > >
>>>>> > > some time ago, some crashes were reported when using java bindings.
>>>>> > > one of them was caused by mca_mtl_psm.so.
>>>>> > > the root cause is that the libinfinipath.so initializer sets its own signal
>>>>> handler, which
>>>>> > > conflicts with the signal handler set by the jvm.
>>>>> > > the only workaround is to disable the psm mtl
>>>>> > > (e.g. mpirun --mca mtl ^psm ...)
>>>>> > > since mpirun --mca mtl_psm_priority 0 ... does not work
>>>>> > > (libinfinipath.so is loaded, so the initializer is run and the
>>>>> signal handlers are set)
>>>>> > > so the psm mtl cannot be disabled by the Java MPI_Init()
>>>>> > >
>>>>> > > one option is to document this
>>>>> > > another option is not to build the psm mtl if java bindings are
>>>>> built
>>>>> > > and another option is to revamp mca_mtl_psm.so so it does not
>>>>> link with libinfinipath.so
>>>>> > > (use an intermediate component, or dlopen libinfinipath)
>>>>> > >
>>>>> > > any thoughts ?
>>>>> > >
>>>>> > > Cheers,
>>>>> > >
>>>>> > > Gilles
>>>>> > > _______________________________________________
>>>>> > > devel mailing list
>>>>> > > de...@open-mpi.org
>>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> > > Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/08/17838.php
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Jeff Squyres
>>>>> > jsquy...@cisco.com
>>>>> > For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
