Okay, I’ll take a look - thanks!
> On Jul 15, 2016, at 7:08 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>
> Yep,
>
> The constructor of pmix2x_threadshift_t (tscon) does not initialize
> nondefault to false.
> I won't be able to investigate this until Monday, but so far, my guess is
> that if the constructor is fixed, then RHEL6 will fail like RHEL7 ...
>
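A minimal sketch of the failure mode described above, in plain C. This is not the Open MPI code: caddy_t, construct_buggy, construct_fixed and flag_after_construct are hypothetical names used only for illustration. It shows that a constructor which skips a flag leaves it at whatever value the memory already held, which would be consistent with nondefault differing between platforms.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a thread-shift caddy; NOT the real
 * pmix2x_threadshift_t layout. */
typedef struct {
    int status;
    bool nondefault;
} caddy_t;

/* Buggy constructor: sets status but never touches nondefault. */
void construct_buggy(caddy_t *cd)
{
    cd->status = 0;
}

/* Fixed constructor: every field gets a defined value. */
void construct_fixed(caddy_t *cd)
{
    cd->status = 0;
    cd->nondefault = false;
}

/* Simulate an object built on top of leftover memory: "stale" plays the
 * role of whatever happened to be there before the constructor ran. */
bool flag_after_construct(bool stale, void (*construct)(caddy_t *))
{
    caddy_t cd;
    cd.status = -1;
    cd.nondefault = stale;
    construct(&cd);
    return cd.nondefault;
}
```

With construct_buggy the returned flag simply follows the stale value; with construct_fixed it is false regardless, so every platform would take the same path.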
> fwiw, intercomm_create used to fail in Cisco MTT because of too many
> tasks and no oversubscription; now it fails because of this bug.
>
> Cheers,
>
> Gilles
>
> On Friday, July 15, 2016, Ralph Castain <r...@open-mpi.org> wrote:
> That would break debugger attach. Sounds to me like it’s just an
> uninitialized variable for in_event_hdlr?
>
> > On Jul 15, 2016, at 1:20 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> > wrote:
> >
> > Ralph,
> >
> > i noticed MPI_Comm_spawn is broken on master and on RHEL7
> >
> > for some reason i cannot yet explain, it works just fine on RHEL6 (!)
> >
> >
> > mpirun -np 1 ./dynamic/intercomm_create
> >
> > from the ibm test suite can be used to reproduce the issue.
> >
> >
> >
> > i dug a bit and found that OPAL_ERR_DEBUGGER_RELEASE is fired in mpirun,
> > and the tasks then receive a PMIX_ERR_DEBUGGER_RELEASE notification.
> > it seems no event handler is registered, so the default handler kills
> > the task.
> >
> >
> > for the time being, a trivial workaround is not to fire
> > OPAL_ERR_DEBUGGER_RELEASE in the first place (see patch below)
> >
> >
> > could you please have a look ?
> >
> > i am not sure whether clients should not be notified at all, or whether
> > they should register a dummy handler.
> >
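To make the two options above concrete, here is a rough sketch of a notification dispatcher in plain C. None of this is the PMIx or OPAL API: dispatcher_t, register_handler, notify, the ACTION_* values and STATUS_DEBUGGER_RELEASE are hypothetical names, and the value 100 is arbitrary. It only illustrates the behaviour described: with no handler registered for a status, the default handler runs and aborts; a registered dummy handler absorbs the notification instead.

```c
#include <stddef.h>

/* Hypothetical outcomes of handling a notification. */
enum { ACTION_CONTINUE = 0, ACTION_ABORT = 1 };

/* Hypothetical status code; the value is arbitrary. */
#define STATUS_DEBUGGER_RELEASE 100
#define MAX_HANDLERS 8

typedef int (*handler_fn)(int status);

typedef struct {
    int status;
    handler_fn fn;
} registration_t;

typedef struct {
    registration_t regs[MAX_HANDLERS];
    size_t count;
} dispatcher_t;

/* Default handler: treats any unhandled notification as fatal. */
int default_handler(int status)
{
    (void)status;
    return ACTION_ABORT;
}

/* Dummy handler: acknowledges the notification and carries on. */
int dummy_handler(int status)
{
    (void)status;
    return ACTION_CONTINUE;
}

/* Register fn for one status code; returns 0 on success, -1 if full. */
int register_handler(dispatcher_t *d, int status, handler_fn fn)
{
    if (d->count >= MAX_HANDLERS) {
        return -1;
    }
    d->regs[d->count].status = status;
    d->regs[d->count].fn = fn;
    d->count++;
    return 0;
}

/* Deliver a notification: the first matching handler wins, otherwise
 * fall back to the default handler. */
int notify(const dispatcher_t *d, int status)
{
    for (size_t i = 0; i < d->count; i++) {
        if (d->regs[i].status == status) {
            return d->regs[i].fn(status);
        }
    }
    return default_handler(status);
}
```

Not sending the notification at all (the workaround in the patch) avoids the abort as well, but it also means nothing is ever delivered to a process that did register a handler.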
> > fwiw, in _event_hdlr, cd->nondefault is true on RHEL6, but false on RHEL7,
> > and that might indicate a race condition
> >
> >
> > Cheers,
> >
> >
> > Gilles
> >
> > diff --git a/orte/orted/orted_submit.c b/orte/orted/orted_submit.c
> > index b9d571c..0de0e79 100644
> > --- a/orte/orted/orted_submit.c
> > +++ b/orte/orted/orted_submit.c
> > @@ -2155,6 +2155,7 @@ static bool mpir_breakpoint_fired = false;
> >
> >  static void _send_notification(void)
> >  {
> > +#if 0
> >      opal_buffer_t buf;
> >      int status = OPAL_ERR_DEBUGGER_RELEASE;
> >      orte_grpcomm_signature_t sig;
> > @@ -2209,6 +2210,7 @@ static void _send_notification(void)
> >      }
> >      OBJ_DESTRUCT(&sig);
> >      OBJ_DESTRUCT(&buf);
> > +#endif
> >  }
> >
> >  static void orte_debugger_dump(void)
> >
> >
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2016/07/19214.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19215.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19216.php