Might not - there has been a very large amount of change over the last few
months, and I confess I haven't been checking the DVM regularly. So let me
take a step back and look at that code.

I'll also include the extensions you requested in the other email - I
didn't forget them, I've just been somewhat overwhelmed lately.


On Thu, Sep 17, 2015 at 11:39 AM, Mark Santcroos <mark.santcr...@rutgers.edu
> wrote:

>
> > On 17 Sep 2015, at 20:34, Ralph Castain <r...@open-mpi.org> wrote:
> >
> > Ouch - this is on current master HEAD?
>
> Yep!
>
> > I'm on travel right now, but I'll be back Fri evening and can look at it
> this weekend. Probably something silly that needs to be fixed.
>
> Thanks!
>
> Obviously I didn't check every single version between March and now, but
> it's probably safe to assume that it didn't work in between either.
>
>
> >
> >
> > On Thu, Sep 17, 2015 at 11:30 AM, Mark Santcroos <
> mark.santcr...@rutgers.edu> wrote:
> > Hi (Ralph),
> >
> > Over the last months I have been focussing on exec throughput, and not
> so much on the application payload (read: mainly using /bin/sleep ;-)
> > As things are stabilising now, I have returned my attention to "real"
> applications, only to discover that launching MPI applications (built
> with the same Open MPI version) within a DVM no longer works (see the
> error below).
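> > (For context, the "real" application here is nothing more exotic than a
> minimal MPI hello-world along these lines - an illustrative sketch, not
> my exact code:)
> >
> >   /* hello_mpi.c - just enough to exercise MPI_Init under the DVM */
> >   #include <mpi.h>
> >   #include <stdio.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       int rank, size;
> >       MPI_Init(&argc, &argv);    /* this is where the failure below hits */
> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >       MPI_Comm_size(MPI_COMM_WORLD, &size);
> >       printf("rank %d of %d\n", rank, size);
> >       MPI_Finalize();
> >       return 0;
> >   }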
> >
> > I've been doing a binary search, but that turned out not to be so
> trivial because of other problems within the window of the change.
> > So far I've narrowed it down to:
> >
> > 64ec498 - Mar 5 - works on my laptop (but not on the Crays)
> > b67b361 - Mar 28 - works once per DVM launch on my laptop, but
> consecutive orte-submits get a connect error
> > b209c9e - Mar 30 - same MPI_Init issue as HEAD
> >
> > Going further back into mid-March I ran into build issues with verbs,
> runtime issues with the default binding complaining about a missing
> libnumactl, runtime TCP OOB errors, etc.
> > So I don't know whether the binary search will yield much more than
> what I've been able to dig up so far.
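> > (The binary search above is essentially a git bisect with skips over
> the broken commits in that window - a rough sketch of the procedure, not
> necessarily the exact commands I ran:)
> >
> >   git bisect start
> >   git bisect bad HEAD        # MPI_Init fails under the DVM
> >   git bisect good 64ec498    # Mar 5, works on my laptop
> >   # at each step: rebuild, relaunch the DVM, orte-submit a test job,
> >   # then mark the commit with "git bisect good" or "git bisect bad";
> >   # for commits that don't build or hit unrelated runtime errors:
> >   git bisect skip
> >   git bisect reset           # when done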
> >
> > What can I do to get closer to debugging the actual issue?
> >
> > Thanks!
> >
> > Mark
> >
> >
> > OMPI_PREFIX=/Users/mark/proj/openmpi/installed/HEAD
> > OMPI_MCA_orte_hnp_uri=723386368.0;usock;tcp://192.168.0.103:56672
> > OMPI_MCA_ess=tool
> > [netbook:70703] Job [11038,3] has launched
> >
> --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: ompi_rte_init failed
> >   --> Returned "(null)" (-43) instead of "Success" (0)
> >
> --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***    and potentially your MPI job)
> > [netbook:70704] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!