> On 17 Sep 2015, at 20:34 , Ralph Castain <r...@open-mpi.org> wrote:
> 
> Ouch - this is on current master HEAD?

Yep!

> I'm on travel right now, but I'll be back Fri evening and can look at it this 
> weekend. Probably something silly that needs to be fixed.

Thanks!

Obviously I didn't check every single version between March and now, but I 
think it's safe to assume it didn't work in between either.


> 
> 
> On Thu, Sep 17, 2015 at 11:30 AM, Mark Santcroos <mark.santcr...@rutgers.edu> 
> wrote:
> Hi (Ralph),
> 
> Over the last months I have been focussing on exec throughput, and not so 
> much on the application payload (read: mainly using /bin/sleep ;-)
> As things are stabilising now, I have returned my attention to "real" 
> applications, only to discover that launching MPI applications (built with 
> the same Open MPI version) within a DVM no longer works (see error below).
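> 
> For clarity, the applications themselves are trivial; a minimal sketch of the 
> kind of test I mean (not my actual application, just a bare 
> MPI_Init/MPI_Finalize program, assuming it is built with mpicc from this same 
> install and launched via orte-submit against the running DVM) would be 
> something like:
> 
>     #include <mpi.h>
>     #include <stdio.h>
> 
>     int main(int argc, char **argv)
>     {
>         /* The failure reported below happens inside MPI_Init itself
>          * (ompi_rte_init), so nothing beyond this call is needed. */
>         if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
>             fprintf(stderr, "MPI_Init failed\n");
>             return 1;
>         }
> 
>         int rank;
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         printf("Hello from rank %d\n", rank);
> 
>         MPI_Finalize();
>         return 0;
>     }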
> 
> I've been doing a binary search, but that turned out to be non-trivial 
> because of other problems within that commit window.
> So far I've narrowed it down to:
> 
> 64ec498 - Mar 5  - works on my laptop (but not on the Crays)
> b67b361 - Mar 28 - works once per DVM launch on my laptop, but consecutive 
> orte-submits get a connect error
> b209c9e - Mar 30 - same MPI_Init issue as HEAD
> 
> Going further back into mid-March I ran into build issues with verbs, runtime 
> issues with the default binding complaining about a missing libnumactl, 
> runtime TCP OOB errors, etc.
> So I don't know whether the binary search will yield much more than what I've 
> been able to dig up so far.
> 
> What can I do to get closer to debugging the actual issue?
> 
> Thanks!
> 
> Mark
> 
> 
> OMPI_PREFIX=/Users/mark/proj/openmpi/installed/HEAD
> OMPI_MCA_orte_hnp_uri=723386368.0;usock;tcp://192.168.0.103:56672
> OMPI_MCA_ess=tool
> [netbook:70703] Job [11038,3] has launched
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [netbook:70704] Local abort before MPI_INIT completed completed successfully, 
> but am not able to aggregate error messages, and not able to guarantee that 
> all other processes were killed!
> 
