Fixed in https://github.com/open-mpi/ompi/pull/1959
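
For readers of the archive, here is a minimal sketch of the fallback logic discussed in the quoted thread. It is not the code from 8d0baf140f or from the pull request. Every name in it (the sketch_* functions, SKETCH_SUCCESS, SKETCH_ERR_NOT_FOUND, SKETCH_NODEID_UNSET) is hypothetical and only stands in for the OPAL status codes and the -1 default mentioned in the thread. The idea it illustrates: a failed lookup and a "successful" lookup that returns -1 are both treated as "node ID unknown", and in that case the caller skips reordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for OPAL_SUCCESS / OPAL_ERR_NOT_FOUND and for the
 * -1 default that the thread says is set in the constructor. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_FOUND = -1 };
#define SKETCH_NODEID_UNSET ((int32_t)-1)

/* What a modex-style lookup returned for one peer (assumed shape). */
typedef struct {
    int     status;  /* return code of the lookup */
    int32_t nodeid;  /* value reported; only meaningful on success */
} sketch_lookup_t;

/* Map a failed lookup, and a successful lookup that still carries the
 * default -1, onto the same "unset" sentinel. */
int32_t sketch_effective_nodeid(const sketch_lookup_t *r)
{
    if (r->status != SKETCH_SUCCESS) {
        return SKETCH_NODEID_UNSET;
    }
    return r->nodeid;
}

/* Count distinct node IDs among n peers.
 * Returns -1 as soon as any peer's node ID is unknown. */
int sketch_count_distinct_nodes(const sketch_lookup_t *peers, size_t n)
{
    int distinct = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t id = sketch_effective_nodeid(&peers[i]);
        if (id == SKETCH_NODEID_UNSET) {
            return -1;
        }
        /* Peers before index i were already validated above. */
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (peers[j].nodeid == id) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}

/* Nonzero only when every peer has a usable node ID; otherwise the caller
 * would fall back to a non-reordered communicator. */
int sketch_can_reorder(const sketch_lookup_t *peers, size_t n)
{
    return sketch_count_distinct_nodes(peers, n) > 0;
}
```

With two peers that both report -1, as in the failing test, sketch_can_reorder() returns 0, so the caller takes the non-reordered path instead of working with zero distinct nodes.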


> On Aug 11, 2016, at 6:23 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> 
> Thanks George,
> 
> 
> fwiw, note the current behavior is a bit more "twisted" than that.
> 
> OPAL_MODEX_RECV_VALUE() returns successfully (i.e. err == OPAL_SUCCESS) but 
> the OPAL_PMIX_NODEID value (i.e. val) is -1.
> 
> that means orted did "push" OPAL_PMIX_NODEID, but with an uninitialized value 
> of -1 (this is set in the constructor).
> 
> fortunately, you used the same -1 special value as if OPAL_MODEX_RECV_VALUE() 
> had failed (e.g. with OPAL_ERR_NOT_FOUND),
> 
> so bottom line, your commit does fix the crash.
> 
> Cheers,
> 
> Gilles
> 
> On 8/12/2016 2:09 AM, George Bosilca wrote:
>> I just pushed a solution to this problem in 8d0baf140f. If we are unable to 
>> extract the expected information from the RTE, we simply build a 
>> non-reordered communicator and gracefully return.
>> 
>> That being said, not being able to correctly retrieve OPAL_PMIX_NODEID has 
>> the potential to drastically decrease the performance as no specialized 
>> hierarchies can be built without the RTE information.
>> 
>>   George.
>> 
>> 
>> On Wed, Aug 10, 2016 at 3:57 AM, Gilles Gouaillardet <gil...@rist.or.jp 
>> <mailto:gil...@rist.or.jp>> wrote:
>> Ralph,
>> 
>> 
>> i noticed dist-graph/distgraph_test_4 from the ibm test suite fails when 
>> using a hostfile and running no task on the host running mpirun.
>> 
>> n0$ mpirun --host n1:1,n2:1 -np 2 ./dist-graph/distgraph_test_4
>> 
>> 
>> the root cause is that OPAL_PMIX_NODEID is correctly set (0, 1, 2) by mpirun, 
>> but for some reason, orted sets it to -1 everywhere.
>> 
>> an indirect consequence is a crash of the test (it believes tasks run on 
>> zero distinct nodes instead of 2)
>> 
>> 
>> this occurs only on master; v2.x is fine.
>> 
>> 
>> Could you please have a look ?
>> 
>> 
>> Cheers,
>> 
>> 
>> Gilles
>> 
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org <mailto:devel@lists.open-mpi.org>
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel 
>> <https://rfd.newmexicoconsortium.org/mailman/listinfo/devel>
>> 
