Thanks George,
fwiw, note the current behavior is a bit more "twisted" than that.
OPAL_MODEX_RECV_VALUE() returns successfully (i.e. err == OPAL_SUCCESS)
but the OPAL_PMIX_NODEID value (i.e. val) is -1.
that means orted did "push" OPAL_PMIX_NODEID, but with an uninitialized
value of -1 (this is set in the constructor).
fortunately, you use the same -1 special value when
OPAL_MODEX_RECV_VALUE() fails (e.g. with OPAL_ERR_NOT_FOUND),
so bottom line, your commit does fix the crash.
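
to make the two cases concrete, here is a minimal sketch of how a caller
could fold "lookup failed" and "lookup succeeded but the value is still
-1" into one sentinel. it is not the code from the commit; the SKETCH_*
constants and sketch_effective_nodeid() are hypothetical names, and the
numeric values are placeholders, not the real OPAL ones.

```c
/* Hypothetical stand-ins for this sketch only. They are NOT the real
 * OPAL constants, and their numeric values are placeholders. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_FOUND = -13 };
enum { SKETCH_NODEID_UNKNOWN = -1 };

/* Collapse the two "no usable node id" cases into a single sentinel:
 *   1. the lookup itself failed (rc != SKETCH_SUCCESS), or
 *   2. the lookup succeeded but the value is still negative, i.e. it
 *      was pushed without ever being initialized. */
static int sketch_effective_nodeid(int rc, int val)
{
    if (rc != SKETCH_SUCCESS) {
        return SKETCH_NODEID_UNKNOWN;   /* case 1: nothing was found */
    }
    if (val < 0) {
        return SKETCH_NODEID_UNKNOWN;   /* case 2: found, but unset */
    }
    return val;                         /* a usable node id */
}
```

with this shape the caller only has to test for one value, which is why
using -1 for the failure path also covers the "success but -1" path.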
Cheers,
Gilles
On 8/12/2016 2:09 AM, George Bosilca wrote:
I just pushed a solution to this problem in 8d0baf140f. If we are
unable to extract the expected information from the RTE, we simply
build a non-reordered communicator and gracefully return.
That being said, not being able to correctly retrieve
OPAL_PMIX_NODEID can drastically decrease performance,
as no specialized hierarchies can be built without the RTE
information.
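
The fallback described above can be sketched roughly as follows. This is
an illustration of the pattern only, not the code in 8d0baf140f: the
function name sketch_build_rank_order() is hypothetical, and grouping
ranks by node id is a simple stand-in for whatever reordering the real
code performs.

```c
#include <assert.h>
#include <stddef.h>

/* Fill order[] with a permutation of the ranks 0..n-1.
 * Returns 1 if the ranks were grouped by node id, or 0 if the identity
 * (non-reordered) mapping was used because some node id was unknown. */
static int sketch_build_rank_order(const int *node_ids, size_t n, int *order)
{
    assert(n == 0 || (node_ids != NULL && order != NULL));

    /* Any negative id means the RTE information is missing: keep the
     * original rank order and report that no reordering happened. */
    for (size_t i = 0; i < n; i++) {
        if (node_ids[i] < 0) {
            for (size_t r = 0; r < n; r++) {
                order[r] = (int)r;
            }
            return 0;
        }
    }

    /* All ids are known: stable insertion sort of the ranks, keyed by
     * node id, so ranks on the same node end up adjacent. */
    for (size_t i = 0; i < n; i++) {
        size_t pos = i;
        while (pos > 0 && node_ids[order[pos - 1]] > node_ids[i]) {
            order[pos] = order[pos - 1];
            pos--;
        }
        order[pos] = (int)i;
    }
    return 1;
}
```

The return value lets the caller tell the degraded case apart from the
reordered one, for instance to log that the hierarchy could not be built.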
George.
On Wed, Aug 10, 2016 at 3:57 AM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
Ralph,
i noticed dist-graph/distgraph_test_4 from the ibm test suite
fails when using a hostfile and running no task on the host
running mpirun.
n0$ mpirun --host n1:1,n2:1 -np 2 ./dist-graph/distgraph_test_4
the root cause is that OPAL_PMIX_NODEID is correctly set (0, 1, 2) by
mpirun, but for some reason, orted sets it to -1 everywhere.
an indirect consequence is a crash of the test (it believes tasks
run on zero distinct nodes instead of 2).
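
to illustrate how a count of zero can come out of this, here is a small
sketch that counts distinct node ids and skips negative ones as
"unknown". this is an assumption about the kind of logic involved, not
the actual code of the test or of the topology component, and
sketch_count_distinct_nodes() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct non-negative node ids in node_ids[0..n-1].
 * Negative entries are treated as "unknown" and ignored.
 * Quadratic in n, which is fine for a sketch. */
static int sketch_count_distinct_nodes(const int *node_ids, size_t n)
{
    int distinct = 0;

    assert(n == 0 || node_ids != NULL);

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        if (node_ids[i] < 0) {
            continue;               /* unknown node id, skip it */
        }
        /* Has this id already appeared earlier in the array? */
        for (size_t j = 0; j < i; j++) {
            if (node_ids[j] == node_ids[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}
```

with the ids from the failing run (-1 for both tasks) this yields 0,
whereas the ids set by mpirun for n1 and n2 (1 and 2) yield 2.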
this occurs only on master; v2.x is fine.
Could you please have a look?
Cheers,
Gilles
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel