I am seeing the message "Dropped message for the non-existing
communicator" when running hpcc with np=124 against r19845. This seems
to be pretty reproducible at np=124. When the job prints out the
message above, some of the processes are in MPI_Bcast, while the 15
processes reporting the message are stuck in MPI_Barrier.
I am not sure how closely this is related to #1408, since I am not
invoking the hierarchical collectives. I just wanted to see if anyone
else has tried to run hpcc at this np with any success.
My next steps are to try to run this with the latest trunk and to narrow
down the failing case.
--td
- [OMPI devel] Dropped message for the non-existing communica... Terry Dontje