> On 16 Oct 2015, at 0:23, Ralph Castain <r...@open-mpi.org> wrote:
>
> Okay, that means that the dvm isn't recognizing that the jobs actually
> completed.

Ok.

> So the question is: what is it about those jobs?

They are all the same.

> Are those 6 jobs very short-lived, and the others are longer-lived?

All very short-lived, as that's the easiest way to reproduce it.

> If you look at the nodes (before you kill the dvm), are any of those procs
> still there?

I originally ran into this on a large machine, but I can reproduce it easily on my laptop, so the results I've been sending in the last messages are from runs on my laptop. The stalled orte-submits are still there, obviously, but the actual task process is no longer active.