I wonder if the orte-submit instances might be getting duplicate process names when they are started quickly enough. Do you get the "job has launched" message? (orte-submit prints it after orte-dvm responds that the job has launched.)
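
One way to check which instances hang and what each one printed, as a rough sketch only (I have not run this against a real DVM): wrap every submission in a timeout and keep a log per instance. The helper name run_batch, its arguments, and the log directory layout are made up for illustration, and it assumes the GNU coreutils timeout utility is available. I am not quoting the exact text of the launch message here, so look for it in the logs by eye.

```shell
#!/bin/sh
# Sketch: start N copies of a command concurrently, each under a timeout,
# and record per-instance output and exit status.
# run_batch and its arguments are hypothetical names for illustration.
# Usage: run_batch N LIMIT_SECONDS LOGDIR COMMAND [ARGS...]
run_batch() {
    n=$1; limit=$2; logdir=$3
    shift 3
    mkdir -p "$logdir"

    for i in $(seq "$n"); do
        (
            # Capture stdout and stderr of this instance in its own log.
            timeout "$limit" "$@" > "$logdir/$i.log" 2>&1
            echo $? > "$logdir/$i.rc"
        ) &
    done
    wait

    hung=0
    for i in $(seq "$n"); do
        rc=$(cat "$logdir/$i.rc")
        # GNU timeout exits with status 124 when the command ran out of time.
        if [ "$rc" -eq 124 ]; then
            hung=$((hung + 1))
            echo "instance $i timed out; see $logdir/$i.log"
        fi
    done
    echo "hung: $hung of $n"
}
```

With the DVM from your snippet running, something like `run_batch 42 30 logs orte-submit --hnp file:dvm_uri -np 1 /bin/date` would then show how many instances hit the 30 second limit, and the logs of those instances show whether they got as far as the launch message.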
On Wed, Oct 14, 2015 at 12:04 PM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:

> Hi,
>
> By hammering on a DVM with orte-submit I can reproducibly make orte-submit
> not return, but hang instead.
> The task is executed correctly though.
>
> It can be reproduced using the small snippet below.
> Switching from sequential to "concurrent" execution of the orte-submit
> calls triggers the effect.
>
> Note that when I ctrl-c the orte-submit, I can re-use the DVM, so my hunch
> would be that it's a client-side issue.
>
> What MCA logging parameters might give more insight into what's happening?
>
> Thanks!
>
> Mark
>
>
> $ cat > orte_test.sh
> #!/bin/sh
>
> for i in $(seq 42)
> do
>     # GOOD
>     #orte-submit --hnp file:dvm_uri -np 1 /bin/date
>
>     # BAD
>     orte-submit --hnp file:dvm_uri -np 1 /bin/date &
> done
> wait
> ^D
> $ chmod +x orte_test.sh
> $ orte-dvm --report-uri dvm_uri &
> DVM ready
> $ ./orte_test.sh