I wonder whether the jobs are getting duplicate process names when they are
started quickly enough. Do you get the "job has launched" message (orte-submit
prints it after orte-dvm responds that the job launched)?
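
One way to check, as a rough sketch only: keep each client's output in its
own file and bound its runtime, so the hung submissions can be told apart
afterwards. This assumes GNU coreutils `timeout` is available (it exits with
124 when it has to kill the command). The function name submit_batch, its
arguments, and the SUBMIT override are made up for illustration and are not
part of Open MPI. Running under timeout may also change the timing.

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI.
# SUBMIT can be overridden to dry-run the script with a stub command.
SUBMIT=${SUBMIT:-orte-submit}

# submit_batch N LIMIT LOGDIR
#   N      number of concurrent submissions
#   LIMIT  seconds before a client is considered hung (assumed value)
#   LOGDIR directory receiving run_<i>.log and run_<i>.rc
submit_batch() {
    n=$1; limit=$2; logdir=$3
    mkdir -p "$logdir"

    for i in $(seq "$n")
    do
        (
            # Same command line as in the reproducer, wrapped in timeout.
            timeout "$limit" "$SUBMIT" --hnp file:dvm_uri -np 1 /bin/date \
                > "$logdir/run_$i.log" 2>&1
            echo $? > "$logdir/run_$i.rc"
        ) &
    done
    wait

    # Report every client that timed out or failed; silent on success.
    for i in $(seq "$n")
    do
        rc=$(cat "$logdir/run_$i.rc")
        if [ "$rc" -eq 124 ]; then
            echo "run $i: timed out after ${limit}s (possible hang)"
        elif [ "$rc" -ne 0 ]; then
            echo "run $i: exit status $rc"
        fi
    done
}
```

With the DVM running, something like `submit_batch 42 30 submit_logs` would
then leave one log per client, and the logs of the runs reported as timed out
should show whether the launch message was printed before the hang.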



On Wed, Oct 14, 2015 at 12:04 PM, Mark Santcroos <mark.santcr...@rutgers.edu
> wrote:

> Hi,
>
> By hammering on a DVM with orte-submit I can reproducibly make orte-submit
> not return, but hang instead.
> The task is executed correctly though.
>
> It can be reproduced using the small snippet below.
> Switching from sequential to "concurrent" execution of the orte-submit
> calls triggers the effect.
>
> Note that when I ctrl-c the orte-submit, I can re-use the DVM, so my hunch
> would be that it's a client-side issue.
>
> What MCA logging parameters might give more insight into what's happening?
>
> Thanks!
>
> Mark
>
>
>
> $ cat > orte_test.sh
> #!/bin/sh
>
> for i in $(seq 42)
> do
>     # GOOD
>     #orte-submit --hnp file:dvm_uri -np 1 /bin/date
>
>     # BAD
>     orte-submit --hnp file:dvm_uri -np 1 /bin/date &
> done
> wait
> ^D
> $ chmod +x orte_test.sh
> $ orte-dvm --report-uri dvm_uri &
> DVM ready
> $ ./orte_test.sh
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18165.php
>
