Hi!

> On 15 Oct 2015, at 4:38 , Ralph Castain <r...@open-mpi.org> wrote:
> 
> Okay, please try the attached patch. It will cause two messages to be output 
> for each job: one indicating the job has been marked terminated, and the 
> other reporting that the completion message was sent to the requestor. Let's 
> see what that tells us.

In this run of 42 jobs, 6 did not return, so 36 completed successfully.


$ grep TERMINATED dvm_output-patched.txt |wc -l
      72

$ grep NOTIFYING dvm_output-patched.txt |wc -l
      36

$ grep "Releasing job data" dvm_output-patched.txt |wc -l
      77

$ grep "sess_dir_finalize" dvm_output-patched.txt |wc -l
      36

$ grep "Releasing job data for.*," dvm_output-patched.txt|sort -k4 -t"," -n|wc -l
      35

Interestingly, this is 35, not 36.

$ grep "Releasing job data for.*," dvm_output-patched.txt|sort -k4 -t"," -n|head
[netbook:06716] [[9528,0],0] Releasing job data for [9528,2]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,8]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,9]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,10]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,12]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,13]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,14]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,15]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,16]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,17]

This means tasks 1, 3, 4, 5, 6, 7 and 11 didn't return, which shows a clear bias
towards the "early" tasks.
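To double-check which jobs never got a "Releasing job data" line, a small script can compare the released job numbers against the expected set. This is a minimal sketch, assuming the per-job log lines look exactly like the ones above and that the jobs were numbered 1 through 42 (the numbering is an assumption, not confirmed by the log). `missing_jobs` is a hypothetical helper, not part of Open MPI.

```python
import re

# Matches per-job release lines such as
# "[netbook:06716] [[9528,0],0] Releasing job data for [9528,2]".
# Lines for the daemon job, e.g. "for [[9528,0],0]", do not match because
# a digit must follow the opening bracket.
RELEASE_RE = re.compile(r"Releasing job data for \[(\d+),(\d+)\]")


def missing_jobs(lines, expected):
    """Return the sorted job numbers in `expected` that have no release line."""
    released = set()
    for line in lines:
        m = RELEASE_RE.search(line)
        if m:
            # group(2) is the job number within the job family (e.g. 2 in [9528,2])
            released.add(int(m.group(2)))
    return sorted(set(expected) - released)
```

For example, `missing_jobs(open("dvm_output-patched.txt"), range(1, 43))` would list every job number from 1 to 42 without a release line, which avoids reading the gaps off the sorted `head` output by eye.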


Hopefully this gives you more insight.

Thanks!

Mark
