User Keith Myers (UID 147145 at http://milkyway.cs.rpi.edu/milkyway/index.php) 
has asked for my help in identifying task failures at Milkyway.
At my suggestion, he installed Windows client v7.6.2, and the attached message 
log extracts show the enhanced <slot_debug> output that helped identify the 
CMS-dev problem.
In both cases, the task under scrutiny
(1) de_fast_15_3s_136_sim1Jun1_1_1434554402_7775504_0, 
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1181200273
(2) ps_fast_15_3s_136_sim1Jun1_1_1434554402_7806437_0, 
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1181298220
was declared 'Validate error', and its <stderr_txt> section is empty. In the 
special case of Milkyway@Home, these two observations are linked, because the 
science result is returned in stderr, not in a separate upload file.
Also in both cases, the <slot_debug> log contains
[slot] failed to remove file slots/x/stderr.txt: unlink() failed

between 'handle_exited_app()' and 'Computation for task ... finished'.
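
For anyone who wants to search their own logs for the same pattern, here is a 
rough sketch in Python. It assumes each log message sits on a single line (the 
extracts in this post are wrapped by the mail software), and the regular 
expressions are based only on the message text quoted here, so the wording may 
differ in other client versions. The function name is my own invention. It 
pairs a failure with the next 'finished' line, which could misattribute the 
task if two tasks exit at the same moment.

```python
import re

# Patterns taken from the messages quoted in this post (assumed wording).
CLEAN_RE = re.compile(r"\[slot\] cleaning out slots/(\d+): (\w+)\(\)")
FAIL_RE = re.compile(r"\[slot\] failed to remove file slots/(\d+)/stderr\.txt")
DONE_RE = re.compile(r"Computation for task (\S+) finished")


def find_lost_stderr(lines):
    """Return (slot, task) pairs where removing stderr.txt failed during
    handle_exited_app() and a task was then logged as finished."""
    hits = []
    context = None   # (slot, function) from the latest "cleaning out" line
    pending = None   # slot whose stderr.txt could not be removed
    for line in lines:
        m = CLEAN_RE.search(line)
        if m:
            context = (m.group(1), m.group(2))
            continue
        m = FAIL_RE.search(line)
        if m:
            # Only count failures reported while handle_exited_app() is
            # cleaning that same slot; get_free_slot() retries are ignored.
            if context == (m.group(1), "handle_exited_app"):
                pending = m.group(1)
            continue
        m = DONE_RE.search(line)
        if m and pending is not None:
            hits.append((pending, m.group(1)))
            pending = None
    return hits
```

Feeding it the unwrapped lines of a message log (for example 
find_lost_stderr(open(path)) with your own path) gives the slot number and task 
name for each suspect task.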
It appears that there is a race condition, whereby BOINC tries (and fails) to 
delete stderr.txt before the operating system has released the write lock. This 
(I'm presuming) also explains why the file appears empty when read off the disk 
for incorporation into the client_state structure in memory, prior to reporting 
the completed task to the project.
In order to preserve the scientific result at Milkyway (and debug and other 
useful information at other projects), the client should not initiate 
'handle_exited_app()' until it has confirmed that the write lock on stderr.txt 
has been released.
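
To illustrate the kind of wait I have in mind, here is a minimal sketch in 
Python (not BOINC code, and not a patch). It polls a caller-supplied probe a 
bounded number of times, doubling the delay between tries. The function name, 
the probe, and the default attempt count and delay are all my own assumptions; 
how the client would actually test for the lock on Windows is left open.

```python
import time


def wait_for_release(is_released, attempts=10, delay=0.05, sleep=time.sleep):
    """Poll is_released() until it returns True or the attempts run out.

    is_released is a hypothetical, platform-specific probe supplied by the
    caller (for example, one that tries to open stderr.txt).
    Returns True if the probe succeeded, False if we gave up.
    """
    for attempt in range(attempts):
        if is_released():
            return True
        if attempt < attempts - 1:
            sleep(delay)   # back off before the next probe
            delay *= 2     # double the wait each time
    return False
```

The idea would be to read stderr.txt and clean out the slot only after this 
returns True, and to fall back to the present behaviour if it returns False, so 
that a file which is never released cannot stall the client indefinitely.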

Log 1 also shows that the additional safeguards on cleaning out slots are 
working properly: if both handle_exited_app() and get_free_slot() fail to 
delete the file, the next task isn't started in the not-empty slot (11), but in 
slot 14 instead. And when slot 11 is tested again at the next get_free_slot(), 
the delete succeeds and the now-empty slot is reused.
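
The slot selection described above can be modelled roughly as follows. This is 
a simplified Python model of the behaviour visible in Log 1, not the client's 
actual code: pick_free_slot, in_use, clean_out and max_slots are hypothetical 
names, and I am assuming that slots 12 and 13 were occupied at the time.

```python
def pick_free_slot(in_use, clean_out, max_slots=64):
    """Return the lowest slot that is not in use and could be emptied.

    in_use    -- set of slot numbers occupied by running tasks
    clean_out -- callable(slot) -> True if the slot directory is now empty
    Returns None if no slot below max_slots qualifies.
    """
    for slot in range(max_slots):
        if slot in in_use:
            continue
        if clean_out(slot):
            return slot
        # Cleaning failed (e.g. stderr.txt still locked): skip this slot
        # for now; it will be tried again on the next call.
    return None
```

With slot 11 refusing to empty, a call at 3:56:05 PM would skip it and return 
14; once the delete succeeds, the next call returns 11 again.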

Log 1:
7/8/2015 3:55:15 PM | Milkyway@Home | [slot] assigning slot 11 to 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7775504_0
7/8/2015 3:55:15 PM |  | [slot] removed file slots/11/init_data.xml
7/8/2015 3:55:15 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
 to 
slots/11/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
7/8/2015 3:55:15 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/parameters-15-3s-sim-fast.txt to 
slots/11/astronomy_parameters.txt
7/8/2015 3:55:15 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/stars-15-sim-1Jun1.txt to 
slots/11/stars.txt
7/8/2015 3:55:15 PM |  | [slot] removed file slots/11/boinc_temporary_exit
7/8/2015 3:55:15 PM | Milkyway@Home | Starting task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7775504_0
7/8/2015 3:55:15 PM | Milkyway@Home | [cpu_sched] Starting task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7775504_0 using 
milkyway_separation__modified_fit version 136 (opencl_nvidia_101) in slot 11
7/8/2015 3:55:16 PM | Milkyway@Home | Sending scheduler request: To fetch work.
7/8/2015 3:55:16 PM | Milkyway@Home | Reporting 1 completed tasks
7/8/2015 3:55:16 PM | Milkyway@Home | Requesting new tasks for NVIDIA GPU
7/8/2015 3:55:18 PM | Milkyway@Home | Scheduler request completed: got 1 new 
tasks
7/8/2015 3:55:18 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10653037_0_0
7/8/2015 3:55:18 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10653037_0_0.gz
7/8/2015 3:55:18 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10653037_0_0.gzt
7/8/2015 3:55:26 PM |  | [slot] cleaning out slots/2: handle_exited_app()
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/boinc_finish_called
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/boinc_task_state.xml
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/cudart32_50_35.dll
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/cufft32_50_35.dll
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/init_data.xml
7/8/2015 3:55:26 PM |  | [slot] removed file 
slots/2/Lunatics_x41zc_win32_cuda50.exe
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/mbcuda.cfg
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/result.sah
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/state.sah
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/stderr.txt
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/work_unit.sah
7/8/2015 3:55:26 PM |  | [slot] cleaning out slots/2: get_free_slot()
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/init_data.xml
7/8/2015 3:55:26 PM |  | [slot] removed file slots/2/boinc_temporary_exit
7/8/2015 3:55:31 PM |  | [slot] removed file 
projects/setiathome.berkeley.edu/30ja15ab.5711.328466.438086664199.12.107_1_0
7/8/2015 3:55:31 PM |  | [slot] removed file 
projects/setiathome.berkeley.edu/30ja15ab.5711.328466.438086664199.12.107_1_0.gz
7/8/2015 3:55:31 PM |  | [slot] removed file 
projects/setiathome.berkeley.edu/30ja15ab.5711.328466.438086664199.12.107_1_0.gzt
7/8/2015 3:56:05 PM |  | [slot] cleaning out slots/11: handle_exited_app()
7/8/2015 3:56:05 PM |  | [slot] removed file slots/11/astronomy_parameters.txt
7/8/2015 3:56:05 PM |  | [slot] removed file slots/11/boinc_finish_called
7/8/2015 3:56:05 PM |  | [slot] removed file slots/11/init_data.xml
7/8/2015 3:56:05 PM |  | [slot] removed file 
slots/11/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
7/8/2015 3:56:05 PM |  | [slot] removed file slots/11/separation_checkpoint
7/8/2015 3:56:05 PM |  | [slot] removed file slots/11/stars.txt
7/8/2015 3:56:05 PM |  | [slot] failed to remove file slots/11/stderr.txt: 
unlink() failed
7/8/2015 3:56:05 PM | Milkyway@Home | Computation for task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7775504_0 finished
7/8/2015 3:56:05 PM |  | [slot] cleaning out slots/11: get_free_slot()
7/8/2015 3:56:05 PM |  | [slot] failed to remove file slots/11/stderr.txt: 
unlink() failed
7/8/2015 3:56:05 PM | Milkyway@Home | [slot] failed to clean out dir: unlink() 
failed
7/8/2015 3:56:05 PM | Milkyway@Home | [slot] assigning slot 14 to 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7777240_0
7/8/2015 3:56:05 PM |  | [slot] removed file slots/14/init_data.xml
7/8/2015 3:56:05 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
 to 
slots/14/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
7/8/2015 3:56:05 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/parameters-15-3s-sim-fast.txt to 
slots/14/astronomy_parameters.txt
7/8/2015 3:56:05 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/stars-15-sim-1Jun1.txt to 
slots/14/stars.txt
7/8/2015 3:56:05 PM |  | [slot] removed file slots/14/boinc_temporary_exit
7/8/2015 3:56:05 PM | Milkyway@Home | Starting task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7777240_0
7/8/2015 3:56:05 PM | Milkyway@Home | [cpu_sched] Starting task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7777240_0 using 
milkyway_separation__modified_fit version 136 (opencl_nvidia_101) in slot 14
7/8/2015 3:56:23 PM | Milkyway@Home | Sending scheduler request: To fetch work.
7/8/2015 3:56:23 PM | Milkyway@Home | Reporting 1 completed tasks
7/8/2015 3:56:23 PM | Milkyway@Home | Requesting new tasks for NVIDIA GPU
7/8/2015 3:56:25 PM | Milkyway@Home | Scheduler request completed: got 1 new 
tasks
7/8/2015 3:56:25 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_modfit_fast_15_3s_136_sim1Jun1_2_1434554402_7852751_0_0
7/8/2015 3:56:25 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_modfit_fast_15_3s_136_sim1Jun1_2_1434554402_7852751_0_0.gz
7/8/2015 3:56:25 PM |  | [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_modfit_fast_15_3s_136_sim1Jun1_2_1434554402_7852751_0_0.gzt
7/8/2015 3:56:55 PM |  | [slot] cleaning out slots/14: handle_exited_app()
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/astronomy_parameters.txt
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/boinc_finish_called
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/init_data.xml
7/8/2015 3:56:55 PM |  | [slot] removed file 
slots/14/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/separation_checkpoint
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/stars.txt
7/8/2015 3:56:55 PM |  | [slot] removed file slots/14/stderr.txt
7/8/2015 3:56:55 PM | Milkyway@Home | Computation for task 
de_fast_15_3s_136_sim1Jun1_1_1434554402_7777240_0 finished
7/8/2015 3:56:55 PM |  | [slot] cleaning out slots/11: get_free_slot()
7/8/2015 3:56:55 PM |  | [slot] removed file slots/11/stderr.txt
7/8/2015 3:56:55 PM | Milkyway@Home | [slot] assigning slot 11 to 
de_80_DR8_Rev_8_5_00004_1434551187_10549411_1
7/8/2015 3:56:55 PM |  | [slot] removed file slots/11/init_data.xml
7/8/2015 3:56:55 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.02_windows_x86_64__opencl_nvidia.exe
 to slots/11/milkyway_separation_1.02_windows_x86_64__opencl_nvidia.exe
7/8/2015 3:56:55 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/80_rev_8_5.prmtrs to 
slots/11/astronomy_parameters.txt
7/8/2015 3:56:55 PM | Milkyway@Home | [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/80_Rev_8_3.stars to 
slots/11/stars.txt
7/8/2015 3:56:55 PM |  | [slot] removed file slots/11/boinc_temporary_exit
7/8/2015 3:56:55 PM | Milkyway@Home | Starting task 
de_80_DR8_Rev_8_5_00004_1434551187_10549411_1
7/8/2015 3:56:55 PM | Milkyway@Home | [cpu_sched] Starting task 
de_80_DR8_Rev_8_5_00004_1434551187_10549411_1 using milkyway version 102 
(opencl_nvidia) in slot 11

Log 2:
9964    Milkyway@Home   7/8/2015 5:40:58 PM     [slot] assigning slot 5 to 
ps_fast_15_3s_136_sim1Jun1_1_1434554402_7806437_0    
9965                    7/8/2015 5:40:58 PM     [slot] removed file 
slots/5/init_data.xml       
9966    Milkyway@Home   7/8/2015 5:40:58 PM     [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
 to 
slots/5/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
  
9967    Milkyway@Home   7/8/2015 5:40:58 PM     [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/parameters-15-3s-sim-fast.txt to 
slots/5/astronomy_parameters.txt     
9968    Milkyway@Home   7/8/2015 5:40:58 PM     [slot] linked 
../../projects/milkyway.cs.rpi.edu_milkyway/stars-15-sim-1Jun1.txt to 
slots/5/stars.txt   
9969                    7/8/2015 5:40:58 PM     [slot] removed file 
slots/5/boinc_temporary_exit        
9970    Milkyway@Home   7/8/2015 5:40:58 PM     Starting task 
ps_fast_15_3s_136_sim1Jun1_1_1434554402_7806437_0 
9971    Milkyway@Home   7/8/2015 5:40:58 PM     [cpu_sched] Starting task 
ps_fast_15_3s_136_sim1Jun1_1_1434554402_7806437_0 using 
milkyway_separation__modified_fit version 136 (opencl_nvidia_101) in slot 5   
9972    Milkyway@Home   7/8/2015 5:41:16 PM     Sending scheduler request: To 
fetch work.       
9973    Milkyway@Home   7/8/2015 5:41:16 PM     Reporting 1 completed tasks     
9974    Milkyway@Home   7/8/2015 5:41:16 PM     Requesting new tasks for NVIDIA 
GPU     
9975    Milkyway@Home   7/8/2015 5:41:18 PM     Scheduler request completed: 
got 1 new tasks    
9976                    7/8/2015 5:41:18 PM     [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10687133_0_0
       
9977                    7/8/2015 5:41:18 PM     [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10687133_0_0.gz
    
9978                    7/8/2015 5:41:18 PM     [slot] removed file 
projects/milkyway.cs.rpi.edu_milkyway/de_80_DR8_Rev_8_5_00004_1434551187_10687133_0_0.gzt
   
9979                    7/8/2015 5:41:31 PM     [slot] cleaning out slots/6: 
handle_exited_app()        
9980                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/AKv8c_r2549_winx86-64_SSE42xjfs.exe 
9981                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/boinc_finish_called 
9982                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/boinc_task_state.xml        
9983                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/init_data.xml       
9984                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/libfftw3f-3-3-4_x64.dll     
9985                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/mb_cmdline.txt      
9986                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/result.sah  
9987                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/state.sah   
9988                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/stderr.txt  
9989                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/work_unit.sah       
9991                    7/8/2015 5:41:31 PM     [slot] cleaning out slots/6: 
get_free_slot()    
9993                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/init_data.xml       
9999                    7/8/2015 5:41:31 PM     [slot] removed file 
slots/6/boinc_temporary_exit        
10004                   7/8/2015 5:41:36 PM     [slot] removed file 
projects/setiathome.berkeley.edu/07ja15aa.31640.1906454.438086664205.12.59.vlar_0_0
 
10005                   7/8/2015 5:41:36 PM     [slot] removed file 
projects/setiathome.berkeley.edu/07ja15aa.31640.1906454.438086664205.12.59.vlar_0_0.gz
      
10006                   7/8/2015 5:41:36 PM     [slot] removed file 
projects/setiathome.berkeley.edu/07ja15aa.31640.1906454.438086664205.12.59.vlar_0_0.gzt
     
10007                   7/8/2015 5:41:44 PM     [slot] cleaning out slots/3: 
handle_exited_app()        
10008                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/boinc_finish_called 
10009                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/boinc_task_state.xml        
10010                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/cudart32_50_35.dll  
10011                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/cufft32_50_35.dll   
10012                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/init_data.xml       
10013                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/Lunatics_x41zc_win32_cuda50.exe     
10014                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/mbcuda.cfg  
10015                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/result.sah  
10016                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/state.sah   
10017                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/stderr.txt  
10018                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/work_unit.sah       
10020                   7/8/2015 5:41:44 PM     [slot] cleaning out slots/3: 
get_free_slot()    
10022                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/init_data.xml       
10029                   7/8/2015 5:41:44 PM     [slot] removed file 
slots/3/boinc_temporary_exit        
10033                   7/8/2015 5:41:48 PM     [slot] cleaning out slots/5: 
handle_exited_app()        
10034                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/astronomy_parameters.txt    
10035                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/boinc_finish_called 
10036                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/init_data.xml       
10037                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
        
10038                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/separation_checkpoint       
10039                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/stars.txt   
10040                   7/8/2015 5:41:48 PM     [slot] failed to remove file 
slots/5/stderr.txt: unlink() failed        
10042                   7/8/2015 5:41:48 PM     [slot] removed file 
projects/setiathome.berkeley.edu/30no14ab.7228.271858.438086664199.12.232_1_0   
    
10043                   7/8/2015 5:41:48 PM     [slot] removed file 
projects/setiathome.berkeley.edu/30no14ab.7228.271858.438086664199.12.232_1_0.gz
    
10044                   7/8/2015 5:41:48 PM     [slot] removed file 
projects/setiathome.berkeley.edu/30no14ab.7228.271858.438086664199.12.232_1_0.gzt
   
10045   Milkyway@Home   7/8/2015 5:41:48 PM     Computation for task 
ps_fast_15_3s_136_sim1Jun1_1_1434554402_7806437_0 finished 
10046                   7/8/2015 5:41:48 PM     [slot] cleaning out slots/5: 
get_free_slot()    
10047                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/stderr.txt  
10048   Milkyway@Home   7/8/2015 5:41:48 PM     [slot] assigning slot 5 to 
ps_modfit_fast_15_3s_136_sim1Jun1_1_1434554402_7816077_0     
10049                   7/8/2015 5:41:48 PM     [slot] removed file 
slots/5/init_data.xml