[jira] [Created] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

Anthony Hsu (JIRA) Mon, 27 Aug 2018 23:47:54 -0700

Anthony Hsu created MAPREDUCE-7131:
--------------------------------------

             Summary: Job History Server has race condition where it moves 
files from intermediate to finished but thinks file is in intermediate
                 Key: MAPREDUCE-7131
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.7.4
            Reporter: Anthony Hsu



This is the race condition that can occur:

# during the first *scanIntermediateDirectory()*, 
*HistoryFileInfo.moveToDone()* is scheduled for job j1
# during the second *scanIntermediateDirectory()*, j1 is found again and put in 
the *fileStatusList* to process
# *HistoryFileInfo.moveToDone()* is processed in another thread and history 
files are moved to the finished directory
# the *HistoryFileInfo* for j1 is removed from *jobListCache*
# the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is 
created (history, conf, and summary files will point to the intermediate user 
directory, and state will be IN_INTERMEDIATE)
# *moveToDone()* is scheduled for this new j1
# *moveToDone()* fails during *moveToDoneNow()* for the history file because 
the source path in the intermediate directory does not exist

From this point on, while the new j1 *HistoryFileInfo* is in the 
*jobListCache*, the JobHistoryServer will think the history file is in the 
intermediate directory. If a user queries this job in the JobHistoryServer UI, 
they will get:

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file <scheme>://<host>:<port>/mr-history/intermediate/<user>/job_1529348381246_27275711-1535123223269-<user>-<jobname>-1535127026668-1-0-SUCCEEDED-<queue>-1535126980787.jhist
{code}
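
The interleaving described in the numbered steps can be replayed deterministically in a small single-threaded model. The sketch below is not the Hadoop code: the class *MoveToDoneRaceSketch*, the nested *FileInfo* class, the *addIfAbsent()* helper, and the "intermediate/" and "done/" path strings are all hypothetical simplifications, and the two directories are modeled as in-memory sets. It only illustrates how a stale scan result can recreate a cache entry that still points at the intermediate directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MoveToDoneRaceSketch {

    // Hypothetical stand-in for HistoryFileInfo: it tracks only where the
    // server believes the history file lives.
    static final class FileInfo {
        final String jobId;
        String historyPath;

        FileInfo(String jobId) {
            this.jobId = jobId;
            this.historyPath = "intermediate/" + jobId; // assumed path layout
        }
    }

    // In-memory stand-ins for the two directories and the cache.
    final Set<String> intermediateDir = new HashSet<>();
    final Set<String> doneDir = new HashSet<>();
    final Map<String, FileInfo> jobListCache = new HashMap<>();

    // Returns a snapshot of the directory; the snapshot can go stale.
    List<String> scanIntermediateDirectory() {
        return new ArrayList<>(intermediateDir);
    }

    // Creates a cache entry only if none exists for the job.
    FileInfo addIfAbsent(String jobId) {
        return jobListCache.computeIfAbsent(jobId, FileInfo::new);
    }

    // Returns false when the source no longer exists in the intermediate
    // directory; in that case the recorded path is left unchanged.
    boolean moveToDone(FileInfo info) {
        if (!intermediateDir.remove(info.jobId)) {
            return false;
        }
        doneDir.add(info.jobId);
        info.historyPath = "done/" + info.jobId;
        return true;
    }

    // Replays steps 1-7 in the order in which the race interleaves them.
    static MoveToDoneRaceSketch replay() {
        MoveToDoneRaceSketch jhs = new MoveToDoneRaceSketch();
        jhs.intermediateDir.add("j1");

        // Step 1: first scan finds j1; its move is scheduled, not yet run.
        FileInfo first = jhs.addIfAbsent(jhs.scanIntermediateDirectory().get(0));
        // Step 2: second scan still sees j1 and keeps it in its list.
        List<String> fileStatusList = jhs.scanIntermediateDirectory();
        // Step 3: the scheduled move runs and succeeds.
        jhs.moveToDone(first);
        // Step 4: the entry for j1 is removed from the cache.
        jhs.jobListCache.remove("j1");
        // Step 5: the stale list entry recreates j1, pointing at intermediate.
        FileInfo second = jhs.addIfAbsent(fileStatusList.get(0));
        // Steps 6-7: the second move fails because the source is gone.
        jhs.moveToDone(second);
        return jhs;
    }

    public static void main(String[] args) {
        MoveToDoneRaceSketch jhs = replay();
        System.out.println("cached path: " + jhs.jobListCache.get("j1").historyPath);
        System.out.println("file in done dir: " + jhs.doneDir.contains("j1"));
    }
}
```

In this model the cache entry for j1 ends up recording an intermediate path even though the file is only present in the done set, which mirrors the mismatch behind the exception above.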

Noticed this issue while running 2.7.4, but the race condition seems to still 
exist in trunk.



