[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517238#comment-14517238
 ] 

Hudson commented on MAPREDUCE-6252:
-----------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2127/])
Moving MAPREDUCE-6252 to the 2.7.1 CHANGES.txt (devaraj: rev 
99fe03e439b0f9afd01754d998c6eb64f0f70300)
* hadoop-mapreduce-project/CHANGES.txt


> JobHistoryServer should not fail when encountering a missing directory
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6252
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.6.0
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>             Fix For: 2.8.0, 2.7.1
>
>         Attachments: MAPREDUCE-6252.0.patch, MAPREDUCE-6252.1.patch
>
>
> The JobHistoryServer maintains a cache that maps job serial number parts to 
> DFS paths, which it uses when seeking a job that is no longer in its memory 
> cache. A given serial number may map to multiple directories, differentiated 
> by timestamp. At present the JobHistoryServer fails any time it attempts to 
> find a job in a cached directory that no longer exists, even though the job 
> may well exist in a different directory for the same serial number. Typically 
> this is not an issue, but the history cleanup process removes the directory 
> from DFS before removing it from the cache, which leaves a window of time 
> in which a directory present in the cache is missing from DFS, resulting in 
> failure. For some DFS implementations it appears that the top-level directory 
> may become unavailable some time before the full deletion of the tree 
> completes, which extends what might otherwise be a brief period of failure. 
> Further, this also places the service at the mercy of outside processes 
> that might remove those directories. The proposal is simply to make the 
> server resilient to this state, so that encountering a missing directory is 
> not fatal and the lookup continues on to seek the job elsewhere.
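
The behaviour proposed in the description can be illustrated with a minimal
sketch. It is not the code from the attached patches: it uses java.nio.file on
a local filesystem in place of the Hadoop FileSystem API, and the class name
MissingDirTolerantLookup, the method findJobFile and the file name used in the
usage note are hypothetical. The only point it shows is that a cached
directory which has vanished is skipped rather than treated as fatal.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not the JobHistoryServer implementation.
public class MissingDirTolerantLookup {

    /**
     * Scans the cached candidate directories for a serial number, in order,
     * and returns the first entry whose file name equals jobFileName.
     * A directory that no longer exists is skipped instead of failing
     * the whole lookup.
     */
    static Optional<Path> findJobFile(List<Path> candidateDirs, String jobFileName)
            throws IOException {
        for (Path dir : candidateDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (entry.getFileName().toString().equals(jobFileName)) {
                        return Optional.of(entry);
                    }
                }
            } catch (NoSuchFileException e) {
                // Stale cache entry: the directory was removed (for example by
                // cleanup) after it was cached. Move on to the next candidate.
            }
        }
        // The job was not found in any directory that still exists.
        return Optional.empty();
    }
}
```

Called with a list containing one deleted directory and one live directory
that holds the job file, findJobFile returns the path in the live directory.
Other I/O errors still propagate, so only the missing-directory case is
tolerated.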



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
