[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318638#comment-14318638
 ] 

Craig Welch commented on MAPREDUCE-6252:
----------------------------------------

Not at all, will do.

> JobHistoryServer should not fail when encountering a missing directory
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6252
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.6.0
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>         Attachments: MAPREDUCE-6252.0.patch
>
>
> The JobHistoryServer maintains a cache mapping job serial number parts to dfs
> paths, which it uses when seeking a job it no longer has in its memory cache.
> There may be multiple directories for a given serial number, differentiated
> by timestamp. At present the JobHistoryServer fails any time it attempts to
> find a job in a directory which is listed in that cache but no longer exists,
> even though the job may well exist in a different directory for the same
> serial number. Typically this is not an issue, but the history cleanup
> process removes the directory from dfs before removing it from the cache,
> which leaves a window of time where a directory present in the cache is
> missing from dfs, resulting in failure. For some dfs implementations it
> appears that the top-level directory may become unavailable some time before
> the full deletion of the tree completes, which extends what might otherwise
> be a brief period of failure. This also places the service at the mercy of
> outside processes which might remove those directories. The proposal is
> simply to make the server resilient to this state, so that encountering a
> missing directory is not fatal and the search continues in the other
> directories for that serial number.
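
The tolerant lookup proposed in the description could be sketched roughly as
follows. This is not the attached patch: it uses java.nio.file on a local
filesystem as a stand-in for the dfs, and the class name and the findJobFile
helper are hypothetical, chosen only to illustrate skipping a cached directory
that has already been removed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not JobHistoryServer code.
final class MissingDirTolerantLookup {

    // Scans the cached candidate directories for one serial number and
    // returns the first entry whose file name starts with the job id.
    static Optional<Path> findJobFile(List<Path> cachedDirs, String jobId)
            throws IOException {
        for (Path dir : cachedDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (entry.getFileName().toString().startsWith(jobId)) {
                        return Optional.of(entry);
                    }
                }
            } catch (NoSuchFileException e) {
                // The directory is still in the cache but is already gone
                // from storage: treat it as empty and try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("jhs-sketch");
        Path removed = root.resolve("removed-dir"); // never created
        Path live = Files.createDirectory(root.resolve("live-dir"));
        Files.createFile(live.resolve("job_1_0001.jhist"));

        // The missing directory is listed first, yet the lookup still
        // reaches the live directory instead of failing.
        System.out.println(findJobFile(List.of(removed, live), "job_1_0001"));
    }
}
```

Only the "directory not found" case is swallowed here; any other IOException
still propagates, so genuine storage errors are not hidden.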



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
