[ https://issues.apache.org/jira/browse/MAPREDUCE-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514007#comment-14514007 ]

Hudson commented on MAPREDUCE-6252:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #167 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/167/])
MAPREDUCE-6252. JobHistoryServer should not fail when encountering a (devaraj: rev 5e67c4d384193b38a85655c8f93193596821faa5)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestHistoryFileManager.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
* hadoop-mapreduce-project/CHANGES.txt


> JobHistoryServer should not fail when encountering a missing directory
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6252
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.6.0
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6252.0.patch, MAPREDUCE-6252.1.patch
>
>
> The JobHistoryServer maintains a cache that maps job serial number parts to
> DFS paths; a given serial number may map to multiple directories,
> differentiated by timestamp. The server uses this cache when seeking a job
> that is no longer in its in-memory cache. At present, the JobHistoryServer
> fails whenever, based on that cache, it attempts to find a job in a directory
> that no longer exists, even though the job may well exist in a different
> directory for the same serial number. Typically this is not an issue, but the
> history cleanup process removes a directory from DFS before removing it from
> the cache. This leaves a window of time in which a directory is still present
> in the cache but already missing from DFS, resulting in failure. On some DFS
> implementations the top-level directory appears to become unavailable some
> time before deletion of the full tree completes, which stretches what might
> otherwise be a brief period of failure into a longer one. Further, this also
> places the service at the mercy of outside processes that might remove those
> directories. The proposal is simply to make the server resilient to this
> state, so that encountering a missing directory is not fatal and the server
> continues to look for the job elsewhere.
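The proposed behaviour can be illustrated with a minimal, self-contained Java sketch. It deliberately does not use Hadoop's `HistoryFileManager` or `FileContext` APIs: the `DirLister` interface, the `serialToDirs` map and the directory names below are hypothetical stand-ins for the serial-number cache and DFS described in the issue. The only point it shows is that a `FileNotFoundException` for one cached directory is caught, and the search moves on to the next directory for the same serial number instead of failing.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ResilientHistoryLookup {

    /** Hypothetical stand-in for a DFS directory listing; throws if the directory is gone. */
    interface DirLister {
        List<String> list(String dir) throws FileNotFoundException;
    }

    // Hypothetical cache: serial-number part -> timestamped directories sharing that part.
    private final Map<String, List<String>> serialToDirs = new HashMap<>();
    private final DirLister lister;

    ResilientHistoryLookup(DirLister lister) {
        this.lister = lister;
    }

    void addDir(String serialPart, String dir) {
        serialToDirs.computeIfAbsent(serialPart, k -> new ArrayList<>()).add(dir);
    }

    /** Searches every cached directory; a missing one is skipped rather than aborting the search. */
    Optional<String> findJob(String serialPart, String jobFileName) {
        for (String dir : serialToDirs.getOrDefault(serialPart, List.of())) {
            List<String> files;
            try {
                files = lister.list(dir);
            } catch (FileNotFoundException e) {
                // Still cached but already deleted (e.g. by history cleanup): keep looking.
                continue;
            }
            if (files.contains(jobFileName)) {
                return Optional.of(dir + "/" + jobFileName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Fake DFS: only the newer directory still exists.
        Map<String, List<String>> dfs = new HashMap<>();
        dfs.put("/history/b/000123", List.of("job_0123.jhist"));
        DirLister lister = dir -> {
            List<String> files = dfs.get(dir);
            if (files == null) {
                throw new FileNotFoundException(dir);
            }
            return files;
        };

        ResilientHistoryLookup lookup = new ResilientHistoryLookup(lister);
        lookup.addDir("000123", "/history/a/000123"); // stale: already removed from the fake DFS
        lookup.addDir("000123", "/history/b/000123");

        // prints /history/b/000123/job_0123.jhist
        System.out.println(lookup.findJob("000123", "job_0123.jhist").orElse("not found"));
    }
}
```

Without the `catch`, the stale first directory would abort the whole lookup even though the job is present in the second one, which is the failure mode the issue describes.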



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
