[jira] [Commented] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

Karthik Kambatla (JIRA) Thu, 20 Oct 2016 22:52:20 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15594162#comment-15594162 ]


Karthik Kambatla commented on MAPREDUCE-6797:
---------------------------------------------

If multiple threads call {{addIfAbsent}} simultaneously, is it possible they 
process the same {{HistoryFileInfo}}? How do we ensure only one thread is 
processing a file? 
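
To make the question concrete, here is a minimal sketch (not the Hadoop code) of one way a single thread can be made the sole owner of an entry without locking it. It assumes the cache is a {{ConcurrentMap}} and relies on the atomic two-argument {{remove(key, value)}}, which returns true for exactly one caller. {{FileInfoStub}}, {{ClaimOnceDemo}} and {{countSuccessfulClaims}} are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for HistoryFileInfo; this is not the Hadoop class.
final class FileInfoStub {
    final String jobId;

    FileInfoStub(String jobId) {
        this.jobId = jobId;
    }
}

final class ClaimOnceDemo {

    // Starts `threads` workers that all try to evict the same cache entry and
    // returns how many of them actually succeeded in removing it.
    static int countSuccessfulClaims(int threads) throws InterruptedException {
        ConcurrentSkipListMap<String, FileInfoStub> cache = new ConcurrentSkipListMap<>();
        String key = "job_0001";
        FileInfoStub value = new FileInfoStub(key);
        cache.put(key, value);

        AtomicInteger claims = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Atomic: only the thread whose call removes the mapping sees true,
                // so only that thread would go on to delete or process the file.
                if (cache.remove(key, value)) {
                    claims.incrementAndGet();
                }
            });
        }

        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return claims.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("successful claims: " + countSuccessfulClaims(8));
    }
}
```

Whether the patch under review gives an equivalent guarantee depends on how it orders the check of the move state and the removal, which this sketch does not model.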

> Job history server scans can become blocked on a single, slow entry
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6797
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.4.0, 2.8.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: jstack
>
>
> There is one more piece of code in HistoryFileManager where the synchronized 
> keyword on HistoryFileInfo needs to be removed. We hit the JobHistoryServer 
> contention issue in our environment: the attached stack trace shows 
> HistoryFileManager$JobListCache.addIfAbsent waiting unnecessarily to lock 
> a HistoryFileInfo.
> Synchronization on isMovePending and didMoveFail was already removed by 
> MAPREDUCE-6684.
> {code}
> HistoryFileInfo firstValue = cache.get(key);
> synchronized (firstValue) {  // synchronized is not needed here
>   if (firstValue.isMovePending()) {
>     if (firstValue.didMoveFail() &&
>         firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
>       cache.remove(key);
>       // Now let's try to delete it
>       try {
>         firstValue.delete();
>       } catch (IOException e) {
>         LOG.error("Error while trying to delete history files" +
>             " that could not be moved to done.", e);
>       }
>     } else {
>       LOG.warn("Waiting to remove " + key
>           + " from JobListCache because it is not in done yet.");
>     }
>   } else {
>     cache.remove(key);
>   }
> }
> {code}
> Note: stacktrace is from hadoop-2.4.0 version and the problem exists in latest hadoop as well
> {code}
> "2144820863@qtp-313351300-38156" daemon prio=10 tid=0x0000000001e13800 nid=0xf133 waiting for monitor entry [0x00007f7c1d8dd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
>         - waiting to lock <0x000000040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
>         - locked <0x0000000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
>         at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
