[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526438#comment-14526438 ]
Laxman commented on MAPREDUCE-6351:
-----------------------------------

The "Threads analysis" in the description above turned out to be incorrect when I retraced the code flow. Pre-notification is not a problem, because the merger's wait is guarded by a size check.

However, the problem still exists: the fetchers are not proceeding because they are waiting for the merger to free some memory, while the merge thread is doing nothing.

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip,
>                      thread-dumps.out
>
>
> *Problem*
> Reducer gets stuck in copy phase and doesn't make progress for a very long
> time. After killing this task a couple of times manually, it completes.
>
> *Observations*
> - Verified gc logs. Found no memory-related issues. Attached the logs.
> - Verified thread dumps. Found no thread-related problems.
> - According to the logs, fetcher threads are not copying the map outputs;
>   they are just waiting for a merge to happen.
> - Merge thread is alive and in wait state.
>
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like
> a classic multi-threading issue: a thread goes into the wait state after it
> has already been notified.
> Here is the suspect code flow.
>
> *Thread #1*
> Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized(pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
>
> *Thread #2*
> Merge Thread - goes to wait state (Notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while(pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}
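
To illustrate why the "notification comes first" ordering should not by itself cause a hang, here is a minimal standalone sketch of the same guarded-wait pattern. It is not Hadoop code: the class GuardedWaitDemo, its methods offer/take/notifyBeforeWait and the string payload are hypothetical stand-ins that only mirror the shape of the two snippets quoted above. The producer adds an item and calls notifyAll() before any consumer is waiting; the consumer then finds the queue non-empty, skips wait() and proceeds.

```java
import java.util.LinkedList;

public class GuardedWaitDemo {
    // Hypothetical stand-in for the pendingToBeMerged queue.
    private final LinkedList<String> pending = new LinkedList<>();

    // Producer side: same shape as the quoted startMerge() snippet.
    public void offer(String inputs) {
        synchronized (pending) {
            pending.addLast(inputs);
            pending.notifyAll(); // nobody may be waiting yet; that is fine
        }
    }

    // Consumer side: same shape as the quoted run() snippet.
    public String take() throws InterruptedException {
        synchronized (pending) {
            // The size check guards the wait: if an item is already queued,
            // the loop body is skipped and wait() is never called.
            while (pending.size() <= 0) {
                pending.wait();
            }
            return pending.removeFirst();
        }
    }

    // Forces the "notify first, consumer arrives later" ordering.
    // Returns what the consumer received, or null if it did not finish in time.
    public static String notifyBeforeWait() {
        GuardedWaitDemo demo = new GuardedWaitDemo();
        demo.offer("map-output-1"); // notification fires with no waiter present

        final String[] result = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                result[0] = demo.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true); // do not keep the JVM alive if it ever blocks
        consumer.start();
        try {
            consumer.join(5000); // bounded wait so the demo cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("consumer received: " + notifyBeforeWait());
    }
}
```

This only shows that an early notifyAll() is harmless when the wait is guarded by a condition check; it does not explain why the fetchers and the merger end up waiting on each other.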