[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Attachment: thread-dumps.out
                reducer-container-partial.log.zip
                jstat-gc.log

Attached the logs (container log, thread dumps, jstat output) for reference.

Please note that my thoughts on the threading issue may be premature and 
incorrect. Irrespective of this analysis, the problem exists.

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
> thread-dumps.out
>
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very 
> long time. After manually killing the task a couple of times, it 
> completes. 
> *Analysis*
> - Verified GC logs. Found no memory-related issues. Attached.
> - Verified thread dumps. Found no thread related problems. 
> - On verification of logs, fetcher threads are not copying the map outputs 
> and they are just waiting for merge to happen.
> - Merge thread is alive and in wait state.
> On careful observation of the logs, thread dumps, and code, this looks to me 
> like a classic multi-threading issue: the merge thread goes into the wait 
> state after the notification has already been sent. 
> Here is the suspect code flow.
> *Thread #1*
> Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
>       synchronized(pendingToBeMerged) {
>         pendingToBeMerged.addLast(toMergeInputs);
>         pendingToBeMerged.notifyAll();
>       }
> {code}
> *Thread #2*
> Merge Thread - goes to wait state (Notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
>         synchronized (pendingToBeMerged) {
>           while(pendingToBeMerged.size() <= 0) {
>             pendingToBeMerged.wait();
>           }
>           // Pickup the inputs to merge.
>           inputs = pendingToBeMerged.removeFirst();
>         }
> {code}
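
For reference, here is a minimal standalone sketch of the two quoted blocks. It is not Hadoop code: the class name GuardedWaitDemo, the method takeNext, and the map ids are hypothetical, chosen only for illustration. It sends the notification before the consumer thread starts, to check whether that ordering alone can leave the consumer waiting. The while guard re-checks the queue size under the same monitor, so in this sketch the consumer should find the queued entry without ever calling wait(). That suggests the ordering by itself may not explain the hang, though it does not rule out a problem elsewhere in the real merge path.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the quoted MergeThread logic; not the Hadoop class.
public class GuardedWaitDemo {
    private final LinkedList<List<String>> pendingToBeMerged = new LinkedList<>();

    // Producer side, mirroring the quoted startMerge block.
    public void startMerge(List<String> toMergeInputs) {
        synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast(toMergeInputs);
            pendingToBeMerged.notifyAll();
        }
    }

    // Consumer side, mirroring the quoted run() block.
    public List<String> takeNext() throws InterruptedException {
        synchronized (pendingToBeMerged) {
            // The predicate is checked while holding the monitor, so an entry
            // queued before this point is seen and wait() is skipped.
            while (pendingToBeMerged.size() <= 0) {
                pendingToBeMerged.wait();
            }
            return pendingToBeMerged.removeFirst();
        }
    }

    public static void main(String[] args) throws Exception {
        GuardedWaitDemo demo = new GuardedWaitDemo();

        // Notification is issued first, while no thread is waiting.
        demo.startMerge(Arrays.asList("map_0", "map_1"));

        AtomicReference<List<String>> picked = new AtomicReference<>();
        Thread merger = new Thread(() -> {
            try {
                picked.set(demo.takeNext());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.start();
        merger.join(2000); // bounded wait so a real hang would be visible

        System.out.println("merger still alive: " + merger.isAlive());
        System.out.println("picked: " + picked.get());
    }
}
```

If the merger thread in this sketch were still alive after the join timeout, that would support the lost-notification theory; the thread dumps attached to the issue remain the authoritative evidence for the real reducer.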



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
