[ 
https://issues.apache.org/jira/browse/HADOOP-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-5664:
----------------------------------

    Attachment: 5664-0.patch

I discussed this with Owen, and he had an idea that has all the properties of
the extent design without requiring a major rewrite. Since the collection
thread can determine when it next needs to validate the state of the buffer,
it can write freely up to that boundary without taking the lock.
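
As a rough illustration of that idea (not code from the attached patch), the
Java sketch below lets a single collecting thread take the lock only when it
reaches a boundary computed at its previous check. The class BoundaryBuffer,
its fields, the chunk parameter, and the inline stand-in for a spill are all
assumptions made for this example; the real MapOutputBuffer is circular and
coordinates with a separate spill thread.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a collector that validates shared state only at a
// precomputed boundary instead of locking on every record.
class BoundaryBuffer {
    private final ReentrantLock lock = new ReentrantLock();
    private final byte[] buf;
    private final int chunk;      // max bytes written between validations (assumed knob)
    private int pos = 0;          // next write offset; owned by the collecting thread
    private int boundary = 0;     // collector may write below this offset without the lock
    private int lockCount = 0;    // how often the lock was actually taken
    private int spilledBytes = 0; // bytes handed to the stand-in "spill"

    BoundaryBuffer(int capacity, int chunk) {
        if (chunk <= 0 || chunk > capacity) {
            throw new IllegalArgumentException("need 0 < chunk <= capacity");
        }
        this.buf = new byte[capacity];
        this.chunk = chunk;
    }

    // Called only by the collecting thread.
    void collect(byte[] record) {
        if (record.length > chunk) {
            throw new IllegalArgumentException("record larger than chunk");
        }
        if (pos + record.length > boundary) {
            revalidate(record.length); // slow path: lock and recompute the boundary
        }
        // Fast path: no lock, because [pos, boundary) was granted at the last check.
        System.arraycopy(record, 0, buf, pos, record.length);
        pos += record.length;
    }

    private void revalidate(int needed) {
        lock.lock();
        try {
            lockCount++;
            if (pos + needed > buf.length) {
                // Stand-in for a spill: pretend the contents were written out.
                spilledBytes += pos;
                pos = 0;
            }
            boundary = Math.min(buf.length, pos + chunk);
        } finally {
            lock.unlock();
        }
    }

    int lockCount() { return lockCount; }
    int spilledBytes() { return spilledBytes; }
    int position() { return pos; }

    public static void main(String[] args) {
        BoundaryBuffer b = new BoundaryBuffer(64, 16);
        for (int i = 0; i < 32; i++) {
            b.collect(new byte[4]);
        }
        // 32 records collected, but the lock was taken only 8 times.
        System.out.println("locks=" + b.lockCount() + " spilled=" + b.spilledBytes());
    }
}
```

With 4-byte records and a 16-byte chunk, the collector locks once per four
records rather than once per record. How far the boundary can safely be
pushed out in the real buffer depends on the spill thread's state, which this
sketch does not model.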

Attached is a preliminary patch that not only removes some of the excessive
spill synchronization but also drops the synchronization on
MapOutputBuffer.Buffer methods. Since the buffer is only really protected by
the lock on collect, synchronizing the buffer methods cannot prevent
corruption; it can only make potential corruption well-ordered.

Bryan: would you mind testing this with your job? I'm still working through its 
correctness (it passes unit tests, at least), but I'm curious about its effect 
in your environment.

> Use of ReentrantLock.lock() in MapOutputBuffer takes up too much cpu time
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-5664
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5664
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Bryan Duxbury
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: 5664-0.patch
>
>
> In examining a profile of one of my mappers today, I noticed that the method 
> ReentrantLock.lock() in MapTask$MapOutputBuffer seems to be taking up ~11 
> seconds out of around 100 seconds total. It seems like 10% is an awfully 
> large amount of time to spend in this lock. 
