[ https://issues.apache.org/jira/browse/HADOOP-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-5664:
----------------------------------
Attachment: 5664-0.patch
While I was discussing this with Owen, he suggested an approach that has all the
properties of the extent design without requiring a major rewrite. Since the
collection thread can determine when it next needs to validate the state of the
buffer, it can write freely up to that boundary.
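A minimal sketch of the idea, assuming a single collecting thread and a hypothetical BoundaryWriter class (this is not the MapOutputBuffer code or the attached patch): the collector takes the lock only when it reaches the boundary it last validated, then writes without locking until the next one. The `chunk` parameter is an invented stand-in for however far ahead the real spill state would let the collector write.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not Hadoop's MapOutputBuffer): the collecting thread
 * takes the lock only when it reaches a precomputed boundary and writes
 * without locking in between.
 */
public class BoundaryWriter {
    private final byte[] buf;
    private final ReentrantLock lock = new ReentrantLock(); // stands in for the spill-state lock
    private final int chunk;          // hypothetical: how far one validation lets us write
    private int pos = 0;              // next write index, touched only by the collector
    private int boundary = 0;         // collector may write freely while pos < boundary
    private int lockAcquisitions = 0; // instrumentation for this example only

    public BoundaryWriter(int capacity, int chunk) {
        this.buf = new byte[capacity];
        this.chunk = chunk;
    }

    public void write(byte b) {
        if (pos == boundary) {
            revalidate(); // only here do we pay for the lock
        }
        buf[pos++] = b;
    }

    private void revalidate() {
        lock.lock();
        try {
            lockAcquisitions++;
            // A real design would check spill progress here and possibly block;
            // this sketch just fails if the buffer is exhausted.
            if (pos == buf.length) {
                throw new IllegalStateException("buffer full");
            }
            boundary = Math.min(buf.length, pos + chunk);
        } finally {
            lock.unlock();
        }
    }

    public int lockAcquisitions() { return lockAcquisitions; }

    public int size() { return pos; }

    public static void main(String[] args) {
        BoundaryWriter w = new BoundaryWriter(1024, 256);
        for (int i = 0; i < 1000; i++) {
            w.write((byte) i);
        }
        System.out.println(w.size() + " bytes, " + w.lockAcquisitions() + " lock acquisitions");
    }
}
```

Running `main` prints `1000 bytes, 4 lock acquisitions`; taking the lock once per record would acquire it 1,000 times for the same input.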
Attached is a preliminary patch that not only removes some of the excessive
spill synchronization but also drops the synchronization on the
MapOutputBuffer.Buffer methods. Since the buffer is really protected only by
the lock on collect, synchronizing the buffer methods can at most make potential
corruption well-ordered; it cannot prevent it.
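To illustrate why Buffer-level synchronization is redundant, here is a hypothetical OuterLockCollector (all names are invented for this sketch, not taken from the patch): every path to `Buffer.write` goes through `collect()`, which already holds a `ReentrantLock`, so the inner method can stay unsynchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: the outer lock in collect() is what actually protects
 * the buffer, so the inner Buffer methods need no synchronization of their own.
 */
public class OuterLockCollector {
    private final ReentrantLock collectLock = new ReentrantLock();
    private final Buffer buffer = new Buffer(1 << 20);

    /** Inner buffer: deliberately unsynchronized; only reached via collect(). */
    private static final class Buffer {
        private final byte[] data;
        private int pos;

        Buffer(int capacity) { data = new byte[capacity]; }

        // Marking this `synchronized` would add a second monitor acquisition
        // per write without protecting anything new, because every caller
        // already holds collectLock.
        void write(byte[] b, int off, int len) {
            System.arraycopy(b, off, data, pos, len);
            pos += len;
        }
    }

    public void collect(byte[] record) {
        collectLock.lock();
        try {
            buffer.write(record, 0, record.length);
        } finally {
            collectLock.unlock();
        }
    }

    public int bytesWritten() {
        collectLock.lock();
        try {
            return buffer.pos;
        } finally {
            collectLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OuterLockCollector c = new OuterLockCollector();
        byte[] record = new byte[8];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            Thread th = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    c.collect(record);
                }
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) {
            th.join();
        }
        // 4 threads * 10,000 records * 8 bytes
        System.out.println(c.bytesWritten());
    }
}
```

With four threads each collecting 10,000 eight-byte records, `main` prints `320000`: the single lock in `collect()` is enough to keep the unsynchronized inner writes consistent.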
Bryan: would you mind testing this with your job? I'm still working through its
correctness (it passes unit tests, at least), but I'm curious about its effect
in your environment.
> Use of ReentrantLock.lock() in MapOutputBuffer takes up too much cpu time
> -------------------------------------------------------------------------
>
> Key: HADOOP-5664
> URL: https://issues.apache.org/jira/browse/HADOOP-5664
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.1
> Reporter: Bryan Duxbury
> Assignee: Chris Douglas
> Priority: Minor
> Attachments: 5664-0.patch
>
>
> In examining a profile of one of my mappers today, I noticed that the method
> ReentrantLock.lock() in MapTask$MapOutputBuffer seems to be taking up ~11
> seconds out of around 100 seconds total. It seems like 10% is an awfully
> large amount of time to spend in this lock.
--
This message is automatically generated by JIRA.