[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165258#comment-13165258
 ] 

Robert Joseph Evans commented on MAPREDUCE-3519:
------------------------------------------------

We have seen one other deadlock in the past involving a lock on the 
Configuration object.  That one occurred within Configuration itself: one 
thread was writing the configuration out to a stream while the reader of that 
stream, in the same process, was trying to read a configuration value.  It was 
also fixed by making sure that the configuration objects were separate 
instances.
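
The "separate instances" fix can be sketched roughly as follows.  SimpleConf 
is a hypothetical stand-in written for this example, not Hadoop's 
Configuration class (which offers a comparable copy constructor); the point is 
only that each component works on its own copy and so never contends for the 
other's monitor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration class; not Hadoop's Configuration.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf() {}

    // Copy constructor: snapshot the other instance's properties while
    // holding its monitor, so the copy is internally consistent.
    SimpleConf(SimpleConf other) {
        synchronized (other) {
            props.putAll(other.props);
        }
    }

    synchronized void set(String key, String value) {
        props.put(key, value);
    }

    synchronized String get(String key) {
        return props.get(key);
    }
}
```

A service would then call new SimpleConf(shared) once during initialization 
and use only its private copy afterwards; later changes to the shared object 
are deliberately not seen by the copy.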

In YARN, with the message-passing framework and the way initialization happens 
automatically, we are moving much more towards a single global Configuration 
object shared by all parts of the system.  I think it is time for us to look at 
moving to read/write locks in the Configuration class.  This will not solve 
all cases of deadlock, because a writer holding the Configuration lock could 
still deadlock with another thread that holds a different lock and is waiting 
to read or write the Configuration.  But both of the situations I have seen so 
far were two readers deadlocked on reads, with no writes at all.  With 
read/write locks both reads would be able to finish without blocking each 
other.  This would also offer a potential performance boost for anyone using 
Configuration.  It would break backwards compatibility for anyone who is doing 
what we see here and synchronizing on the Configuration object, so it would 
have to be part of the release notes and we would have to inform downstream 
projects.  But I think the benefits outweigh the drawbacks, especially for 
YARN, where we are already changing a lot. 
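
To make the proposal concrete, here is a minimal sketch of what read/write 
locking could look like.  RwConf and its withReadLock helper are hypothetical 
names invented for this example; this is not a patch against the real 
Configuration class, only an illustration using 
java.util.concurrent.locks.ReentrantReadWriteLock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical configuration holder guarded by a read/write lock
// instead of the object's monitor.
class RwConf {
    private final Map<String, String> props = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so concurrent get() calls do not block
    // one another.
    String get(String key) {
        lock.readLock().lock();
        try {
            return props.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock.
    void set(String key, String value) {
        lock.writeLock().lock();
        try {
            props.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs a long read-only action (for example, writing the properties to a
    // stream) while holding the read lock.
    void withReadLock(Runnable action) {
        lock.readLock().lock();
        try {
            action.run();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, a thread inside withReadLock can wait on a second thread that 
calls get() on the same object, and the second thread still completes.  Under 
a plain synchronized design that wait would never finish.  Note that callers 
who write synchronized (conf) { ... } would no longer exclude these readers, 
which is the compatibility break mentioned above.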
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned the Configuration object in 
> LocalDirsHandlerService.init() to prevent others from accessing that 
> configuration object. But since the clone is set in the base class of 
> LocalDirsHandlerService using super.init(conf), it is still exposed and 
> accessible to some other services. This causes a deadlock when this 
> configuration object is accessed from LocalDirsHandlerService and 
> ShuffleHandler along with the AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

