[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796996#action_12796996 ]
Amar Kamat commented on MAPREDUCE-1342:
---------------------------------------

What if we move the code from JobTracker.blacklistedTaskTrackers() to FaultyTrackersInfo? Something like

{code}
FaultyTrackersInfo {
  blacklistedTaskTrackers() {
    synchronized (potentiallyFaultyTrackers) {
      synchronized (taskTrackers) {
        // code that we have today in JobTracker.blacklistedTaskTrackers()
        for (TaskTracker tt : taskTrackers.values()) {
          // ...
        }
      }
    }
  }
}

blacklistedTaskTrackers() {
  return FaultyTrackersInfo.blacklistedTaskTrackers()
}
{code}

This kind of solves the lock reversal issue we are facing now, since both paths would then lock potentiallyFaultyTrackers before taskTrackers. It also makes more sense because JobTracker.FaultyTrackersInfo is the right module to answer the blacklistedTaskTrackers() query. Thoughts?

> Potential JT deadlock in faulty TT tracking
> -------------------------------------------
>
>                 Key: MAPREDUCE-1342
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>         Attachments: cycle0.png, mapreduce-1342-1.patch, mapreduce-1342-2.patch
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers,
> and then calls blackListTracker, which calls removeHostCapacity, which locks
> JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then
> calls faultyTrackers.isBlacklisted(), which goes on to lock
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
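
For illustration only, here is a minimal, self-contained Java sketch of the lock-ordering idea discussed in this issue: every path that needs both monitors takes potentiallyFaultyTrackers first and taskTrackers second. The class LockOrderSketch, the String-based maps, addTracker, and BLACKLIST_THRESHOLD are all hypothetical stand-ins and are not the actual JobTracker code or either attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the JobTracker bookkeeping; not the real Hadoop classes.
class LockOrderSketch {
    // Assumed simplification: tracker name -> host name.
    private final Map<String, String> taskTrackers = new HashMap<>();
    // Assumed simplification: host name -> number of reported faults.
    private final Map<String, Integer> potentiallyFaultyTrackers = new HashMap<>();
    // Hypothetical threshold; the real blacklisting policy is more involved.
    private static final int BLACKLIST_THRESHOLD = 2;

    void addTracker(String trackerName, String host) {
        synchronized (taskTrackers) {
            taskTrackers.put(trackerName, host);
        }
    }

    // Path 1: potentiallyFaultyTrackers first, then taskTrackers.
    void incrementFaults(String host) {
        synchronized (potentiallyFaultyTrackers) {
            int faults = potentiallyFaultyTrackers.merge(host, 1, Integer::sum);
            if (faults >= BLACKLIST_THRESHOLD) {
                synchronized (taskTrackers) {
                    // Capacity bookkeeping for the blacklisted host would go here.
                }
            }
        }
    }

    // Path 2: the same order as path 1, so the two paths cannot wait on each other in a cycle.
    List<String> blacklistedTaskTrackers() {
        synchronized (potentiallyFaultyTrackers) {
            synchronized (taskTrackers) {
                List<String> blacklisted = new ArrayList<>();
                for (Map.Entry<String, String> entry : taskTrackers.entrySet()) {
                    int faults = potentiallyFaultyTrackers.getOrDefault(entry.getValue(), 0);
                    if (faults >= BLACKLIST_THRESHOLD) {
                        blacklisted.add(entry.getKey());
                    }
                }
                return blacklisted;
            }
        }
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = new LockOrderSketch();
        sketch.addTracker("tt1", "hostA");
        sketch.addTracker("tt2", "hostB");
        sketch.incrementFaults("hostA");
        sketch.incrementFaults("hostA");
        System.out.println(sketch.blacklistedTaskTrackers());
    }
}
```

The point of the sketch is only the acquisition order: as long as no code path takes taskTrackers and then potentiallyFaultyTrackers, the cycle described in the issue cannot form.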