[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated MAPREDUCE-1342:
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Potential JT deadlock in faulty TT tracking

Key: MAPREDUCE-1342
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 0.20.1
Reporter: Todd Lipcon
Assignee: Amareshwari Sriramadasu
Fix For: 0.21.0
Attachments: cycle0.png, mapreduce-1342-1.patch, mapreduce-1342-2.patch, patch-1342-0.21.txt, patch-1342-1.txt, patch-1342-2-ydist.txt, patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, patch-1342-ydist.txt, patch-1342.txt

JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, and then calls blackListTracker, which calls removeHostCapacity, which locks JT.taskTrackers. On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then calls faultyTrackers.isBlacklisted(), which goes on to lock potentiallyFaultyTrackers.

I haven't produced such a deadlock, but the lock ordering here is inverted and therefore could deadlock. Not sure if this goes back to 0.21 or just in trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
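The inversion described above can be shown in miniature. The sketch below is hypothetical and is not JobTracker code:

- Two `ReentrantLock`s stand in for the `potentiallyFaultyTrackers` and `taskTrackers` monitors.
- Each thread mimics one of the two call paths.
- It uses `tryLock` with a timeout rather than `synchronized`, so it does not hang. Instead it reports that each thread, while holding its first lock, could not get its second one. With plain monitors that state would be a deadlock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the two monitors named in the report.
class LockInversionDemo {
    static final ReentrantLock potentiallyFaultyTrackers = new ReentrantLock();
    static final ReentrantLock taskTrackers = new ReentrantLock();

    /** Returns true if each path, holding its first lock, failed to get its second. */
    static boolean inversionBlocksBothPaths() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean incrementGotSecond = new AtomicBoolean();
        AtomicBoolean blacklistedGotSecond = new AtomicBoolean();

        // Path 1: incrementFaults -> blackListTracker -> removeHostCapacity
        Thread incrementFaults = new Thread(() -> runPath(potentiallyFaultyTrackers,
                taskTrackers, bothHoldFirst, bothTried, incrementGotSecond));
        // Path 2: blacklistedTaskTrackers -> isBlacklisted
        Thread blacklisted = new Thread(() -> runPath(taskTrackers,
                potentiallyFaultyTrackers, bothHoldFirst, bothTried, blacklistedGotSecond));
        incrementFaults.start();
        blacklisted.start();
        try {
            incrementFaults.join();
            blacklisted.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return !incrementGotSecond.get() && !blacklistedGotSecond.get();
    }

    private static void runPath(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicBoolean gotSecond) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();  // both threads now hold their first lock
            // Bounded wait instead of lock(), so the demo cannot hang.
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            gotSecond.set(acquired);
            bothTried.countDown();
            bothTried.await();      // keep holding `first` until both have tried
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

The second latch keeps each first lock held until both attempts are over, so the outcome does not depend on timing.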
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-0.21.txt

Patch for branch 0.21. Ran test-patch and ant test. All tests passed.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1342:
Environment: Fixed a potential deadlock in the global blacklist of tasktrackers feature.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1342:
Environment: (was: Fixed a potential deadlock in the global blacklist of tasktrackers feature.)
Release Note: Fix for a potential deadlock in the global blacklist of tasktrackers feature.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-3.txt

Patch for trunk
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-3-ydist.txt

Added comments about locking order assumptions to methods JobTracker.addNewTracker and JobTracker.removeTracker.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-2.txt

Patch with Arun's comments incorporated. Now taskTrackers and potentiallyFaultyTrackers are always locked while holding the JobTracker lock. The newly synchronized methods are called from test cases or from already synchronized methods.
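Here is a minimal sketch of the invariant in this comment. It uses a much-simplified model: the class `JobTrackerModel` and its helper methods are hypothetical, and only the field names echo the issue.

- Public entry points are `synchronized` on the tracker object.
- Each helper that locks an inner map first checks `Thread.holdsLock(this)`.
- A caller that skips the outer lock therefore fails fast instead of silently reintroducing the ordering problem.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: this object's monitor plays the role of the JobTracker lock.
class JobTrackerModel {
    private final Map<String, Integer> taskTrackers = new HashMap<>();
    private final Map<String, Integer> potentiallyFaultyTrackers = new HashMap<>();

    // Entry points take the JobTracker lock first.
    synchronized void addNewTracker(String name) { putTracker(name); }

    synchronized void incrementFaults(String name) { bumpFaults(name); }

    synchronized int getFaultCount(String name) { return readFaults(name); }

    // Deliberately NOT synchronized, so the helper's check rejects the call.
    int getFaultCountUnlocked(String name) { return readFaults(name); }

    /** Caller must hold the JobTracker lock; then locks taskTrackers. */
    private void putTracker(String name) {
        requireJobTrackerLock();
        synchronized (taskTrackers) {
            taskTrackers.put(name, 0);
        }
    }

    /** Caller must hold the JobTracker lock; then locks potentiallyFaultyTrackers. */
    private void bumpFaults(String name) {
        requireJobTrackerLock();
        synchronized (potentiallyFaultyTrackers) {
            potentiallyFaultyTrackers.merge(name, 1, Integer::sum);
        }
    }

    /** Caller must hold the JobTracker lock; then locks potentiallyFaultyTrackers. */
    private int readFaults(String name) {
        requireJobTrackerLock();
        synchronized (potentiallyFaultyTrackers) {
            return potentiallyFaultyTrackers.getOrDefault(name, 0);
        }
    }

    // Thread.holdsLock reports whether the current thread owns the given monitor.
    private void requireJobTrackerLock() {
        if (!Thread.holdsLock(this)) {
            throw new IllegalStateException(
                "JobTracker lock must be held before locking an inner map");
        }
    }
}
```

The check costs little. It turns a lock-order assumption that would otherwise live only in comments into something tests can catch.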
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-2-ydist.txt

Patch for Yahoo! distribution
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-2.txt

Attaching the patch again, as Hudson picked up the wrong patch.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Assignee: Amareshwari Sriramadasu
Status: Open (was: Patch Available)
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1342:
Status: Open (was: Patch Available)

Shouldn't we make JobTracker.getFaultCount and JobTracker.taskTrackers synchronized too? Oh, and thanks for your help, Todd!
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342.txt

Patch making the methods activeTaskTrackers(), blacklistedTaskTrackers() and taskTrackerNames() synchronized. These are the methods that lock taskTrackers and then potentiallyFaultyTrackers without holding the JobTracker lock.
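The effect of making these methods `synchronized` can be sketched as follows. The class `SynchronizedTracker` is hypothetical. It keeps both inner lock orders from the report, but every path enters through a `synchronized` method. The outer monitor therefore serializes the paths, and the two nested orders can never interleave. The static helper runs both paths from two threads and checks that they finish.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the fix: an outer monitor guards both nested lock orders.
class SynchronizedTracker {
    private final Set<String> taskTrackers = new HashSet<>();
    private final Set<String> potentiallyFaultyTrackers = new HashSet<>();

    synchronized void addTracker(String name) {
        synchronized (taskTrackers) {
            taskTrackers.add(name);
        }
    }

    // potentiallyFaultyTrackers, then taskTrackers (like incrementFaults -> removeHostCapacity).
    synchronized void blacklist(String name) {
        synchronized (potentiallyFaultyTrackers) {
            potentiallyFaultyTrackers.add(name);
            synchronized (taskTrackers) {
                taskTrackers.remove(name);
            }
        }
    }

    // taskTrackers, then potentiallyFaultyTrackers (like blacklistedTaskTrackers).
    synchronized int blacklistedCount() {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                return potentiallyFaultyTrackers.size();
            }
        }
    }

    /** Runs both paths concurrently; true if both finish and the count is right. */
    static boolean bothPathsFinish(int iterations) {
        SynchronizedTracker tracker = new SynchronizedTracker();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.addTracker("tt" + i);
                tracker.blacklist("tt" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.blacklistedCount();
            }
        });
        writer.setDaemon(true);  // a stuck thread must not keep the JVM alive
        reader.setDaemon(true);
        writer.start();
        reader.start();
        try {
            writer.join(10_000);
            reader.join(10_000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return !writer.isAlive() && !reader.isAlive()
            && tracker.blacklistedCount() == iterations;
    }
}
```

The outer lock removes the hazard, at the cost of serializing these reads behind the JobTracker lock. The concurrent-map patches in this thread instead make the reads lock-free.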
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Fix Version/s: 0.21.0
Affects Version/s: 0.20.1 (was: 0.22.0)
Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
Attachment: patch-1342-ydist.txt

Patch for Y! distribution. Ran test-patch and ant test. All the tests passed except TestKillSubProcesses (due to MAPREDUCE-408).
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated MAPREDUCE-1342:
Attachment: mapreduce-1342-1.patch

Attaching a patch that removes the need to lock on faultyTrackerInfo by changing the field to a concurrent hash map and not locking on addition and removal.
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated MAPREDUCE-1342:
Attachment: mapreduce-1342-2.patch

Attaching a new patch after discussion with Amar. Made the map a concurrent map and changed the getters not to lock on it. This removes the lock on the second resource for client APIs that don't lock on {{JobTracker}}.
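Here is a rough sketch of the concurrent-map idea. It assumes fault counts keyed by host name; the class `FaultyTrackersSketch` and the threshold parameter are hypothetical.

- Updates go through `ConcurrentHashMap.merge`, which is atomic per key.
- The getters read without taking any lock.
- A caller that already holds `taskTrackers` therefore never waits on a monitor for this map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical fault tracker backed by a concurrent map; no external locking.
class FaultyTrackersSketch {
    private final ConcurrentMap<String, Integer> faultsByHost = new ConcurrentHashMap<>();
    private final int blacklistThreshold;  // hypothetical parameter

    FaultyTrackersSketch(int blacklistThreshold) {
        this.blacklistThreshold = blacklistThreshold;
    }

    // Atomic per-key read-modify-write; returns the new count.
    int incrementFaults(String host) {
        return faultsByHost.merge(host, 1, Integer::sum);
    }

    // Removal is a single atomic map operation.
    void removeTracker(String host) {
        faultsByHost.remove(host);
    }

    // Plain reads, which ConcurrentHashMap serves without locking.
    int getFaultCount(String host) {
        return faultsByHost.getOrDefault(host, 0);
    }

    boolean isBlacklisted(String host) {
        return getFaultCount(host) >= blacklistThreshold;
    }
}
```

Each call is atomic on its own. A check-then-act sequence, such as `isBlacklisted` followed by a separate update, is not; it still needs an external lock or a single atomic map operation.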
[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
[ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1342:
Attachment: cycle0.png

Here's the output from jcarder that shows the cycle (this was detected while running TestLostTracker with jcarder instrumentation using the branch at http://github.com/toddlipcon/jcarder/tree/cloudera).
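Tools like jcarder work, roughly, in two steps. They record which locks a thread already holds whenever it acquires another, then search the resulting lock-order graph for cycles. The sketch below is a much-simplified, hypothetical version of that idea; `LockOrderGraph` is not jcarder's API. It shows why the two paths in this issue form a cycle even though no deadlock was ever observed.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lock-order recorder; edge outer -> inner means "inner taken while holding outer".
class LockOrderGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();
    // Locks currently held by the calling thread, most recent first.
    private final ThreadLocal<Deque<String>> held = ThreadLocal.withInitial(ArrayDeque::new);

    synchronized void acquired(String lock) {
        for (String outer : held.get()) {
            if (!outer.equals(lock)) {  // ignore reentrant acquisition
                edges.computeIfAbsent(outer, k -> new HashSet<>()).add(lock);
            }
        }
        held.get().push(lock);
    }

    void released(String lock) {
        held.get().remove(lock);
    }

    // Depth-first search for a back edge, i.e. a cycle in the lock-order graph.
    synchronized boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String lock : edges.keySet()) {
            if (visit(lock, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String lock, Set<String> finished, Set<String> onPath) {
        if (onPath.contains(lock)) return true;
        if (finished.contains(lock)) return false;
        onPath.add(lock);
        for (String next : edges.getOrDefault(lock, Collections.emptySet())) {
            if (visit(next, finished, onPath)) return true;
        }
        onPath.remove(lock);
        finished.add(lock);
        return false;
    }
}
```

Recording the incrementFaults path alone adds only the edge potentiallyFaultyTrackers -> taskTrackers. Recording the blacklistedTaskTrackers path adds the reverse edge and closes the cycle.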