Corner-case deadlock in TaskTracker
-----------------------------------
Key: HADOOP-1461
URL: https://issues.apache.org/jira/browse/HADOOP-1461
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.3
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Critical
Fix For: 0.14.0
Attachments: main_taskcleanup_deadlock.txt
Thanks to Koji for the attached stack-trace...
Summary:

main()
  -> offerService()
    -> markUnresponsiveTasks (locks the TaskTracker here)
      -> purgeTask()
        -> removeTaskFromJob (waiting to lock the RunningJob object)

taskCleanup
  -> purgeJob (locks the RunningJob object)
    -> TIP.jobHasFinished()
      -> TIP.cleanup (waiting to lock the TaskTracker)
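For illustration only, the cycle summarized above can be reproduced in isolation. This is a minimal sketch, not TaskTracker code: the class name, the two plain lock objects standing in for the TaskTracker and RunningJob monitors, and the latch that forces the unlucky interleaving are all assumptions. It uses the standard ThreadMXBean monitor-deadlock check to confirm that the two threads are stuck on each other.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderCycleSketch {

    /** Forces the two-lock cycle; returns the IDs of deadlocked threads, or null if none were found. */
    public static long[] reproduceCycle() throws InterruptedException {
        // Hypothetical stand-ins for the TaskTracker and RunningJob monitors.
        final Object trackerLock = new Object();
        final Object runningJobLock = new Object();
        // Makes sure each thread holds its first lock before trying the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread mainLoop = new Thread(() -> {
            synchronized (trackerLock) {              // as in markUnresponsiveTasks
                awaitOther(bothHoldFirstLock);
                synchronized (runningJobLock) { }     // as in removeTaskFromJob
            }
        }, "main-loop");

        Thread cleanup = new Thread(() -> {
            synchronized (runningJobLock) {           // as in purgeJob
                awaitOther(bothHoldFirstLock);
                synchronized (trackerLock) { }        // as in TIP.cleanup
            }
        }, "task-cleanup");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        mainLoop.setDaemon(true);
        cleanup.setDaemon(true);
        mainLoop.start();
        cleanup.start();

        // Poll for up to ~5 seconds until the JVM reports the monitor deadlock.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = null;
        for (int i = 0; i < 100 && deadlocked == null; i++) {
            Thread.sleep(50);
            deadlocked = threads.findMonitorDeadlockedThreads();
        }
        return deadlocked;
    }

    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = reproduceCycle();
        System.out.println(ids == null ? "no deadlock detected"
                                       : ids.length + " threads deadlocked");
    }
}
```

In the real TaskTracker the interleaving is not forced by a latch, which is why the problem only shows up occasionally.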
-*-*-
This is a clear case of inconsistent lock ordering during synchronization. It's
a corner case, since it requires the child VM to become unresponsive _and_ the
cleanup thread to kick in at the same time, which is why I'm marking this for
0.14.0 rather than 0.13.0. What do others think about this?
-*-*-
Two possible solutions to break the deadlock cycle:
a) Make TaskTracker.purgeJob a synchronized method, so that it locks the
TaskTracker before locking the RunningJob object.
b) Make the TaskTracker.tasks map a *Collections.synchronizedMap*, thus doing
away with the need to lock the TaskTracker in TIP.cleanup
I'd prefer a): TaskTracker.tasks is referenced in multiple places inside
synchronized methods, so changing how it is locked (as in b) would touch more
code. Option a) is the less intrusive change.
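A rough sketch of what option a) amounts to is shown here, under stated assumptions: Tracker, RunningJob, addTask and runBothPaths are hypothetical stand-ins, and cleanup is modelled as a synchronized method on the tracker rather than on an inner TIP class. The point is only that once purgeJob is synchronized, both paths take the tracker monitor first and the job monitor second, so the cycle cannot form.

```java
import java.util.HashSet;
import java.util.Set;

public class OrderedLockingSketch {

    /** Hypothetical stand-in for the per-job state, guarded by its own monitor. */
    static class RunningJob {
        final Set<String> tasks = new HashSet<>();
    }

    /** Hypothetical stand-in for TaskTracker. */
    static class Tracker {
        final Set<String> tasks = new HashSet<>();   // guarded by the Tracker monitor
        final RunningJob job = new RunningJob();

        synchronized void addTask(String id) {
            tasks.add(id);
            synchronized (job) { job.tasks.add(id); }
        }

        // Main-loop path: tracker monitor first, then job monitor.
        synchronized void purgeTask(String id) {
            tasks.remove(id);
            synchronized (job) { job.tasks.remove(id); }
        }

        // Cleanup path with option a): now synchronized, so it also takes
        // the tracker monitor before the job monitor.
        synchronized void purgeJob() {
            synchronized (job) {
                for (String id : job.tasks) {
                    cleanup(id);
                }
                job.tasks.clear();
            }
        }

        // Re-entrant: the calling thread already holds the tracker monitor.
        synchronized void cleanup(String id) {
            tasks.remove(id);
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish. */
    public static boolean runBothPaths(int rounds) throws InterruptedException {
        Tracker tracker = new Tracker();
        Thread mainLoop = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                tracker.addTask("t" + i);
                tracker.purgeTask("t" + i);
            }
        }, "main-loop");
        Thread cleanup = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                tracker.purgeJob();
            }
        }, "task-cleanup");
        mainLoop.setDaemon(true);
        cleanup.setDaemon(true);
        mainLoop.start();
        cleanup.start();
        mainLoop.join(10_000);
        cleanup.join(10_000);
        return !mainLoop.isAlive() && !cleanup.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both paths finished: " + runBothPaths(10_000));
    }
}
```

The trade-off in this sketch is that purgeJob holds the tracker monitor for the whole cleanup, which serializes it against every other synchronized tracker method.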
-*-*-
Thoughts?