Corner-case deadlock in TaskTracker
-----------------------------------

                 Key: HADOOP-1461
                 URL: https://issues.apache.org/jira/browse/HADOOP-1461
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.3
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy
            Priority: Critical
             Fix For: 0.14.0
         Attachments: main_taskcleanup_deadlock.txt

Thanks to Koji for the attached stack-trace...

Summary:

main()
  -> offerService()
    -> markUnresponsiveTasks (locks the TaskTracker here)
      -> purgeTask() 
        -> removeTaskFromJob (waiting to lock the RunningJob object)

taskCleanup
  -> purgeJob (locks the RunningJob object)
    -> TIP.jobHasFinished()
      -> TIP.cleanup (waiting to lock the TaskTracker)
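The cycle above can be reproduced in isolation with a small sketch. The class and lock names are hypothetical stand-ins for the TaskTracker and RunningJob monitors. The real code uses synchronized, which cannot time out, so this sketch swaps in ReentrantLock.tryLock with a timeout to report the cycle instead of hanging. Two latches make both threads hold their first lock before trying the second, and keep holding it until both attempts are over, so both attempts fail deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the lock-ordering cycle; not the Hadoop code.
class LockCycleDemo {
    // Stand-ins for the TaskTracker and RunningJob monitors.
    static final ReentrantLock trackerLock = new ReentrantLock();
    static final ReentrantLock jobLock = new ReentrantLock();

    // Takes `first`, then tries `second` with a timeout; returns whether
    // `second` was obtained. With synchronized this would block forever.
    static boolean lockInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch holdingFirst, CountDownLatch attempted)
            throws InterruptedException {
        first.lock();
        try {
            holdingFirst.countDown();
            holdingFirst.await();            // both threads now hold their first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            attempted.countDown();
            attempted.await();               // keep holding `first` until both have tried
            return got;
        } finally {
            first.unlock();
        }
    }

    private static boolean attempt(ReentrantLock first, ReentrantLock second,
                                   CountDownLatch holdingFirst, CountDownLatch attempted) {
        try {
            return lockInOrder(first, second, holdingFirst, attempted);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns {mainPathGotJobLock, cleanupPathGotTrackerLock}.
    static boolean[] run() {
        CountDownLatch holdingFirst = new CountDownLatch(2);
        CountDownLatch attempted = new CountDownLatch(2);
        boolean[] got = new boolean[2];
        // main(): markUnresponsiveTasks (tracker) -> removeTaskFromJob (job)
        Thread mainPath = new Thread(
            () -> got[0] = attempt(trackerLock, jobLock, holdingFirst, attempted));
        // taskCleanup: purgeJob (job) -> TIP.cleanup (tracker)
        Thread cleanupPath = new Thread(
            () -> got[1] = attempt(jobLock, trackerLock, holdingFirst, attempted));
        mainPath.start();
        cleanupPath.start();
        try {
            mainPath.join();
            cleanupPath.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got;
    }
}
```

LockCycleDemo.run() returns {false, false}: each thread timed out waiting for the lock the other one holds, which with plain synchronized blocks would have been a permanent hang.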

-*-*-

A clear case of lock-ordering issues during synchronization: the two threads 
acquire the TaskTracker and RunningJob locks in opposite orders. It's a corner 
case since it requires both the child VM becoming unresponsive _and_ the cleanup 
thread kicking in, which is why I'm marking this for 0.14.0 rather than 0.13.0. 
What do others think about this?

-*-*-

Two possible solutions to break the deadlock cycle:

a) Make TaskTracker.purgeJob a synchronized method, so that it locks the 
TaskTracker before locking the RunningJob object.
b) Make the TaskTracker.tasks map a *Collections.synchronizedMap*, thus doing 
away with the need to lock the TaskTracker in TIP.cleanup.
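A heavily simplified sketch of option a), assuming hypothetical class and method names (TrackerSketch, purgeTask, purgeJob, cleanupTask) rather than the real Hadoop code. Once purgeJob is synchronized, both paths take the tracker monitor first and the job monitor second, so no cycle can form. The nested re-entry in cleanupTask (standing in for TIP.cleanup) is harmless because Java monitors are reentrant.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of option a); names are illustrative only.
class TrackerSketch {
    static class RunningJob {
        final Set<String> tasks = new HashSet<>(); // guarded by this RunningJob
    }

    private final Map<String, RunningJob> runningJobs = new HashMap<>(); // guarded by this
    private final Map<String, String> tasks = new HashMap<>();           // taskId -> jobId, guarded by this

    synchronized void addTask(String jobId, String taskId) {
        RunningJob rjob = runningJobs.computeIfAbsent(jobId, k -> new RunningJob());
        synchronized (rjob) {
            rjob.tasks.add(taskId);
        }
        tasks.put(taskId, jobId);
    }

    // Main-thread path: tracker lock, then job lock (unchanged ordering).
    synchronized void purgeTask(String taskId) {
        String jobId = tasks.remove(taskId);
        RunningJob rjob = (jobId == null) ? null : runningJobs.get(jobId);
        if (rjob != null) {
            synchronized (rjob) {
                rjob.tasks.remove(taskId);
            }
        }
    }

    // Option a): purgeJob is synchronized, so the cleanup thread now also
    // takes the tracker lock before the job lock.
    synchronized void purgeJob(String jobId) {
        RunningJob rjob = runningJobs.remove(jobId);
        if (rjob == null) {
            return;
        }
        synchronized (rjob) {
            for (String taskId : rjob.tasks) {
                cleanupTask(taskId);
            }
            rjob.tasks.clear();
        }
    }

    // Stand-in for TIP.cleanup: re-enters the tracker monitor this thread
    // already holds, which Java allows.
    private void cleanupTask(String taskId) {
        synchronized (this) {
            tasks.remove(taskId);
        }
    }

    synchronized int taskCount() {
        return tasks.size();
    }

    // Races the main-thread path against the cleanup path; true if both
    // finish within the timeout and no tasks are left behind.
    static boolean run() {
        TrackerSketch tt = new TrackerSketch();
        for (int i = 0; i < 1000; i++) {
            tt.addTask("job_1", "task_" + i);
        }
        Thread mainPath = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                tt.purgeTask("task_" + i);
            }
        });
        Thread cleanupPath = new Thread(() -> tt.purgeJob("job_1"));
        mainPath.start();
        cleanupPath.start();
        try {
            mainPath.join(5000);
            cleanupPath.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !mainPath.isAlive() && !cleanupPath.isAlive() && tt.taskCount() == 0;
    }
}
```

TrackerSketch.run() should return true: whichever thread gets the tracker monitor first finishes its unit of work, and the other simply waits for it rather than holding a lock the first one needs.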

I'd prefer a), since TaskTracker.tasks is already referenced from several 
synchronized methods, which makes a) the less intrusive change.
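For comparison, a hypothetical sketch of why option b) touches more code. Single put/get calls on a Collections.synchronizedMap are thread-safe on their own, but its documentation requires callers to synchronize on the map itself while iterating over its views, so every place that walks TaskTracker.tasks would still need to be revisited. The class and method names here are illustrative only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of option b); not the Hadoop code.
class SynchronizedMapSketch {
    final Map<String, String> tasks = Collections.synchronizedMap(new HashMap<>());

    // A single operation locks the map internally; no extra locking needed.
    void add(String taskId, String state) {
        tasks.put(taskId, state);
    }

    // Iteration is a compound operation: the caller must hold the map's own
    // lock for the whole traversal, as the synchronizedMap contract requires.
    int countInState(String state) {
        int n = 0;
        synchronized (tasks) {
            for (String s : tasks.values()) {
                if (s.equals(state)) {
                    n++;
                }
            }
        }
        return n;
    }
}
```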

-*-*- 

Thoughts?


-- 
This message is automatically generated by JIRA.