Benjamin Mahler created MESOS-429:
-------------------------------------

             Summary: Hadoop MesosScheduler has a deadlock.
                 Key: MESOS-429
                 URL: https://issues.apache.org/jira/browse/MESOS-429
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler
            Priority: Blocker


This was found with the help of Brenden Matthews.

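JobTracker.heartbeat (synchronized) calls MesosScheduler.assignTasks
(synchronized), while MesosScheduler.resourceOffers (synchronized) calls into
JobTracker.getJobStatus (synchronized) via JobTracker.jobsToComplete. The two
paths take the JobTracker and MesosScheduler locks in opposite order, so each
thread can end up waiting on the lock the other one holds.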

Thread 24558: (state = BLOCKED)
 - org.apache.hadoop.mapred.JobTracker.getJobStatus(java.util.Collection, boolean) @bci=0, line=4592 (Interpreted frame)
 - org.apache.hadoop.mapred.JobTracker.jobsToComplete() @bci=11, line=4157 (Interpreted frame)
 - org.apache.hadoop.mapred.MesosScheduler.resourceOffers(org.apache.mesos.SchedulerDriver, java.util.List) @bci=9, line=273 (Compiled frame)


Thread 24575: (state = BLOCKED)
 - org.apache.hadoop.mapred.MesosScheduler.assignTasks(org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker) @bci=25, line=219 (Compiled frame)
 - org.apache.hadoop.mapred.JobTracker.heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus, boolean, boolean, boolean, short) @bci=507, line=2951 (Compiled frame)
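
The two BLOCKED threads above fit the usual lock-order inversion pattern. As a
rough illustration only, the sketch below reproduces that shape with two
hypothetical stand-in classes, Tracker and Sched (they are not the real
JobTracker or MesosScheduler, and the latch is there only to force the bad
interleaving). It then asks the JVM whether it sees a monitor deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins for JobTracker and MesosScheduler; only the locking
// shape is modeled, not the real Hadoop or Mesos classes.
class LockInversionDemo {
    static class Tracker {
        Sched sched;

        // Holds the Tracker monitor, then needs the Sched monitor.
        synchronized void heartbeat(CountDownLatch bothLocked) throws InterruptedException {
            rendezvous(bothLocked);
            sched.assignTasks();
        }

        synchronized void getJobStatus() { }
    }

    static class Sched {
        Tracker tracker;

        // Holds the Sched monitor, then needs the Tracker monitor.
        synchronized void resourceOffers(CountDownLatch bothLocked) throws InterruptedException {
            rendezvous(bothLocked);
            tracker.getJobStatus();
        }

        synchronized void assignTasks() { }
    }

    // Forces the bad interleaving: neither thread proceeds until both hold
    // their first monitor.
    static void rendezvous(CountDownLatch latch) throws InterruptedException {
        latch.countDown();
        latch.await();
    }

    static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // lets the JVM exit even though the thread stays blocked
        t.start();
        return t;
    }

    // Returns true once the JVM reports a monitor deadlock, false on timeout.
    static boolean reproduceDeadlock() throws InterruptedException {
        Tracker tracker = new Tracker();
        Sched sched = new Sched();
        tracker.sched = sched;
        sched.tracker = tracker;
        CountDownLatch bothLocked = new CountDownLatch(2);

        daemon("heartbeat", () -> {
            try { tracker.heartbeat(bothLocked); } catch (InterruptedException ignored) { }
        });
        daemon("resourceOffers", () -> {
            try { sched.resourceOffers(bothLocked); } catch (InterruptedException ignored) { }
        });

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            if (threads.findMonitorDeadlockedThreads() != null) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + reproduceDeadlock());
    }
}
```

In the real system the interleaving depends on timing, which is presumably why
the problem only shows up intermittently.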

The simplest fix for now would be to unsynchronize the Scheduler interface 
implementations. As a result, when we have to modify the state of 
MesosScheduler inside those methods, we need to do so in a synchronized block. 
So long as we don't invoke the JobTracker methods from these synchronized 
blocks, we won't have a deadlock. We can clean this up later, if a cleaner 
abstraction is needed.
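
A minimal sketch of what this could look like, under stated assumptions:
Tracker and Sched are hypothetical stand-ins for JobTracker and MesosScheduler,
and availableSlots is invented state used only for illustration. The scheduler
methods are no longer synchronized; they guard their own state with a
synchronized block and call into the tracker only outside that block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; this is not the actual MesosScheduler patch.
class UnsynchronizedSchedulerSketch {
    static class Tracker {
        Sched sched;
        private final List<String> jobs = new ArrayList<>();

        synchronized void addJob(String id) { jobs.add(id); }

        synchronized List<String> jobsToComplete() { return new ArrayList<>(jobs); }

        // Still synchronized, as in the report: holds the Tracker monitor
        // while calling into the scheduler.
        synchronized int heartbeat() { return sched.assignTasks(); }
    }

    static class Sched {
        private final Tracker tracker;
        private int availableSlots = 0; // hypothetical scheduler state

        Sched(Tracker tracker) { this.tracker = tracker; }

        // No longer a synchronized method.
        void resourceOffers(int offeredSlots) {
            // The tracker call happens while no scheduler lock is held.
            int runnableJobs = tracker.jobsToComplete().size();
            synchronized (this) {
                // Only scheduler state is touched inside the block.
                if (runnableJobs > 0) {
                    availableSlots += offeredSlots;
                }
            }
        }

        // No longer a synchronized method; the block never calls the tracker.
        int assignTasks() {
            synchronized (this) {
                int assigned = availableSlots;
                availableSlots = 0;
                return assigned;
            }
        }
    }

    // Runs both call paths concurrently and reports whether both finished.
    static boolean runsWithoutDeadlock() throws InterruptedException {
        Tracker tracker = new Tracker();
        Sched sched = new Sched(tracker);
        tracker.sched = sched;
        tracker.addJob("job-1");

        Thread heartbeats = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) tracker.heartbeat();
        });
        Thread offers = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) sched.resourceOffers(1);
        });
        heartbeats.setDaemon(true);
        offers.setDaemon(true);
        heartbeats.start();
        offers.start();
        heartbeats.join(10_000);
        offers.join(10_000);
        return !heartbeats.isAlive() && !offers.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished without deadlock: " + runsWithoutDeadlock());
    }
}
```

The heartbeat path still takes the tracker lock and then the scheduler lock,
but the offers path never holds the scheduler lock while waiting for the
tracker, so the cycle cannot form. One trade-off of this shape is that the job
list read in resourceOffers may be slightly stale by the time the scheduler
state is updated.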

