[ https://issues.apache.org/jira/browse/MAPREDUCE-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1895:
-----------------------------------------------

    Description: 
We have seen a scenario of lost trackers on our clusters because of the
following:
TaskLauncher has locked a TaskTracker$RunningJob and is doing localizeJob,
which involves DFS operations. The map-event fetcher has locked the
TaskTracker.runningJobs map and is waiting to lock the RunningJob object.
TaskTracker offerService is waiting to lock the TaskTracker.runningJobs map,
and thus fails to send any heartbeat for 10 minutes, so the tracker is
declared lost.

So, I think the map-event fetcher should skip jobs that are not localized.
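The lock chain above, and the proposed skip, can be sketched in plain Java. This is a simplified, hypothetical model: the class MapEventFetcherSketch, the volatile `localized` flag, and the methods localizeJob, jobsToFetchEventsFor and countRunningJobs are illustrative stand-ins, not Hadoop's actual TaskTracker code, and Thread.sleep stands in for the DFS work done during localization. The key assumption is that the fetcher can tell whether a job is localized by reading a flag, without taking the RunningJob monitor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the TaskTracker locking in MAPREDUCE-1895.
// Names echo the issue text; this is not Hadoop's actual TaskTracker code.
public class MapEventFetcherSketch {

    // Stand-in for TaskTracker$RunningJob.
    static final class RunningJob {
        final String jobId;
        // Assumed flag, set once localization completes. It is volatile so the
        // fetcher can read it without taking this object's monitor.
        volatile boolean localized = false;

        RunningJob(String jobId) {
            this.jobId = jobId;
        }
    }

    // Stand-in for TaskTracker.runningJobs.
    private final Map<String, RunningJob> runningJobs = new LinkedHashMap<>();

    void addJob(RunningJob job) {
        synchronized (runningJobs) {
            runningJobs.put(job.jobId, job);
        }
    }

    // Launcher side: holds the RunningJob monitor for the whole localization.
    // Thread.sleep stands in for the DFS operations done by localizeJob.
    void localizeJob(RunningJob job, long simulatedDfsMillis) throws InterruptedException {
        synchronized (job) {
            Thread.sleep(simulatedDfsMillis);
            job.localized = true;
        }
    }

    // Fetcher side with the proposed fix. The problematic pattern is, in effect,
    //   synchronized (runningJobs) { for (job : ...) synchronized (job) { ... } }
    // which parks the fetcher on a job's monitor while it still holds the map.
    // Here jobs that are not yet localized are skipped without touching their
    // monitors, so the map lock is held only briefly.
    List<RunningJob> jobsToFetchEventsFor() {
        List<RunningJob> ready = new ArrayList<>();
        synchronized (runningJobs) {
            for (RunningJob job : runningJobs.values()) {
                if (!job.localized) {
                    continue; // its monitor may be held by localizeJob
                }
                ready.add(job);
            }
        }
        return ready;
    }

    // Heartbeat side (offerService): needs the runningJobs lock only briefly.
    int countRunningJobs() {
        synchronized (runningJobs) {
            return runningJobs.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MapEventFetcherSketch tracker = new MapEventFetcherSketch();
        RunningJob ready = new RunningJob("job_1");
        ready.localized = true;
        RunningJob slow = new RunningJob("job_2");
        tracker.addJob(ready);
        tracker.addJob(slow);

        // Launcher thread holds job_2's monitor for about 2 s of "DFS work".
        Thread launcher = new Thread(() -> {
            try {
                tracker.localizeJob(slow, 2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        launcher.start();
        Thread.sleep(100); // give the launcher time to take the monitor

        long start = System.nanoTime();
        int fetchable = tracker.jobsToFetchEventsFor().size();
        int running = tracker.countRunningJobs();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fetchable=" + fetchable + " running=" + running
                + " elapsedMs=" + elapsedMs);
        launcher.join();
    }
}
```

Running main starts a launcher thread that holds job_2's monitor for about two seconds. The fetcher still returns at once with only job_1, and the heartbeat-side countRunningJobs call is not delayed. In the pattern described in the issue, the fetcher would instead block on job_2's monitor while holding runningJobs, and countRunningJobs would queue up behind it.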



  was:
We have seen a scenario of lost trackers on our clusters because of the
following:
TaskLauncher has locked a TaskTracker$RunningJob and doing localizeJob, which
involves DFS operations. Map-event fetcher has locked TaskTracker.runningJobs
map and is waiting to lock the RunningJob object. TaskTracker offerService is
waiting to lock TaskTracker.runningJobs map, thus failing to send heartbeats
in 10 minutes.

So, I think map-event fetcher should circuit jobs that are not localized.




> MapEventFetcherThread should not iterate over jobs that are not localized
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1895
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1895
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Amareshwari Sriramadasu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.