[ 
https://issues.apache.org/jira/browse/HADOOP-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661050#action_12661050
 ] 

Vivek Ratan commented on HADOOP-4984:
-------------------------------------

The refactoring work in HADOOP-4980 fixes this problem. While I realize that 
we don't usually want to combine more than one fix in a patch, the work 
in HADOOP-4980 went a long way toward simplifying the fix for this issue, so I 
didn't create a separate patch here. By making the _SchedulingInfo_ class 
functionally 'outside' the scheduler class, and thus unaware of the latter's 
data structures, and by moving the generation of the display strings to the 
concerned _QueueSchedulingInfo_ and _TaskSchedulingInfo_ objects, the 
synchronization problem is easily addressed. We also don't update the QSI 
objects, preferring to show potentially slightly stale information, but without 
a performance penalty. 
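
As a rough illustration of the idea (not the actual code from the HADOOP-4980 
patch), the sketch below shows a per-queue info object that builds its own 
display string under the same lock used by its updaters. The class name 
QueueInfoSketch, its fields, the update() method and the string layout are all 
hypothetical and stand in for the real _QueueSchedulingInfo_ object. 

```java
// Hypothetical stand-in for a per-queue scheduling-info object.
// Names and fields are assumptions for illustration, not Hadoop's real API.
class QueueInfoSketch {
    private final String queueName;
    private int runningMaps;
    private int runningReduces;

    QueueInfoSketch(String queueName) {
        this.queueName = queueName;
    }

    // Called by whichever thread changes the counts (for example during a
    // heartbeat, or from a capacity-reclaiming thread).
    synchronized void update(int maps, int reduces) {
        this.runningMaps = maps;
        this.runningReduces = reduces;
    }

    // Builds the display string from the last recorded counts. It holds the
    // same lock as update(), so it never sees a half-applied update, and it
    // does not recompute anything, so the values may be slightly stale.
    @Override
    public synchronized String toString() {
        return "Queue: " + queueName
            + ", running maps: " + runningMaps
            + ", running reduces: " + runningReduces;
    }

    public static void main(String[] args) {
        QueueInfoSketch info = new QueueInfoSketch("default");
        info.update(3, 1);
        System.out.println(info);
    }
}
```

Because the display string is produced by the object that owns the counters, 
the UI code needs no knowledge of the scheduler's data structures and no 
locking of its own. 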

> Code to create the UI display string for queues in the Capacity Scheduler 
> needs to be synchronized, and needs to better update its information
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4984
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>
> There are a couple of problems with _SchedulingInfo.toString()_, the code 
> which creates the UI display string for a queue: 
> * it needs synchronized access to the _QueueSchedulingInfo_ object, as this 
> same object can be updated by the reclaim-capacity thread, and during a 
> heartbeat.
> * the code directly updates its count of running map/reduce tasks. This 
> should be done in a better way, perhaps by calling updateQSIObjects(), rather 
> than walking through the data structures directly. It's also not clear that 
> we want to pay the performance penalty of updating the structures. It may be 
> OK to provide slightly stale info (the 'staleness' is tiny in a large, 
> steady-state system, where heartbeats are coming in frequently). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
