[
https://issues.apache.org/jira/browse/HADOOP-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661050#action_12661050
]
Vivek Ratan commented on HADOOP-4984:
-------------------------------------
The refactoring work in HADOOP-4980 fixes this problem. While I realize that
we don't usually want to combine more than one fix in a patch, the work in
HADOOP-4980 went a long way toward simplifying the fix for this issue, so I
didn't create a separate patch here. By making the _SchedulingInfo_ class
functionally 'outside' the scheduler class, and thus unaware of the latter's
data structures, and by moving the generation of the display strings to the
relevant _QueueSchedulingInfo_ and _TaskSchedulingInfo_ objects, the
synchronization problem is easily addressed. We also don't update the QSI
(_QueueSchedulingInfo_) objects, preferring to show potentially slightly stale
information without paying a performance penalty.
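
To illustrate the idea, here is a minimal sketch of an info object that builds
its own display string under its own lock. It is not the actual HADOOP-4980
code: the class name _QueueInfoSketch_, its fields, and the string format are
all hypothetical, chosen only to show the pattern.

```java
// Hypothetical sketch: names, fields, and output format are illustrative
// assumptions, not the real Capacity Scheduler classes.
final class QueueInfoSketch {
    private final String queueName;
    private final int guaranteedCapacity;
    private int numRunningTasks; // guarded by "this"

    QueueInfoSketch(String queueName, int guaranteedCapacity) {
        this.queueName = queueName;
        this.guaranteedCapacity = guaranteedCapacity;
    }

    // Called by scheduler threads (for example on a heartbeat or by a
    // capacity-reclaim thread) whenever they recompute the count.
    synchronized void setNumRunningTasks(int n) {
        numRunningTasks = n;
    }

    // Called by the UI. It only reads the last stored value and never walks
    // the scheduler's data structures, so the count may be slightly stale,
    // but it is always read consistently with concurrent updates.
    @Override
    public synchronized String toString() {
        return "Queue: " + queueName
            + ", guaranteed capacity: " + guaranteedCapacity
            + ", running tasks: " + numRunningTasks;
    }
}
```

Because both methods lock the same object, the UI never sees a half-updated
state, and it does not need to know anything about the scheduler's internals.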
> Code to create the UI display string for queues in the Capacity Scheduler
> needs to be synchronized, and needs to better update its information
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4984
> URL: https://issues.apache.org/jira/browse/HADOOP-4984
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vivek Ratan
>
> There are a couple of problems with _SchedulingInfo.toString()_, the code
> which creates the UI display string for a queue:
> * it needs synchronized access to the _QueueSchedulingInfo_ object, as this
> same object can be updated by the reclaim-capacity thread, and during a
> heartbeat.
> * the code directly updates its count of running map/reduce tasks. This
> should be done in a better way, perhaps by calling updateQSIObjects(), rather
> than walking through the data structures directly. It's also not clear that
> we want to pay the performance penalty of updating the structures. It may be
> OK to provide slightly stale info (the 'staleness' is tiny in a large,
> steady-state system, where heartbeats are coming in frequently).