[ 
https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050819#comment-14050819
 ] 

Arpit Agarwal edited comment on HADOOP-10281 at 7/2/14 10:46 PM:
-----------------------------------------------------------------

Hi [~chrili],

Thanks for the updated changes! I am basically +1 on the "preview" patch, 
minus the {{HistoryRpcScheduler}}. Is this tested and ready to commit from your 
side? If so, what do you think of just eliminating {{HistoryRpcScheduler}}?

_____


Just thinking aloud about a possible future optimization (and please don't 
bother doing it in the same Jira even if it makes sense!). I think we can 
eliminate the periodic decay timer and perform the decay lazily, and very 
cheaply, in the context of {{getPriorityLevel}}. We would need to store a 
{{lastUpdatedTimestamp}} with each {{scheduleCacheRef}} entry, and also a 
last-updated timestamp for {{totalCount}}. Then we could do the following:
# If the time since the {{lastUpdatedTimestamp}} for the identity's cache entry 
exceeds the decay period, first update the entry's {{lastUpdatedTimestamp}} and 
then multiply its {{callCounts}} entry by {{decayFactor}}.
# If the time since the {{lastUpdatedTimestamp}} for {{totalCount}} exceeds the 
decay period, update both the global timestamp and the timestamp for the 
corresponding {{cacheEntry}}, and then divide {{totalCount}} by {{decayFactor}}.
# In either case, if the time elapsed is greater than some multiple n of the 
{{decayPeriod}}, we can multiply the corresponding count by {{decayFactor}}^n. 
This can occur after a long period of inactivity.
# If either of the first two conditions was true, recompute {{scheduleCacheRef}}.

The other advantage is that we could use a smaller {{decayPeriod}} and a larger 
{{decayFactor}} without increasing timer activity, which should yield a 
smoother decay curve. The only missing piece would be the periodic cleanup of 
unused {{identities}}.
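To make the idea concrete, here is a rough standalone sketch (not the patch — 
the class and method names {{LazyDecaySketch}} and {{callShare}} are 
hypothetical, the clock is injectable for illustration, and it assumes 
{{decayFactor}} < 1, so both the per-identity counts and the total are 
multiplied by {{decayFactor}}^n for the n whole periods elapsed):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch of timer-free lazy decay: each counter carries its own
// lastUpdated timestamp, and any whole decay periods that have elapsed
// are applied all at once (decayFactor^n) when the counter is touched.
public class LazyDecaySketch {
  private static final class Counter {
    double count;
    long lastUpdated;
    Counter(long now) { this.lastUpdated = now; }
  }

  private final Map<String, Counter> callCounts = new HashMap<>();
  private double totalCount;
  private long totalLastUpdated;
  private final long decayPeriodMillis;
  private final double decayFactor;   // assumed < 1, applied once per period
  private final LongSupplier clock;   // injectable clock, for testing

  public LazyDecaySketch(long decayPeriodMillis, double decayFactor,
                         LongSupplier clock) {
    this.decayPeriodMillis = decayPeriodMillis;
    this.decayFactor = decayFactor;
    this.clock = clock;
    this.totalLastUpdated = clock.getAsLong();
  }

  // Apply decayFactor^n for the n whole periods since lastUpdated.
  private double decay(double count, long lastUpdated, long now) {
    long n = (now - lastUpdated) / decayPeriodMillis;
    return n <= 0 ? count : count * Math.pow(decayFactor, n);
  }

  public synchronized void addCall(String identity) {
    long now = clock.getAsLong();
    Counter c = callCounts.computeIfAbsent(identity, k -> new Counter(now));
    c.count = decay(c.count, c.lastUpdated, now) + 1;
    c.lastUpdated = now;
    totalCount = decay(totalCount, totalLastUpdated, now) + 1;
    totalLastUpdated = now;
  }

  // This identity's share of recent traffic, decayed lazily on read;
  // getPriorityLevel would map this share onto a queue index.
  public synchronized double callShare(String identity) {
    long now = clock.getAsLong();
    Counter c = callCounts.get(identity);
    if (c == null) return 0.0;
    double count = decay(c.count, c.lastUpdated, now);
    double total = decay(totalCount, totalLastUpdated, now);
    return total == 0 ? 0.0 : count / total;
  }
}
```

Because every read decays both the per-identity count and the total up to 
"now", the ratios stay consistent even though individual entries were last 
touched at different times.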



> Create a scheduler, which assigns schedulables a priority level
> ---------------------------------------------------------------
>
>                 Key: HADOOP-10281
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10281
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Chris Li
>            Assignee: Chris Li
>         Attachments: HADOOP-10281-preview.patch, HADOOP-10281.patch, 
> HADOOP-10281.patch, HADOOP-10281.patch
>
>
> The Scheduler decides which sub-queue to assign a given Call. It implements a 
> single method getPriorityLevel(Schedulable call) which returns an integer 
> corresponding to the subqueue the FairCallQueue should place the call in.
> The HistoryRpcScheduler is one such implementation which uses the username of 
> each call and determines what % of calls in recent history were made by this 
> user.
> It is configured with a historyLength (how many calls to track) and a list of 
> integer thresholds which determine the boundaries between priority levels.
> For instance, if the scheduler has a historyLength of 8; and priority 
> thresholds of 4,2,1; and saw calls made by these users in order:
> Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice
> * Another call by Alice would be placed in queue 3, since she has already 
> made >= 4 calls
> * Another call by Bob would be placed in queue 2, since he has >= 2 but less 
> than 4 calls
> * A call by Carlos would be placed in queue 0, since he has no calls in the 
> history
> Also, some versions of this patch include the concept of a 'service user', 
> which is a user that is always scheduled high-priority. Currently this seems 
> redundant and will probably be removed in later patches, since it's not too 
> useful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
