[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050819#comment-14050819 ]
Arpit Agarwal edited comment on HADOOP-10281 at 7/2/14 10:46 PM:
-----------------------------------------------------------------

Hi [~chrili],

Thanks for the updated changes! I am basically +1 for the "preview" patch, minus the {{HistoryRpcScheduler}}. Is this tested and ready to commit from your side? If so, what do you think of just eliminating {{HistoryRpcScheduler}}?
_____
Just thinking aloud about a possible future optimization (and please don't bother doing it in the same Jira even if it makes sense!). I think we can eliminate the periodic decay timer and perform the decay activity very cheaply, in a lazy manner, in the context of {{getPriorityLevel}}. We would need to store a {{lastUpdatedTimestamp}} with each {{scheduleCacheRef}} entry, and also a last-updated timestamp for the {{totalCount}}. Then we could do the following:
# If the time elapsed since the {{lastUpdatedTimestamp}} of the identity's cache entry is greater than the decay period, update {{lastUpdatedTimestamp}} for the entry first and multiply the {{callCounts}} entry by {{decayFactor}}.
# If the time elapsed since the last-updated timestamp of the {{totalCount}} is greater than the decay period, update both the global timestamp and the timestamp for the corresponding {{cacheEntry}}, and then multiply {{totalCount}} by {{decayFactor}}.
# In either case, if the time elapsed is greater than some factor n of the {{decayPeriod}}, then we can multiply the corresponding count by {{decayFactor}}^n.
# If either of the two conditions was true, recompute {{scheduleCacheRef}}.

The other advantage is that we could use a smaller {{decayPeriod}} and a larger {{decayFactor}} without increasing timer activity, which should yield a smoother decay curve. The only missing part would be the periodic cleanup of unused {{identities}}.


was (Author: arpitagarwal):
Hi [~chrili],

Thanks for the updated changes! I am basically +1 for the "preview" patch, minus the {{HistoryRpcScheduler}}. Is this tested and ready to commit from your side?
If so what do you think of just eliminating {{HistoryRpcScheduler}}.
_____
Just thinking aloud about a possible future optimization (and please don't bother doing it in the same Jira even if it makes sense!). I think we can eliminate the periodic decay timer and perform the decay activity very cheaply in the context of {{getPriorityLevel}}. We would need to store a {{lastUpdatedTimestamp}} with each {{scheduleCacheRef}} entry, and also a last updated timestamp for the totalCount. Then we could do the following:
# If the {{lastUpdatedTimestamp}} for the identity's cache entry is greater than the decay period, update {{lastUpdatedTimestamp}} for the entry first and multiply the {{callCounts}} entry by {{decayFactor}}.
# If the {{lastUpdatedTimestamp}} for the {{totalCount}} is greater than the decay period, update both the global timestamp and the timestamp for the corresponding {{cacheEntry}} and then divide {{totalCount}} by {{decayFactor}}
# If for either of the above, the time elapsed is greater than some factor n of the {{decayPeriod}}, then we can multiply the corresponding timestamp by {{decayFactor}}^n. This can occur after a long period of inactivity.
# If either of the two conditions was true, recompute {{scheduleCacheRef}}.

The other advantage is we could use a smaller {{decayPeriod}} and a larger {{decayFactor}} without increasing timer activity, which should yield a smoother decay curve. The only missing part would be the periodic cleanup of unused {{identities}}.

> Create a scheduler, which assigns schedulables a priority level
> ---------------------------------------------------------------
>
>                 Key: HADOOP-10281
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10281
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Chris Li
>            Assignee: Chris Li
>         Attachments: HADOOP-10281-preview.patch, HADOOP-10281.patch,
> HADOOP-10281.patch, HADOOP-10281.patch
>
>
> The Scheduler decides which sub-queue to assign a given Call.
> It implements a single method getPriorityLevel(Schedulable call), which
> returns an integer corresponding to the sub-queue the FairCallQueue should
> place the call in.
> The HistoryRpcScheduler is one such implementation. It uses the username of
> each call to determine what % of calls in recent history were made by this
> user.
> It is configured with a historyLength (how many calls to track) and a list of
> integer thresholds which determine the boundaries between priority levels.
> For instance, if the scheduler has a historyLength of 8 and priority
> thresholds of 4,2,1, and saw calls made by these users in order:
> Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice
> * Another call by Alice would be placed in queue 3, since she has already
> made >= 4 calls
> * Another call by Bob would be placed in queue 2, since he has >= 2 but fewer
> than 4 calls
> * A call by Carlos would be placed in queue 0, since he has no calls in the
> history
> Also, some versions of this patch include the concept of a 'service user',
> which is a user that is always scheduled high-priority. Currently this seems
> redundant and will probably be removed in later patches, since it's not too
> useful.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
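
The threshold example in the quoted description can be checked with a small sketch. This is only an illustration under stated assumptions, not the code from the attached patches: the class name {{HistorySchedulerSketch}}, the {{record}} method, and the use of a plain username string in place of a {{Schedulable}} are all hypothetical. It assumes the queue index is simply the number of thresholds that the user's call count in the history window meets, which reproduces the Alice/Bob/Carlos results given in the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the HistoryRpcScheduler from the patch. */
class HistorySchedulerSketch {
  private final int historyLength;                  // how many recent calls to track
  private final int[] thresholds;                   // e.g. {4, 2, 1}
  private final Deque<String> history = new ArrayDeque<>();
  private final Map<String, Integer> counts = new HashMap<>();

  HistorySchedulerSketch(int historyLength, int[] thresholds) {
    if (historyLength < 1) {
      throw new IllegalArgumentException("historyLength must be >= 1");
    }
    this.historyLength = historyLength;
    this.thresholds = thresholds.clone();
  }

  /** Records a call by the user, evicting the oldest call once the window is full. */
  void record(String user) {
    if (history.size() == historyLength) {
      String evicted = history.removeFirst();
      // Decrement the evicted user's count; returning null drops the map entry at zero.
      counts.computeIfPresent(evicted, (u, c) -> c == 1 ? null : c - 1);
    }
    history.addLast(user);
    counts.merge(user, 1, Integer::sum);
  }

  /** Assumed rule: queue index = number of thresholds the user's count meets. */
  int getPriorityLevel(String user) {
    int count = counts.getOrDefault(user, 0);
    int level = 0;
    for (int threshold : thresholds) {
      if (count >= threshold) {
        level++;
      }
    }
    return level;
  }
}
```

With a {{historyLength}} of 8, thresholds of 4,2,1, and the call sequence from the description, this sketch places Alice in queue 3, Bob in queue 2, and Carlos in queue 0. Keeping per-user counts next to the window avoids rescanning the history on every call.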