[ https://issues.apache.org/jira/browse/AURORA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kai Huang updated AURORA-1929:
------------------------------
    Component/s: Scheduler

> Improve explicit task history pruning.
> --------------------------------------
>
>                 Key: AURORA-1929
>                 URL: https://issues.apache.org/jira/browse/AURORA-1929
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Kai Huang
>            Assignee: Kai Huang
>            Priority: Minor
>
> There are currently two types of task history pruning run by Aurora:
> # The implicit task history pruning run by TaskHistoryPruner in the
> background, which registers all inactive tasks for pruning upon terminal
> state change.
> # The explicit task history pruning initiated by the `aurora_admin prune_tasks`
> command, which prunes inactive tasks in the cluster.
> The prune_tasks endpoint seems to be very slow when the cluster has a large
> number of inactive tasks.
> For example, when we run `aurora_admin prune_tasks` for 135k running tasks
> (1k jobs), it takes about 30 minutes to prune all tasks; the pruning speed
> seems to max out at 3k tasks per minute.
> Currently, Aurora uses StreamManager to manage a single log stream append
> transaction for task history pruning. Local storage ops can be added to the
> transaction and then later committed as an atomic unit. However, the
> StateManager removes tasks one by one in a for-loop
> (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L376),
> and each RemoveTasks operation is coalesced with its previous operation,
> which seems inefficient and unnecessary
> (https://github.com/apache/aurora/blob/c85bffdd6f68312261697eee868d57069adda434/src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java#L324).
> We need to batch all RemoveTasks operations and execute them all at once to
> avoid the additional cost of coalescing. The fix will also benefit implicit
> task history pruning, since it has a similar underlying implementation.
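
To illustrate the batching idea described in the issue, here is a minimal sketch in Java. It is not Aurora code: `BatchedPruneSketch`, `Transaction`, `addRemoveTasks`, `removeOneByOne` and `removeBatched` are hypothetical names, and the coalescing logic is a simplified stand-in for what the issue attributes to StreamManagerImpl. It only shows why adding one RemoveTasks op per task triggers a coalescing step per task, while a single batched op triggers none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BatchedPruneSketch {

  /** Hypothetical stand-in for a log transaction that records RemoveTasks ops. */
  static final class Transaction {
    final List<Set<String>> ops = new ArrayList<>();
    int coalesceAttempts = 0;

    /** Adds a RemoveTasks op, merging it into the previous op if one exists. */
    void addRemoveTasks(Set<String> taskIds) {
      if (ops.isEmpty()) {
        ops.add(new LinkedHashSet<>(taskIds));
      } else {
        // Simplified assumption: every op after the first is coalesced
        // into the preceding RemoveTasks op.
        coalesceAttempts++;
        ops.get(ops.size() - 1).addAll(taskIds);
      }
    }
  }

  /** Mirrors the per-task loop described in the issue: one op per task. */
  static Transaction removeOneByOne(List<String> taskIds) {
    Transaction tx = new Transaction();
    for (String id : taskIds) {
      tx.addRemoveTasks(Collections.singleton(id));
    }
    return tx;
  }

  /** Proposed shape: one op carrying every task ID, so nothing to coalesce. */
  static Transaction removeBatched(List<String> taskIds) {
    Transaction tx = new Transaction();
    tx.addRemoveTasks(new LinkedHashSet<>(taskIds));
    return tx;
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("task-1", "task-2", "task-3", "task-4");
    Transaction perTask = removeOneByOne(ids);
    Transaction batched = removeBatched(ids);
    System.out.println("per-task coalesce steps: " + perTask.coalesceAttempts);
    System.out.println("batched coalesce steps:  " + batched.coalesceAttempts);
  }
}
```

Both paths end with a single op containing the same task IDs; the difference in this sketch is only the number of coalescing steps, which grows with the number of tasks in the per-task loop and stays at zero when the IDs are batched.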
-- This message was sent by Atlassian JIRA (v6.3.15#6346)