[ https://issues.apache.org/jira/browse/AURORA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380825#comment-14380825 ]
Maxim Khutornenko commented on AURORA-1227:
-------------------------------------------
I am totally OK with a tooling solution as long as it can answer the
above questions. Using the existing RPCs would require pulling the entire
contents of the TaskStore to the client, which is no fun in a large cluster and
may affect scheduler perf (e.g. increased GC pressure). Perhaps we can have an
API to return a normalized view of the task store (similar to what we did in
SnapshotDeduplicator)?
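
To make the "normalized view" idea concrete, here is a rough Python sketch. It assumes that many tasks in a job share an identical resource config, so the server could return each distinct config once with an instance count instead of every task record. The TaskConfig fields and the normalize() helper are hypothetical and are not taken from the Aurora code base or from SnapshotDeduplicator.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class TaskConfig(NamedTuple):
    """Hypothetical, simplified stand-in for a task's resource config."""
    job: str
    cpus: float
    ram_mb: int
    disk_mb: int


def normalize(tasks: Iterable[TaskConfig]) -> Dict[TaskConfig, int]:
    """Collapse identical configs into one entry with an instance count."""
    return dict(Counter(tasks))


# Three identical web tasks and one cache task collapse to two entries.
tasks = [TaskConfig("www/prod/web", 2.0, 4096, 1024)] * 3
tasks.append(TaskConfig("www/prod/cache", 1.0, 8192, 512))
view = normalize(tasks)
```

With a view like this, the payload would grow with the number of distinct configs rather than the number of tasks, which is the property that would keep the client-side approach cheap in a large cluster.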
> Create a "top X jobs" debug HTTP endpoint
> -----------------------------------------
>
> Key: AURORA-1227
> URL: https://issues.apache.org/jira/browse/AURORA-1227
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Maxim Khutornenko
>
> It may be useful to query the scheduler for the "top X" job names by resource
> utilization (CPU/RAM/DISK) to investigate cluster capacity shortages.
> Something like: /top/{count}/{cpu|ram|disk}/{timestamp} should be able
> to answer questions like "What are the 10 most memory-consuming jobs now?" or
> "What were the largest CPU-consuming jobs yesterday?" Since we have limited
> task history, answers to the latter should be considered best effort.
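
For illustration only, the aggregation behind such an endpoint could look roughly like the Python sketch below. The task dictionaries, field names, and the top_jobs() helper are assumptions made for the example, not existing Aurora APIs, and the {timestamp} part of the proposed path is left out.

```python
import heapq
from collections import defaultdict


def top_jobs(tasks, count, resource):
    """Return the `count` jobs with the largest summed `resource` value.

    `tasks` is an iterable of dicts with a "job" key plus numeric
    "cpu", "ram" and "disk" keys (a hypothetical shape for this sketch).
    """
    totals = defaultdict(float)
    for task in tasks:
        totals[task["job"]] += task[resource]
    # nlargest avoids sorting every job when only a few are requested.
    return heapq.nlargest(count, totals.items(), key=lambda kv: kv[1])


sample = (
    [{"job": "web", "cpu": 2.0, "ram": 4096, "disk": 1024}] * 3
    + [{"job": "cache", "cpu": 1.0, "ram": 8192, "disk": 512}]
    + [{"job": "batch", "cpu": 4.0, "ram": 1024, "disk": 2048}] * 2
)
top_ram = top_jobs(sample, 2, "ram")
top_cpu = top_jobs(sample, 1, "cpu")
```

Here top_jobs(sample, 2, "ram") corresponds to a request like /top/2/ram against current tasks.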