[jira] [Commented] (AURORA-1227) Create a "top X jobs" debug HTTP endpoint

Maxim Khutornenko (JIRA) Wed, 25 Mar 2015 14:34:39 -0700

    [ 
https://issues.apache.org/jira/browse/AURORA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380825#comment-14380825
 ]


Maxim Khutornenko commented on AURORA-1227:
-------------------------------------------

I am totally OK with a tooling solution as long as it's capable of answering the 
above questions. Using the existing RPCs would require pulling the entire 
contents of the TaskStore to the client, which is no fun in a large cluster and 
may affect scheduler perf (e.g. increased GC pressure). Perhaps we can have an 
API to return a normalized view of the task store (similar to what we did in 
SnapshotDeduplicator)?
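
To illustrate what a "normalized view" could look like, here is a minimal sketch. It assumes "normalized" means that each distinct task configuration is sent once and individual instances only reference it by index. The Task record and the normalize function are hypothetical and simplified; they are not Aurora's actual schema or API.

```python
from collections import namedtuple

# Hypothetical, simplified task record; NOT Aurora's real task schema.
Task = namedtuple("Task", ["job", "instance_id", "cpu", "ram_mb", "disk_mb"])


def normalize(tasks):
    """Return (configs, instances) where each distinct config appears once.

    configs   -- list of unique (job, cpu, ram_mb, disk_mb) tuples,
                 in first-seen order
    instances -- list of (instance_id, index into configs)
    """
    configs = []
    index = {}  # config tuple -> position in configs
    instances = []
    for t in tasks:
        cfg = (t.job, t.cpu, t.ram_mb, t.disk_mb)
        if cfg not in index:
            index[cfg] = len(configs)
            configs.append(cfg)
        instances.append((t.instance_id, index[cfg]))
    return configs, instances
```

With many instances per job, the payload grows with the number of distinct configs plus a small per-instance reference, rather than with a full config per task.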

> Create a "top X jobs" debug HTTP endpoint
> -----------------------------------------
>
>                 Key: AURORA-1227
>                 URL: https://issues.apache.org/jira/browse/AURORA-1227
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>
> It may be useful to query the scheduler for "top X" job names by resource 
> utilization (CPU/RAM/DISK) to investigate cluster capacity shortages. 
> Something like: /top/{count}/{cpu|ram|disk}/{timestamp} should be able 
> to answer questions like "What are the 10 most memory consuming jobs now?" or 
> "What were the largest CPU consuming jobs yesterday?" Since we have limited 
> task history, answers to the latter should be considered best effort.
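
The "top X" query described above can be sketched as a per-job aggregation followed by a partial sort. This is only an illustration under stated assumptions: the task dictionaries, their keys, and the top_jobs function are hypothetical, and the {timestamp} part of the proposed path (historical queries) is not covered.

```python
import heapq
from collections import defaultdict


def top_jobs(tasks, count, resource):
    """Return the `count` jobs with the largest total of `resource`.

    tasks    -- iterable of dicts with keys 'job', 'cpu', 'ram', 'disk'
                (a hypothetical shape, not Aurora's task schema)
    resource -- one of 'cpu', 'ram', 'disk'
    """
    if resource not in ("cpu", "ram", "disk"):
        raise ValueError("unknown resource: %r" % (resource,))
    totals = defaultdict(float)
    for t in tasks:
        # Sum the requested resource across all instances of a job.
        totals[t["job"]] += t[resource]
    # nlargest avoids sorting every job when only a few are wanted.
    return heapq.nlargest(count, totals.items(), key=lambda kv: kv[1])
```

For example, top_jobs(tasks, 10, "ram") would correspond to the "10 most memory consuming jobs now" question.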



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
