[ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191486#comment-15191486
 ] 

Benjamin Mahler commented on MESOS-4740:
----------------------------------------

If you can't reproduce the slowness, doesn't that suggest the metrics
computation isn't inherently slow? If it is slow only sometimes, the
implication is that the master and/or allocator are sometimes backlogged.

"Complete waste of CPU cycles" assumes that the only thing we care about is
how many CPU cycles are needed to accomplish our work. We care about much more
than that, for example how simple and understandable the code is. By
introducing event-driven counters, we'll be making the code more complicated.
If we want to make such a tradeoff, we first have to establish a basis for it
(there are endless places where we could reduce CPU cycles) and measure what
we're improving (how large is the improvement). I'm not saying we shouldn't do
it, but please first do a deeper analysis here and use benchmarks to
demonstrate that the improvement is worth it. For example,
https://reviews.apache.org/r/44675/ seems misdirected; I would be very
surprised if it has a non-negligible impact on what you're seeing.
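
As a starting point for that kind of measurement, here is a minimal sketch of
a latency benchmark, assuming a master listening on localhost:5050 as in the
issue description. The helper names (fetch_snapshot, measure_latencies,
summarize) are hypothetical and not part of Mesos. A low median with an
occasional large maximum would point at a backlog rather than at an inherently
slow computation.

```python
import statistics
import time
import urllib.request


def fetch_snapshot(url="http://localhost:5050/metrics/snapshot"):
    """Fetch the snapshot once and return the raw response body.

    The default URL is an assumption taken from the issue description.
    """
    with urllib.request.urlopen(url) as response:
        return response.read()


def measure_latencies(fetch, samples=20, pause=0.0, clock=time.perf_counter):
    """Call fetch() `samples` times and return each call's duration in seconds."""
    latencies = []
    for _ in range(samples):
        start = clock()
        fetch()
        latencies.append(clock() - start)
        if pause:
            # Space the requests out so they sample different load conditions.
            time.sleep(pause)
    return latencies


def summarize(latencies):
    """Reduce a list of durations to their min, median and max."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Example usage against a live master (not executed here):
#   print(summarize(measure_latencies(fetch_snapshot, samples=20, pause=1.0)))
```

Running the same measurement before and after a proposed change, under
comparable load, would show how large the improvement actually is.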

> Improve master metrics/snapshot performance
> -------------------------------------------
>
>                 Key: MESOS-4740
>                 URL: https://issues.apache.org/jira/browse/MESOS-4740
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Cong Wang
>            Assignee: Cong Wang
>
> [~drobinson] noticed that retrieving metrics/snapshot statistics can be very
> slow.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduces a timeout parameter for this query, but metric
> collectors like ours are not aware of such a URL-specific parameter, so we
> need to:
> 1) Always apply a timeout, with some default value.
> 2) Investigate why master metrics/snapshot can take such a long time to
> complete under load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
