Thanks for the feedback, James. Replying to your points inline:
On Mon, May 22, 2017 at 10:56 AM, James Peach <jor...@gmail.com> wrote:
>
> > On May 19, 2017, at 11:35 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'd like to start a conversation to talk about metrics collection
> > endpoints (especially `/metrics/snapshot`) behavior.
> >
> > Right now, these endpoints are served from the same master/agent's
> > libprocess, and extensively use `gauge` to chain further callbacks to
> > collect various metrics (the DRF allocator specifically adds several
> > metrics per role).
> >
> > This brings a problem when the system is under load: when the
> > master/allocator libprocess becomes busy, stats collection itself
> > becomes slow too. Flying dark when the system is under load is
> > especially painful for an operator.
>
> Yes, sampling metrics should approach zero cost.
>
> > I would like to explore the direction of isolating metric collection
> > even when the master is slow. A couple of ideas:
> >
> > - (short term) reduce usage of gauge and prefer counter (since I
> >   believe they are less affected);
>
> I'd rather not squash the semantics for performance reasons. If a metric
> has gauge semantics, I don't think we should represent that as a Counter.

I recall from a previous conversation with @bmahler that he thought certain
gauges could be expressed as the differential of two counters. We definitely
cannot express a gauge as a Counter directly, because a gauge value can
decrease, while a counter should always be treated as monotonically
increasing until the process restarts. (I put a rough sketch of the
two-counter idea further down in this email.)

> > - alternative implementation of `gauge` which does not contend on
> >   master/allocator's event queue;
>
> This is doable in some circumstances, but not always. For example,
> Master::_uptime_secs() doesn't need to run on the master queue, but
> Master::_outstanding_offers arguably does. The latter could be implemented
> by sampling a variable that is updated, but that's not very generic, so we
> should try to think of something better.

I agree that this will not be a trivial change. The fact that this is not
trivially achievable is the primary reason I want to start this conversation
with the general dev community. We can absorb some of the work to optimize
certain hot paths (I suspect the role-specific ones in the allocator are
among them for us), but maintaining this in the long term will definitely
require help from all contributors.

W.r.t. the examples: it seems that Master::_outstanding_offers simply calls
hashmap::size() on a hashmap object, so if the underlying container type
conforms to C++11's thread safety requirements
(http://en.cppreference.com/w/cpp/container#Thread_safety), we should at
least be able to call size() with the understanding that we might get a
slightly stale value?

I think a more interesting example is Master::_task_starting(): not only is
it not calculated by a simple const method, but the result is actually
generated by iterating over all tasks registered with the master. This means
the cost of calculating it is linear in the number of tasks in the cluster.

> > - serving metrics collection from a different libprocess routine.
>
> See MetricsProcess. One (mitigation?) approach would be to sample the
> metrics at a fixed rate and then serve the cached samples from the
> MetricsProcess. I expect most installations have multiple clients sampling
> the metrics, so this would at least decouple the sample rate from the
> metrics request rate.

That sounds like a very good idea to start with.
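To make sure I understand the suggestion, below is a rough, standalone
sketch of the "sample at a fixed rate and serve the cached samples" idea.
It is plain C++ with an ordinary thread rather than actual
MetricsProcess/libprocess code, and all of the names (CachedMetrics etc.)
are made up purely for illustration:

    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <string>
    #include <thread>

    // Hypothetical cache: the expensive gauge callbacks only run inside
    // sample(), which a background loop calls at a fixed interval. The
    // endpoint handler only reads the cached values.
    class CachedMetrics {
    public:
      void add(const std::string& name, std::function<double()> gauge) {
        std::lock_guard<std::mutex> lock(mutex_);
        gauges_[name] = std::move(gauge);
      }

      // The only place the (possibly slow) callbacks run.
      void sample() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& entry : gauges_) {
          cache_[entry.first] = entry.second();
        }
      }

      // What '/metrics/snapshot' would serve: the cost is independent of
      // how busy the master/allocator is, at the price of bounded staleness.
      std::map<std::string, double> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_;
      }

    private:
      mutable std::mutex mutex_;
      std::map<std::string, std::function<double()>> gauges_;
      std::map<std::string, double> cache_;
    };

    int main() {
      CachedMetrics metrics;

      metrics.add("master/outstanding_offers", []() {
        // Imagine this dispatching onto the master queue and being slow
        // when the master is under load.
        return 42.0;
      });

      // Fixed-rate sampler, decoupled from how often clients hit the endpoint.
      std::thread sampler([&metrics]() {
        for (int i = 0; i < 3; i++) {
          metrics.sample();
          std::this_thread::sleep_for(std::chrono::seconds(1));
        }
      });
      sampler.join();

      for (const auto& entry : metrics.snapshot()) {
        std::cout << entry.first << " = " << entry.second << std::endl;
      }
      return 0;
    }

With something like this, every additional client scraping
/metrics/snapshot only pays for reading the cache, and the worst-case
staleness is bounded by the sampling interval.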
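To also make the two-counter idea from above concrete, here is an equally
rough sketch applied to something like the task-starting count: the task
state transition paths bump two monotonically increasing counters, and the
gauge value is simply their difference, so reading it is O(1) and does not
iterate over all tasks or touch the master's queue. Again, the names are
made up and this is not the existing Mesos metrics code:

    #include <atomic>
    #include <cstdint>
    #include <iostream>

    // Two monotonically increasing counters, updated at the points where a
    // task enters or leaves TASK_STARTING. Both only ever go up, so they
    // keep counter semantics; their difference behaves like the gauge.
    std::atomic<uint64_t> tasks_entered_starting{0};
    std::atomic<uint64_t> tasks_left_starting{0};

    // Called from the task state transition paths: the hot path pays one
    // atomic increment, and nothing is recomputed at collection time.
    void onTaskStarting() { tasks_entered_starting++; }
    void onTaskLeftStarting() { tasks_left_starting++; }

    // What the gauge would report.
    uint64_t tasksStarting() {
      // Loading "left" before "entered" avoids a transient underflow; the
      // value may be slightly stale, which is acceptable for metrics.
      uint64_t left = tasks_left_starting.load();
      uint64_t entered = tasks_entered_starting.load();
      return entered - left;
    }

    int main() {
      onTaskStarting();
      onTaskStarting();
      onTaskLeftStarting();
      std::cout << "tasks_starting = " << tasksStarting() << std::endl; // 1
      return 0;
    }

The obvious trade-off is that every relevant state transition has to
remember to update the counters, which is exactly the kind of maintenance
burden I mentioned above.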
I still think certain code should be augmented so that it maintains the
gauge value more efficiently (along the lines of the two-counter sketch
above), but for code that is harder to rewrite, the fixed-rate sampling
approach can definitely improve the situation.

> > Any thoughts on these?
> >
> > --
> > Cheers,
> >
> > Zhitao Li

--
Cheers,
Zhitao Li