On 26/07/2016 06:56 PM, Willy Tarreau wrote:
> On Tue, Jul 26, 2016 at 05:51:08PM +0200, Pavlos Parissis wrote:
>> In all my setups I have nbproc > 1, and after a lot of changes to how I
>> aggregate HAProxy stats and to what most people want to see on graphs, I
>> came up with something like the following:
>>
>> {
>>     "frontend": {
>>         "www.haproxy.org": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>         "www.haproxy.com": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>     },
>>     "backend": {
>>         "www.haproxy.org": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>             "server": {
>>                 "srv1": {
>>                     "bin": "999999999999",
>>                     "lbtot": "555555",
>>                     ...
>>                 },
>>                 ...
>>             },
>>         },
>>     },
>>     "haproxy": {
>>         "PipesFree": "555",
>>         ...,
>>         "per_process": {
>>             "id1": {
>>                 "PipesFree": "555",
>>                 "Process_num": "1",
>>                 ...
>>             },
>>             "id2": {
>>                 "PipesFree": "555",
>>                 "Process_num": "2",
>>                 ...
>>             },
>>             ...
>>         },
>>     },
>>     "server": {
>>         "srv1": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>         ...
>>     },
>> }
>>
>>
>> Let me explain a bit:
>>
>> - It is very useful and handy to know stats for a server per backend but
>> also across all backends. Thus, I include a top-level key 'server' which
>> holds stats for each server across all backends. A few server stats have
>> to be excluded as they are meaningless in this context, for example
>> status, lastchg, check_duration, check_code and a few others. For those
>> which aren't counters but fixed numbers, you want to either sum them
>> (slim) or take the average (weight). I don't do the latter in my setup.
>
> You probably have not looked at the output of "show stats typed", it
> gives you the nature of each value letting you know how to aggregate
> them (min, max, avg, sum, pick any, etc).
>
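The idea of aggregating each value according to its nature can be sketched as a per-metric rule table. This is a minimal illustration, assuming the stats have already been parsed into dicts of strings shaped like the JSON above; the `RULES` table (which metric is summed, averaged, picked or dropped) is a hypothetical, hard-coded stand-in for the nature that the typed stats output reports, and `aggregate()` is not part of HAProxy.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Stats = Dict[str, str]


def _sum(values: List[str]) -> str:
    # Counters and limits (e.g. bin, lbtot, slim) are simply added up.
    return str(sum(int(v) for v in values))


def _avg(values: List[str]) -> str:
    # Fixed per-server settings such as weight are averaged (rounded).
    return str(round(mean(int(v) for v in values)))


def _pick_any(values: List[str]) -> str:
    # Values that are identical everywhere can be taken from any sample.
    return values[0]


# Hypothetical rule table: None means "meaningless once aggregated, drop it".
# Metrics that are not listed at all are dropped too.
RULES: Dict[str, Optional[Callable[[List[str]], str]]] = {
    "bin": _sum,
    "lbtot": _sum,
    "slim": _sum,
    "weight": _avg,
    "Version": _pick_any,
    "status": None,
    "lastchg": None,
    "check_duration": None,
    "check_code": None,
}


def aggregate(samples: List[Stats]) -> Stats:
    """Merge several stats dicts (one per backend or per process) into one."""
    result: Stats = {}
    # Union of all metric names, in first-seen order.
    for name in dict.fromkeys(k for s in samples for k in s):
        rule = RULES.get(name)
        if rule is None:
            continue
        # Ignore samples where the metric is missing or empty.
        values = [s[name] for s in samples if s.get(name)]
        if values:
            result[name] = rule(values)
    return result


# srv1 as seen in two different backends.
srv1_in_backends = [
    {"bin": "100", "lbtot": "10", "weight": "1", "status": "UP"},
    {"bin": "50", "lbtot": "5", "weight": "3", "status": "DOWN"},
]
print(aggregate(srv1_in_backends))
# {'bin': '150', 'lbtot': '15', 'weight': '2'}
```

Keeping the rules in one table means that, once the nature of each field is available from the typed output, the table could be filled from it instead of being maintained by hand.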
I have seen it, but it isn't available in 1.6. It could simplify my code; I
should give it a try.

>> - Aggregation across multiple processes for haproxy stats (show info output)
>
> It's not only "show info", this one reports only the process health.
>
>> As you can see, I provide stats per process and across all processes.
>> It has proven very useful to know the CPU utilization per process. We
>> depend on the kernel to distribute incoming connections to all processes,
>> and so far it works very well, but sometimes you see a single process
>> consuming a lot of CPU, and if you don't provide percentiles or stats per
>> process then you are going to miss it. Metrics such as uptime, version,
>> description and a few others can be excluded from the aggregation.
>
> These last ones are in the "pick any" type of aggregation I was talking about.
>
>> - nbproc > 1 and aggregation for frontend/backend/server
>> My proposal doesn't cover stats for frontend/backend/server per haproxy
>> process.
>
> But that's precisely the limitation I'm reporting :-)
>
>> The stats are already aggregated and a few metrics are excluded, for
>> example all the status-related ones. Each process performs health
>> checking, so they act as little brains which never agree on the status of
>> a server, as they run their checks at different intervals.
>
> Absolutely, but at least you want to see their stats. For example how many
> times a server has switched state per process then in total (meaning a
> proportional amount of possibly visible issues).
>

True, but in setups with ECMP in front of N HAProxy nodes running in nbproc
mode, you offload application health checking to a dedicated daemon which runs
on the servers (service discovery + service availability with Consul/ZooKeeper
and the like), and you only run TCP checks from HAProxy. In our setup we don't
really care how many times a server flapped; it doesn't tell us anything we
don't already know: the application is in a broken state.
But other people may find it useful.

> My issue is that if the *format* doesn't support per-process stats, we'll have
> to emit a new format 3 months later for all the people who want to process it.
> We've reworked the stats dump to put an end to the problem where depending on
> the output format you used to have different types of information, and there
> was no single representation carrying them all at once. For me now it's
> essential that if we prepare a new format it's not stripped down from the
> info people need, otherwise it will automatically engender yet another format.
>

Agreed. I am fine with providing per-process stats for servers, frontends and
backends. Adding another top-level key, 'per_process', to my proposal should
be a good start:

{
    "per_process": {
        "proc1": {
            "frontend": {
                "www.haproxy.org": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                "www.haproxy.com": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
            },
            "backend": {
                "www.haproxy.org": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                    "server": {
                        "srv1": {
                            "bin": "999999999999",
                            "lbtot": "555555",
                            ...
                        },
                        ...
                    },
                },
            },
            "haproxy": {
                "PipesFree": "555",
                ...
            },
            "server": {
                "srv1": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                ...
            },
        },
        ...
    },
    "frontend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        "www.haproxy.com": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
    },
    "backend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
            "server": {
                "srv1": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                ...
            },
        },
    },
    "haproxy": {
        "PipesFree": "555",
        ...
    },
    "server": {
        "srv1": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        ...
    },
}

What do you think?

Cheers,
Pavlos
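For illustration, here is a minimal Python sketch of how a collector could assemble the proposed layout (a 'per_process' key plus aggregated 'frontend', 'backend', 'haproxy' and 'server' keys) from one stats tree per process. It assumes each process's stats have already been fetched and parsed into nested dicts shaped like the JSON above. As a simplification it sums every numeric field and keeps a text field only when all samples agree; the `EXCLUDED` set, the function names and the sample values are hypothetical.

```python
import json
from typing import Dict, Iterable, List

Stats = Dict[str, str]

# Illustrative fields that make no sense once aggregated (process identity,
# health-check state); not an official list.
EXCLUDED = {"Process_num", "Pid", "status", "lastchg", "check_duration", "check_code"}


def merge(samples: List[Stats]) -> Stats:
    """Sum numeric fields; keep a text field only if every sample agrees."""
    out: Stats = {}
    for key in dict.fromkeys(k for s in samples for k in s):
        if key in EXCLUDED:
            continue
        values = [s[key] for s in samples if key in s]
        if all(v.isdigit() for v in values):
            out[key] = str(sum(int(v) for v in values))
        elif len(set(values)) == 1:
            out[key] = values[0]  # "pick any": identical everywhere
        # Otherwise the field differs between samples and is dropped.
    return out


def servers_across(backends: Iterable[dict]) -> Dict[str, Stats]:
    """Aggregate each server over every backend it appears in."""
    grouped: Dict[str, List[Stats]] = {}
    for backend in backends:
        for name, stats in backend.get("server", {}).items():
            grouped.setdefault(name, []).append(stats)
    return {name: merge(samples) for name, samples in grouped.items()}


def merge_proxies(trees: List[dict], section: str) -> dict:
    """Aggregate frontends or backends by name across processes."""
    result = {}
    for name in dict.fromkeys(n for t in trees for n in t.get(section, {})):
        entries = [t[section][name] for t in trees if name in t.get(section, {})]
        # Merge the proxy's own counters, then its servers separately.
        merged = merge([{k: v for k, v in e.items() if k != "server"} for e in entries])
        servers = servers_across(entries)
        if servers:
            merged["server"] = servers
        result[name] = merged
    return result


def build(per_process: Dict[str, dict]) -> dict:
    """Assemble the proposed layout from one stats tree per process."""
    trees = list(per_process.values())
    doc: dict = {"per_process": {}}
    for proc, tree in per_process.items():
        # Per-process tree plus its own cross-backend 'server' view.
        doc["per_process"][proc] = dict(
            tree, server=servers_across(tree.get("backend", {}).values())
        )
    doc["frontend"] = merge_proxies(trees, "frontend")
    doc["backend"] = merge_proxies(trees, "backend")
    doc["haproxy"] = merge([t.get("haproxy", {}) for t in trees])
    doc["server"] = servers_across(
        b for t in trees for b in t.get("backend", {}).values()
    )
    return doc


procs = {
    "proc1": {
        "frontend": {"www.haproxy.org": {"bin": "100", "lbtot": "0"}},
        "backend": {"www.haproxy.org": {"bin": "100", "lbtot": "10", "server": {
            "srv1": {"bin": "100", "lbtot": "10", "status": "UP"}}}},
        "haproxy": {"PipesFree": "5", "Process_num": "1", "Version": "1.6"},
    },
    "proc2": {
        "frontend": {"www.haproxy.org": {"bin": "50", "lbtot": "0"}},
        "backend": {"www.haproxy.org": {"bin": "50", "lbtot": "5", "server": {
            "srv1": {"bin": "50", "lbtot": "5", "status": "DOWN"}}}},
        "haproxy": {"PipesFree": "7", "Process_num": "2", "Version": "1.6"},
    },
}
print(json.dumps(build(procs), indent=4))
```

The per-process trees are kept untouched (including Process_num and status), so nothing is lost for people who want per-process data; only the top-level aggregated views drop the fields that cannot be meaningfully combined.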