On Tue, Jul 26, 2016 at 05:51:08PM +0200, Pavlos Parissis wrote:
> In all my setups I have nbproc > 1, and after a lot of changes to how I
> aggregate HAProxy stats and to what most people want to see on graphs,
> I came up with something like the following:
> 
> {
>     "frontend": {
>         "www.haproxy.org": {
>             "bin": "999999999999",
>             "lbtot": "555555",
>             ...
>         },
>         "www.haproxy.com": {
>             "bin": "999999999999",
>             "lbtot": "555555",
>             ...
>         },
>     },
>     "backend": {
>         "www.haproxy.org": {
>             "bin": "999999999999",
>             "lbtot": "555555",
>             ...
>             "server": {
>                 "srv1": {
>                     "bin": "999999999999",
>                     "lbtot": "555555",
>                     ...
>                 },
>                 ...
>             },
>         },
>     },
>     "haproxy": {
>         "PipesFree": "555",
>         ...
>         "per_process": {
>             "id1": {
>                 "PipesFree": "555",
>                 "Process_num": "1",
>                 ...
>             },
>             "id2": {
>                 "PipesFree": "555",
>                 "Process_num": "2",
>                 ...
>             },
>             ...
>         },
>     },
>     "server": {
>         "srv1": {
>             "bin": "999999999999",
>             "lbtot": "555555",
>             ...
>         },
>         ...
>     },
> }
> 
> 
> Let me explain a bit:
> 
> - It is very useful and handy to know stats for a server per backend but also
> across all backends. Thus, I include a top-level key 'server' which holds
> stats for each server across all backends. A few of the server stats have to
> be excluded as they are meaningless in this context, for example status,
> lastchg, check_duration, check_code and a few others. For those which aren't
> counters but fixed numbers, you want to either sum them (slim) or take the
> average (weight). I don't do the latter in my setup.

You probably have not looked at the output of "show stats typed", it
gives you the nature of each value letting you know how to aggregate
them (min, max, avg, sum, pick any, etc).
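To illustrate the idea, here is a minimal Python sketch of aggregating
per-process values once the aggregation type of each field is known. The
FIELD_STRATEGY mapping and the aggregate() helper are hypothetical names made
up for this example; in practice the type would come from the typed output
rather than from a hand-written table.

```python
from statistics import mean

# The aggregation types mentioned above, keyed by a label chosen for this sketch.
AGGREGATORS = {
    "sum": sum,
    "min": min,
    "max": max,
    "avg": mean,
    "pick_any": lambda values: values[0],
}

# Hypothetical field -> strategy table; a real one would be derived from
# the nature reported for each value, not hard-coded like this.
FIELD_STRATEGY = {
    "bin": "sum",
    "lbtot": "sum",
    "slim": "sum",
    "weight": "avg",
    "version": "pick_any",
}


def aggregate(per_process):
    """Combine a list of {field: value} dicts, one dict per process."""
    result = {}
    for field, strategy in FIELD_STRATEGY.items():
        values = [stats[field] for stats in per_process if field in stats]
        if values:  # skip fields that no process reported
            result[field] = AGGREGATORS[strategy](values)
    return result
```

Fields missing from the table are simply dropped, which is one way to
exclude values that are meaningless once aggregated.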

> - Aggregation across multiple processes for haproxy stats (show info output)

It's not only "show info", this one reports only the process health.

> As you can see I provide stats per process and across all processes.
> It has proven very useful to know the CPU utilization per process. We depend
> on the kernel to distribute incoming connections to all processes, and so far
> it works very well, but sometimes you see a single process consuming a lot of
> CPU, and if you don't provide percentiles or stats per process then you are
> going to miss it. The metrics about uptime, version, description and a few
> others can be excluded from the aggregation.

These last ones are in the "pick any" type of aggregation I was talking about.

> - nbproc > 1 and aggregation for frontend/backend/server
> My proposal doesn't cover stats for frontend/backend/server per haproxy 
> process.

But that's precisely the limitation I'm reporting :-)

> The stats are already aggregated and a few metrics are excluded, for example
> all the status-related ones. Each process performs health checking, so they
> act as little brains which never agree on the status of a server, as they run
> their checks at different intervals.

Absolutely, but at least you want to see their stats. For example, how many
times a server has switched state per process, then in total (meaning a
proportional amount of possibly visible issues).
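As a rough sketch of what "per process then in total" could look like for a
single counter, the Python below keeps each process's value next to the sum.
The with_total() helper and the sample numbers are invented for illustration,
and using the chkdown counter as the state-change metric is an assumption.

```python
def with_total(per_process_counts):
    """Keep each process's counter and add the sum across all processes."""
    return {
        "per_process": dict(per_process_counts),
        "total": sum(per_process_counts.values()),
    }


# Hypothetical chkdown values for one server as seen by three processes.
chkdown = with_total({"1": 4, "2": 0, "3": 5})
```

Keeping both levels means a consumer can graph the total while still spotting
a single process that sees far more state changes than the others.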

My issue is that if the *format* doesn't support per-process stats, we'll have
to emit a new format 3 months later for all the people who want to process it.
We've reworked the stats dump to put an end to the problem where depending on
the output format you used to have different types of information, and there
was no single representation carrying them all at once. For me now it's
essential that if we prepare a new format it's not stripped down from the
info people need, otherwise it will automatically engender yet another format.

Thanks,
Willy
