On 26/07/2016 03:30 PM, Willy Tarreau wrote:
> Hi Pavlos!
> 
> On Tue, Jul 26, 2016 at 03:23:01PM +0200, Pavlos Parissis wrote:
>> Here is a suggestion { "frontend": { "www.haproxy.org": { "bin": 
>> "999999999999", "lbtot":
>> "555555", ... }, "www.haproxy.com": { "bin": "999999999999", "lbtot": 
>> "555555", ... }, }, 
>> "backend": { "www.haproxy.org": { "bin": "999999999999", "lbtot": "555555", 
>> .... "server":
>> { "srv1": { "bin": "999999999999", "lbtot": "555555", .... }, ... }
>> 
>> }, }, "haproxy": { "id1": { "PipesFree": "555", "Process_num": "1", ... }, 
>> "id2": { 
>> "PipesFree": "555", "Process_num": "2", ... }, ... }, }
> 
> Thanks. How does it scale if we later want to aggregate these ones over
> multiple processes and/or nodes? The typed output already emits a process
> number for each field. Also, we do have the information of how data need to
> be parsed and aggregated. I suspect that we want to produce this with the
> JSON output as well so that we don't lose information when dumping in JSON
> mode. I would not be surprised if people find JSON easier to process than
> our current format to aggregate their stats, provided we have all the
> fields :-)
> 
> Cheers, Willy
> 

I am glad you asked about aggregation, as I deliberately left it out of my
first suggestion. In all my setups I have nbproc > 1, and after many changes
to how I aggregate HAProxy stats and to what most people want to see on
graphs, I came up with something like the following:

{
    "frontend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        "www.haproxy.com": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
    },
    "backend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
            "server": {
                "srv1": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                ...
            },
        },
    },
    "haproxy": {
        "PipesFree": "555",
        ...
        "per_process": {
            "id1": {
                "PipesFree": "555",
                "Process_num": "1",
                ...
            },
            "id2": {
                "PipesFree": "555",
                "Process_num": "2",
                ...
            },
            ...
        },
    },
    "server": {
        "srv1": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        ...
    },
}


Let me explain a bit:

- It is very handy to know the stats for a server per backend but also across
all backends. Thus, I include a top-level key 'server' which holds stats for
each server across all backends. A few server stats have to be excluded as
they are meaningless in this context, for example status, lastchg,
check_duration, check_code and a few others. For those which aren't counters
but fixed numbers, you want to either sum them (slim) or take the average
(weight). I don't do the latter in my setup.

- Aggregation across multiple processes for haproxy stats (show info output)
As you can see, I provide stats per process and across all processes. It has
proven very useful to know the CPU utilization per process. We depend on the
kernel to distribute incoming connections to all processes, and so far it
works very well, but sometimes a single process consumes a lot of CPU, and if
you don't provide percentiles or stats per process then you are going to miss
it. The metrics about uptime, version, description and a few others can be
excluded from the aggregation.


- nbproc > 1 and aggregation for frontend/backend/server
My proposal doesn't cover stats for frontend/backend/server per haproxy
process. The stats are already aggregated and a few metrics are excluded, for
example everything related to status. Each process performs its own health
checks, so they act as little brains which never agree on the status of a
server, as they run their checks at different intervals. But if nbproc == 1
then these metrics have to be included.
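
To make the rules above a bit more concrete, here is a rough Python sketch of
the aggregation. It is only an illustration under assumptions, not the code I
run: the function names (aggregate, servers_across_backends), the sample
values and the exact excluded/averaged field sets are made up for the
example, and every field that is not excluded is assumed to be numeric.

```python
from collections import defaultdict


def aggregate(rows, excluded=frozenset(), averaged=frozenset()):
    """Merge several stat dicts into one.

    Fields in `excluded` are dropped, fields in `averaged` are averaged,
    everything else is summed. Values may be strings, as in the JSON sketch.
    Non-numeric fields must be listed in `excluded`, else int() raises.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        for field, value in row.items():
            if field in excluded:
                continue
            totals[field] += int(value)
            counts[field] += 1
    return {
        field: (total / counts[field] if field in averaged else total)
        for field, total in totals.items()
    }


def servers_across_backends(backends, excluded, averaged):
    """Build the top-level 'server' key: one entry per server name,
    aggregated over every backend in which that server appears."""
    rows_by_server = defaultdict(list)
    for backend in backends.values():
        for name, stats in backend.get("server", {}).items():
            rows_by_server[name].append(stats)
    return {
        name: aggregate(rows, excluded, averaged)
        for name, rows in rows_by_server.items()
    }


# Hypothetical sample data, shaped like the 'backend' part of the proposal.
backends = {
    "www.haproxy.org": {
        "server": {"srv1": {"bin": "100", "lbtot": "10",
                            "weight": "10", "status": "UP"}},
    },
    "www.haproxy.com": {
        "server": {"srv1": {"bin": "50", "lbtot": "20",
                            "weight": "30", "status": "DOWN"}},
    },
}

# Fields that are meaningless across backends are excluded; averaging
# 'weight' is one option (summing is the other).
server = servers_across_backends(
    backends,
    excluded={"status", "lastchg", "check_duration", "check_code"},
    averaged={"weight"},
)

# Same idea for 'show info': keep the per-process view next to the totals.
per_process = {
    "id1": {"PipesFree": "555", "Process_num": "1"},
    "id2": {"PipesFree": "555", "Process_num": "2"},
}
haproxy = aggregate(per_process.values(), excluded={"Process_num"})
haproxy["per_process"] = per_process
```

The same aggregate() helper is used for both cases on purpose: the only
difference between the 'server' and the 'haproxy' keys is which fields get
excluded and which, if any, get averaged instead of summed.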


Cheers,
Pavlos




