On Tue, Jul 26, 2016 at 09:06:05PM +0200, Pavlos Parissis wrote:
> > You probably have not looked at the output of "show stats typed", it
> > gives you the nature of each value letting you know how to aggregate
> > them (min, max, avg, sum, pick any, etc).
> > 
> 
> I have seen it but it isn't available on 1.6. It could simplify my code, I
> should give a try.

Ah indeed you're right. Well it's not in 1.6 mainline but we backported
it to hapee-1.6 in case that's relevant to the machines you're interested
in.
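
To illustrate what "aggregate according to the nature of each value" could
look like on the consumer side, here is a minimal Python sketch. The nature
names ("counter", "max", "min", "avg", "name") and the (field, nature, value)
tuple layout are assumptions made for the example; they are not the actual
"show stats typed" output format, which would need to be parsed first.

```python
from collections import defaultdict

# Hypothetical mapping from a value's nature to the way per-process
# samples are merged. The nature names are illustrative only.
AGGREGATORS = {
    "counter": sum,                                 # add up across processes
    "max": max,                                     # keep the highest value
    "min": min,                                     # keep the lowest value
    "avg": lambda vals: sum(vals) / len(vals),      # plain arithmetic mean
    "name": lambda vals: vals[0],                   # identical everywhere: pick any
}


def aggregate(samples):
    """Merge per-process samples into one value per field.

    samples: iterable of (field, nature, value) tuples, one per process.
    Raises KeyError if a nature has no aggregation rule.
    """
    grouped = defaultdict(list)
    natures = {}
    for field, nature, value in samples:
        grouped[field].append(value)
        natures[field] = nature
    return {field: AGGREGATORS[natures[field]](values)
            for field, values in grouped.items()}


# Two processes reporting the same three fields.
samples = [
    ("stot", "counter", 10), ("stot", "counter", 5),
    ("smax", "max", 3), ("smax", "max", 7),
    ("pxname", "name", "www"), ("pxname", "name", "www"),
]
totals = aggregate(samples)
```

The point is that the consumer needs no per-field knowledge: a field added
later is merged correctly as long as it carries a known nature.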

> >> The stats are already aggregated and a few metrics are excluded, for
> >> example all status stuff.
> >> Each process performs healthchecking, so they act as little brains which
> >> never agree on the status of a server as they run their checks on
> >> different intervals.
> > 
> > Absolutely, but at least you want to see their stats. For example how many
> > times a server has switched state per process then in total (meaning a
> > proportional amount of possibly visible issues).
> > 
> 
> True, but in setups with ECMP in front of N HAProxy nodes which run in
> nbproc mode you offload application healthchecking to a dedicated daemon
> which runs on servers (service discovery + service availability with
> consul/zookeeper stuff) and you only run TCP checks from HAProxy.
>
> In our setup we don't really care about how many times a server flapped, it
> doesn't tell us something we don't know already: the application is in a
> broken state.

In such a case I agree.

> But, other people may find it useful.

Anyway that was just an example; what I meant is that we must take care
not to selectively pick some elements and not others. I'd rather have the
output contain 10% of useless stuff, with upcoming fields appearing
automatically and nothing special to do for them, than have to explicitly
add new stuff all the time! When you see the size of the csv dump function
right now, it's a joke, and I really expect the JSON dump to follow the
same philosophy.

> > My issue is that if the *format* doesn't support per-process stats, we'll
> > have to emit a new format 3 months later for all the people who want to
> > process it. We've reworked the stats dump to put an end to the problem
> > where depending on the output format you used to have different types of
> > information, and there was no single representation carrying them all at
> > once. For me now it's essential that if we prepare a new format it's not
> > stripped down from the info people need, otherwise it will automatically
> > engender yet another format.
> > 
> 
> Agree. I am fine giving per process stats for servers/frontends/backends.
> Adding another top level key 'per_process' in my proposal should be a good
> start:
> 
> {
>     "per_process": {
>         "proc1": {
>             "frontend": {
>                 "www.haproxy.org": {
>                     "bin": "999999999999",
>                     "lbtot": "555555",
>                     ...
(...)

Yes, I think so, and that's also more or less similar to what Mark proposed.
Also I'm not much worried by the extra output size: if we dump this through
HTTP we'll have it gzipped.

Also, we want to have the values typed otherwise you're fucked as we used
to be with the CSV dump in the past. The current code supports this and
that's important. I don't know how it may impact the JSON output. Maybe
some parts will be just "numbers", but I remember that certain of them
have some properties (eg: max, limit, age, percentage, PID, SNMP ID, etc).
I'm less worried about the strings, we basically have identifiers,
descriptions and outputs from what I remember. But taking a look at this
will help refine the format.
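
As a starting point for that discussion, here is a minimal Python sketch of
what typed values could look like in the JSON output. Everything about the
layout is an assumption for illustration: the "value"/"type"/"nature" keys
and the "u64"/"counter" labels are hypothetical, and only the "per_process",
"proc1", "frontend", "bin" and "lbtot" names come from the proposal quoted
above. The main difference with that proposal is that numbers are emitted as
JSON numbers rather than strings, next to their type information.

```python
import json


def typed(value, kind, nature):
    """Wrap a raw value with its type and nature (hypothetical key names)."""
    return {"value": value, "type": kind, "nature": nature}


def build_dump(per_process):
    """Serialize per-process stats under a top-level 'per_process' key."""
    return json.dumps({"per_process": per_process}, sort_keys=True)


# One process, one frontend, two counters kept as numbers, not strings.
dump = build_dump({
    "proc1": {
        "frontend": {
            "www.haproxy.org": {
                "bin": typed(999999999999, "u64", "counter"),
                "lbtot": typed(555555, "u64", "counter"),
            }
        }
    }
})

# A consumer gets integers back and can tell how to aggregate them.
decoded = json.loads(dump)
bin_stat = decoded["per_process"]["proc1"]["frontend"]["www.haproxy.org"]["bin"]
```

Whether the type information should sit next to each value like this or be
described once in a separate schema section is left open here.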

I'd like to wait for other people to have the time to participate in this
discussion. I know that some people are very careful about the relevance
and accuracy of the stats, and some may want to make other suggestions.

Cheers,
Willy
