De : William Dauchy
Envoyé : samedi 30 janvier 2021 16:21
> this is a follow up of commit c6464591a365bfcf509b322bdaa4d608c9395d75
> ("MAJOR: contrib/prometheus-exporter: move ftd/bkd/srv states to
> labels"). The main goal being to be better aligned with prometheus use
> cases in terms of queries. More specifically to health checks, Pierre C.
> mentioned the possible quirks he had to put in place in order to make
> use of those metrics through prometheus:
>
> by(proxy, check_status) (count_values by(proxy,
> instance) ("check_status", haproxy_server_check_status))
Indeed, we wanted to aggregate at backend level to produce a view which
represent the "number of servers per health status". Still, this was not ideal
since I'm getting the status code and not the status name.
> I am perfectly aware this introduces a lot more metrics but I don't see
> how we can improve the usability without it. The main issue remains in
> the cardinality of the states which are > 20. Prometheus recommends to
> stay below a cardinality of 10 for a given metric but I consider our
> case very specific, because highly linked to the level of precision
> haproxy exposes.
>
> Even before this patch I saw several large production setup (a few
> hundreds of MB in output) which are making use of the scope parameter to
> simply ignore the server metrics, so that the scrapping can be faster,
> and memory consumed on client side not too high. So I believe we should
> eventually continue in that direction and offer more granularity of
> filtering of the output. That being said it is already possible to
> filter out the data on prometheus client side.
True, I think this is the right approach to represent such data in Prometheus.
It will create some challenges for people like us, who use server-templates
and create thousands of backends, but we'll have to deal with it.
In addition to this update, I would add some recommendations about the user
setup in the README ("how do I prevent my prometheus instance to explode
when scrapping this endpoint?").
For server-template users:
-
params:
no-maint:
- empty
Generally speaking, to prevent all server metrics to be saved, except this one:
-
metric_relabel_configs:
- source_labels: ['__name__']
regex: 'haproxy_(process_|frontend_|backend_|server_check_status).*'
action: keep
A long-term alternative (at least for my use-case) would be to provide this
data at
backend-level, as initially suggested here:
https://www.mail-archive.com/haproxy@formilux.org/msg35369.html
> Signed-off-by: William Dauchy
> ---
> @@ -319,7 +320,7 @@ const struct ist promex_st_metric_desc[ST_F_TOTAL_FIELDS]
> = {
> [ST_F_RATE] = IST("Current number of sessions per second
> over last elapsed second."),
> [ST_F_RATE_LIM] = IST("Configured limit on new sessions per
> second."),
> [ST_F_RATE_MAX] = IST("Maximum observed number of sessions per
> second."),
> - [ST_F_CHECK_STATUS] = IST("Status of last health check
> (HCHK_STATUS_* values)."),
> + [ST_F_CHECK_STATUS] = IST("Status of last health check (0/1
> depending on current `state` label value)."),
"Status of last health check, per state value" ?
--
Pierre