Hi Pierre,

Sorry I missed your email. Thanks to William for the reminder.

On 15/11/2019 at 15:55, Pierre Cheynier wrote:
We've recently tried to switch to the native prometheus exporter, but were 
quickly stopped in our initiative given the output on one of our preprod servers:

$ wc -l metrics.out
1478543 metrics.out
$ ls -lh metrics.out
-rw-r--r-- 1 pierre pierre 130M nov.  15 15:33 metrics.out

This is not only due to a large setup, but essentially related to server lines, 
since we extensively use server-templates for server addition/deletion at 
runtime.

# backend & servers number
$ echo "show stat -1 2 -1" | sudo socat stdio /var/lib/haproxy/stats | wc -l
1309
$ echo "show stat -1 4 -1" | sudo socat stdio /var/lib/haproxy/stats | wc -l
36360
# But a lot of them are actually "waiting to be provisioned" (especially on this preprod environment)
$ echo "show stat -1 4 -1" | sudo socat stdio /var/lib/haproxy/stats | grep MAINT | wc -l
34113

We'll filter out the server metrics as a quick fix, and will hopefully submit 
something to do it natively, but we would also like to get your feedback about 
some use-cases we expected to solve with this native exporter.


I addressed this issue based on an idea from William. I also proposed adding a filter to exclude all servers in maintenance from the export. Let me know if you see a better way to do so. For the moment, from the exporter's point of view, such filtering is not really hard to do.
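
To illustrate the kind of filtering I mean, here is a rough sketch in Python working on the "show stat" CSV output. It is not the exporter code. The function name and the sample data are made up for the example, and the sample is a trimmed subset of the real columns. It assumes that server lines have type 2 and that the status of a server in maintenance starts with "MAINT".

```python
import csv
import io

# Trimmed-down, made-up sample; a real "show stat" dump has many more columns.
SAMPLE = """# pxname,svname,status,type
be_app,srv1,UP,2
be_app,srv2,MAINT,2
be_app,BACKEND,UP,1
"""


def drop_maint_servers(csv_text):
    """Return the CSV rows as dicts, skipping servers in maintenance."""
    # The header line starts with "# "; strip it so DictReader gets clean names.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    kept = []
    for row in reader:
        is_server = row["type"] == "2"  # assumption: type 2 is a server line
        if is_server and row["status"].startswith("MAINT"):
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in drop_maint_servers(SAMPLE):
        print(row["pxname"], row["svname"], row["status"])
```

Frontend and backend lines are always kept here; only server lines are candidates for filtering.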

Ultimately, one of them would add great value for us: being able to 
count check_status types (and their values in the L7STS case) per backend.

So, there are 3 associated points:
* it's great to have new metrics (such as 
`haproxy_process_current_zlib_memory`), but we also noticed that some very 
useful ones were not present due to their type, for example:
[ST_F_CHECK_STATUS]   = IST("untyped"),
What could be done to be able to retrieve them? (I thought about something 
similar to `HRSP_[1-5]XX`, where the different check status could be defined 
and counted).


Hmm, I can add the check status. Mapping all statuses to integers is possible. However, having one metric per status is probably not the right solution, because a status is not a counter but just a state (a boolean). If we did so, all statuses would be set to 0 except the current one, which is not really handy. But a mapping is possible. We already do this for the frontend/backend/server status (ST_F_STATUS).
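
A minimal sketch of such a mapping, in Python for readability. The integer values are arbitrary and chosen only for the example, the metric name and the helper are hypothetical, and the "proxy" and "server" labels are assumed. Nothing here is what the exporter does today.

```python
# Check status names as they appear in the stats output. The integers are
# simply the position in this list, an arbitrary choice for illustration.
CHECK_STATUSES = [
    "UNK", "INI", "SOCKERR",
    "L4OK", "L4TOUT", "L4CON",
    "L6OK", "L6TOUT", "L6RSP",
    "L7OK", "L7OKC", "L7TOUT", "L7RSP", "L7STS",
]
CHECK_STATUS_VALUE = {name: i for i, name in enumerate(CHECK_STATUSES)}


def check_status_sample(proxy, server, status):
    """Format one gauge sample; unknown statuses fall back to UNK."""
    value = CHECK_STATUS_VALUE.get(status, CHECK_STATUS_VALUE["UNK"])
    # Hypothetical metric name, with assumed "proxy" and "server" labels.
    return 'haproxy_server_check_status{proxy="%s",server="%s"} %d' % (
        proxy, server, value)


if __name__ == "__main__":
    print(check_status_sample("be_app", "srv1", "L7OK"))
```

With a single gauge like this, each server exports one sample whose value identifies the current status, instead of one 0/1 metric per possible status.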

* also for `check_status`, there is the case of L7STS and its associated values that are present in another field. Most probably it could benefit from a better representation in a prometheus output (thanks to labels)?

We can also export the ST_F_CHECK_CODE metric. As for using labels, I have no idea. For now, the labels are static in the exporter, and I don't know if it is relevant to add dynamic info in labels. If so, what is your idea? Add a "code" label to the check_status metric?

* what about getting some backend-level aggregation of server metrics, such as 
the one that was previously mentioned, to avoid retrieving all the server 
metrics but still be able to get some insights?
I'm thinking about an aggregation of some fields at backend level, which was 
not previously done with the CSV output.


It is feasible, but only counters can be aggregated. It could be enabled using a parameter in the query string. However, it is probably relevant only when the server metrics are filtered out, because otherwise Prometheus can handle the aggregation itself.
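
To make the idea concrete, here is a rough Python sketch of a per-backend aggregation computed from the "show stat" CSV output. It only illustrates the principle and is not a proposal for the exporter implementation. The function name and sample data are made up, the sample is a trimmed subset of the real columns, and I assume chkfail and chkdown are cumulative counters on server lines (type 2).

```python
import csv
import io
from collections import defaultdict

# Assumption: these CSV fields are cumulative counters on server lines.
COUNTER_FIELDS = ("chkfail", "chkdown")

# Trimmed-down, made-up sample; a real dump has many more columns.
SAMPLE = """# pxname,svname,chkfail,chkdown,type
be_app,srv1,10,2,2
be_app,srv2,5,1,2
be_app,BACKEND,,3,1
be_api,srv1,1,0,2
"""


def aggregate_server_counters(csv_text, fields=COUNTER_FIELDS):
    """Sum the given counters over all servers of each backend."""
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for row in reader:
        if row["type"] != "2":  # only server lines are aggregated
            continue
        for field in fields:
            # Empty cells are treated as 0.
            totals[row["pxname"]][field] += int(row[field] or 0)
    return dict(totals)


if __name__ == "__main__":
    for backend, counters in aggregate_server_counters(SAMPLE).items():
        print(backend, counters)
```

Gauges and states (the check status, for instance) are left out on purpose, since summing them across servers has no meaning.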

--
Christopher Faulet
