[ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936160#comment-16936160
 ] 

Guillem commented on IMPALA-8946:
---------------------------------

{quote}It sounds like the gerrit limitations are pretty hard to get around, 
unfortunately - it's a limitation of the github/gerrit plugin.
{quote}
Ok. I'll consider creating a dummy GitHub account for further collaborations.
{quote}It looks like prometheus also wants a _sum member for summary metrics? 
I'm not sure how gracefully it handles it being missing. Obviously your change 
makes this much closer to the spec. It seems like we could extend 
HistogramMetric/HdrHistogram to track the sum as well.
{quote}
Yes, that's true: I forgot to mention it in the commit message. I've seen that 
`HdrHistogram` already keeps track of `TotalSum`, so it should be as easy as 
forwarding this value. I'll upload the patch in the coming days.
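
As a rough illustration of the target output (not the actual Impala C++ code), the sketch below renders a histogram with `_bucket`, `_sum` and `_count` samples in Python. The function name `render_histogram`, its parameters, and the sum value used in the example are assumptions made up for this sketch; only the metric name, help text, bucket bounds and count come from the issue.

```python
import math


def render_histogram(name, help_text, buckets, total_sum):
    """Render one histogram in the Prometheus text exposition format.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by bound;
    the final pair must use math.inf as its bound (the le="+Inf" bucket).
    total_sum: sum of all observed values (hypothetical input here).
    """
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must have an upper bound of +Inf")
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for bound, count in buckets:
        le = "+Inf" if bound == math.inf else repr(float(bound))
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {total_sum}")
    # By convention, _count equals the value of the +Inf bucket.
    lines.append(f"{name}_count {buckets[-1][1]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_histogram(
        "impala_thrift_server_backend_svc_thread_wait_time",
        "Amount of time clients of Impala Backend Server spent waiting "
        "for service threads",
        [(0.2, 0), (0.5, 0), (0.7, 0), (0.9, 0), (0.95, 0), (0.999, 0),
         (math.inf, 49)],
        total_sum=123.5,  # made-up value; the issue does not report a sum
    ))
```

Deriving `_count` from the `+Inf` bucket keeps the two values identical by construction, which is what the exposition format requires.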

Thanks!

> Prometheus histograms do not follow conventions
> -----------------------------------------------
>
>                 Key: IMPALA-8946
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8946
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Guillem
>            Assignee: Guillem
>            Priority: Minor
>              Labels: observability
>         Attachments: 
> 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch
>
>
> We've been using Prometheus metrics and we've found that some standard 
> Prometheus parsers cannot properly interpret histograms from Impala.
> For example, the official Python client 
> ([https://github.com/prometheus/client_python]) cannot properly read them. 
> I've been digging into why it can't read them and I've found that 
> Impala does not adhere to the text-format histogram conventions.
> The following link describes the conventions for rendering histograms in 
> the Prometheus text format: 
> [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries]
> This is an example of a histogram rendered by Impala 3.3 on its Prometheus 
> endpoint:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_max 0
> impala_thrift_server_backend_svc_thread_wait_time_min 0
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> The linked histogram conventions say that
> {quote}Each bucket count of a histogram named x is given as a separate sample 
> line with the name x_bucket and a label \{le="y"} (where y is the upper bound 
> of the bucket).
> {quote}
> And also
> {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be 
> identical to the value of x_count.
> {quote}
> The previous example should be formatted as:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> I've found that with this format, the official Python client is able to 
> properly read the histograms.
> Note also that the metrics suffixed with `_min` and `_max` fall outside the 
> convention and break histogram parsing, so they may need to be 
> reported as separate metrics (maybe as gauges?).
> If you are fine with making these changes, I already have a patch to improve 
> the histogram formatting and I can submit it for review.
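
The conventions quoted in the description can be checked mechanically. The following is a minimal sketch using only the Python standard library; it is not the official client's parser. The helper name `check_histogram` and the regular expression are assumptions for illustration, and the check only understands samples that carry at most a single `le` label.

```python
import re

# Matches lines such as: name_bucket{le="0.5"} 3   or   name_count 49
SAMPLE_RE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{le="([^"]+)"\})?\s+(\S+)$'
)


def check_histogram(text, name):
    """Return a list of convention violations for histogram `name`."""
    problems = []
    buckets = {}
    count = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            problems.append(f"unparseable line: {line}")
            continue
        sample, le, value = match.groups()
        if sample == f"{name}_bucket" and le is not None:
            buckets[le] = float(value)
        elif sample == f"{name}_count":
            count = float(value)
        elif sample == f"{name}_sum":
            pass  # allowed by the conventions; not validated here
        elif sample.startswith(name):
            # e.g. bare name{le=...}, name_min, name_max
            problems.append(f"unexpected sample: {sample}")
    if count is None:
        problems.append("missing _count sample")
    if "+Inf" not in buckets:
        problems.append('missing bucket with le="+Inf"')
    elif count is not None and buckets["+Inf"] != count:
        problems.append('le="+Inf" bucket does not equal _count')
    return problems
```

Run against the Impala 3.3 output shown in the description, this check would flag the bare `{le=...}` samples, the `_min` and `_max` samples, and the missing `+Inf` bucket; the proposed format yields an empty list.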


