[prometheus-users] Re: Maximum and Minimum Request Duration on Prometheus Classic Histograms

'Brian Candler' via Prometheus Users Mon, 23 Jun 2025 05:21:10 -0700

Also relevant:
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33645
https://groups.google.com/g/prometheus-developers/c/dGEaTR7Hyi0


On Monday, 23 June 2025 at 13:17:57 UTC+1 Brian Candler wrote:

> Nice explanation of summaries here:
> https://grafana.com/blog/2022/03/01/how-summary-metrics-work-in-prometheus/
>
> On Monday, 23 June 2025 at 12:42:35 UTC+1 Brian Candler wrote:
>
>> Remember that histograms don't store values. All they do is increment a 
>> counter by 1; the value is only used to select which bucket to increment.  
>> This means that the amount of storage used by a histogram is very small - a 
>> fixed number of buckets with one counter each. It doesn't matter if you are 
>> processing 1 sample per second or 10,000 samples per second.
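
The point about fixed storage can be illustrated with a minimal sketch. The class name `ClassicHistogram` and the bucket boundaries below are hypothetical, chosen only for illustration; this is not the code of any Prometheus client library, just the counting idea described above.

```python
import bisect
import math


class ClassicHistogram:
    """Toy classic histogram: one counter per bucket, no stored observations."""

    def __init__(self, upper_bounds):
        # Finite bucket boundaries plus the implicit +Inf bucket.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.counts = [0] * len(self.upper_bounds)

    def observe(self, value):
        # The value only selects the first bucket whose bound is >= value;
        # the value itself is then discarded.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.counts[index] += 1

    def cumulative(self):
        # Classic histograms are exposed as cumulative "le" counters.
        running, out = 0, []
        for count in self.counts:
            running += count
            out.append(running)
        return out


hist = ClassicHistogram([0.1, 0.5, 1.0])
for duration in (0.05, 0.3, 0.3, 2.0):
    hist.observe(duration)
print(hist.cumulative())
```

However many observations are fed in, the object holds only four counters, so the original values 0.05 and 2.0 cannot be recovered from it afterwards.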
>>
>> If you wanted to retrieve the *exact* lowest or highest value, over *any* 
>> arbitrary time period that you query, you would have to store every single 
>> value into a database. Prometheus is not an event logging system, and it 
>> will never work this way. A columnar datastore like ClickHouse can do that 
>> quite well, but if the number of samples is large, you will still have a 
>> very large storage issue.
>>
>> More realistically, you could find the minimum or maximum value seen over 
>> a fixed time period (say one minute), and at the end of that minute, export 
>> the min/max value seen. That's cheap and quick. Indeed, you could do it 
>> over a relatively short time period (e.g. 1 second), and use prometheus' 
>> min/max_over_time functions if you want to query a longer period, i.e. to 
>> find the min of the mins, or the max of the maxes.  You need to make sure 
>> that every distinct min/max value ends up in the database though; either 
>> use remote_write to push them, or scrape your exporter at least twice as 
>> fast as the min/max values are changing.
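
A minimal sketch of the per-window approach, assuming the application can call a hook on every request and a flush at the end of each window. The class `WindowMinMax` and the sample durations are hypothetical; in a real exporter the flushed pair would be published as two gauges, and the final reduction would be done in PromQL with min_over_time and max_over_time rather than in Python.

```python
class WindowMinMax:
    """Tracks the min and max seen in the current window only."""

    def __init__(self):
        self._min = None
        self._max = None

    def observe(self, value):
        # Constant memory: only two numbers are kept per window.
        self._min = value if self._min is None else min(self._min, value)
        self._max = value if self._max is None else max(self._max, value)

    def flush(self):
        # Return the window's (min, max) and start a fresh window.
        result = (self._min, self._max)
        self._min = self._max = None
        return result


tracker = WindowMinMax()
exported = []
for window in ([0.21, 0.05, 0.40], [0.33, 0.12], [1.70, 0.09, 0.25]):
    for duration in window:
        tracker.observe(duration)
    exported.append(tracker.flush())

# "Min of the mins" and "max of the maxes" over the longer period.
overall_min = min(lo for lo, _ in exported)
overall_max = max(hi for _, hi in exported)
print(exported, overall_min, overall_max)
```

The result is exact only if every flushed pair reaches the database, which is why the scrape interval or remote_write setup matters.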
>>
>> In my experience, people are often not so interested in the single 
>> minimum or maximum value, but in the quantiles, such as the 1st percentile 
>> ("the fastest 1% of queries were answered in less than X seconds") or the 
>> 99th percentile ("the slowest 1% of queries were answered in more than Y 
>> seconds"). Prometheus can help you using a data type called a "summary":
>> https://prometheus.io/docs/concepts/metric_types/#summary
>> https://prometheus.io/docs/practices/histograms/#quantiles
>>
>> A summary can give you very good estimates of the percentiles over a 
>> sliding time window (of a size you have to choose in advance), and uses a 
>> relatively small amount of storage like a histogram. It is better than a 
>> histogram in the case where you don't know in advance what the highest and 
>> lowest values are likely to be (i.e. you don't need to pre-allocate your 
>> bucket boundaries correctly).
>>
>> On Monday, 23 June 2025 at 08:15:42 UTC+1 tejaswini vadlamudi wrote:
>>
>>> Thanks Brian, for the clear heads-up and explanation!
>>>
>>> It looks to me that there is no possibility to secure exact maximum and 
>>> exact minimum values for durations (based on Prometheus histograms) :-(
>>>
>>> However, for exploratory data analysis of the application software, we 
>>> need summary statistics such as minimum and maximum values. Legacy 
>>> monitoring systems have always supported this, so the new technology is 
>>> expected to cover the same use case for backward compatibility. 
>>>
>>> Please share what can be done in this regard to secure this info.
>>>
>>> I'm thinking out loud, please correct/add wherever possible:
>>>
>>> 1. Does changing from Prometheus to OTEL instrumentation provide this 
>>> feature (exact max and min duration time)?
>>> 2. Can metrics derived from distributed traces (instrumented with 
>>> OTEL/Jaeger) be used to obtain minimum and maximum request durations?
>>> 3. Is it possible to secure the max and min duration time with 
>>> Prometheus with any hack?
>>>       a. For Classic Histograms?
>>>       b. For Native Histograms?
>>> 4. A new PR/contribution on Prometheus to offer this support?
>>>
>>> Thanks,
>>> Teja
>>>
>>> On Thursday, June 19, 2025 at 6:38:59 PM UTC+2 Brian Candler wrote:
>>>
>>>> In general, I don't think you can get an accurate answer to that 
>>>> question from a histogram.
>>>>
>>>> You can work out which *bucket* the lowest and highest request 
>>>> durations sat in, which means you could give the lower and upper bounds of 
>>>> the minimum, and the lower and upper bounds of the maximum. Just compare 
>>>> the bucket counters at the start and end of the time range, and find the 
>>>> lowest boundary (le) whose counter has changed, and the lowest boundary 
>>>> whose counter has changed by the full total (the buckets are cumulative, 
>>>> so every boundary above the maximum changes as well). But this still 
>>>> doesn't tell you what the *actual* value was.  
>>>>
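
The bucket comparison can be sketched as follows. The function name `min_max_bounds` is hypothetical. The `END` counters are the go_gc_pauses_seconds_bucket values quoted later in this message; the `START` counters are an assumption, reconstructed by subtracting the 10-minute differences shown there. The sketch also assumes raw counters with no reset in the range, and non-negative observations for the lower edge of the first bucket.

```python
import math


def min_max_bounds(start, end):
    """Return ((min_lower, min_upper), (max_lower, max_upper)) or None.

    start, end: dicts mapping bucket upper bound (le) -> cumulative count.
    """
    bounds = sorted(end)
    increase = [end[b] - start.get(b, 0) for b in bounds]
    total = increase[-1]  # increase of the +Inf bucket
    if total == 0:
        return None  # no observations in the range

    # Lowest bucket that gained anything contains the minimum.
    lo_idx = next(i for i, inc in enumerate(increase) if inc > 0)
    # Lowest bucket that gained everything contains the maximum.
    hi_idx = next(i for i, inc in enumerate(increase) if inc == total)

    def interval(i):
        lower = bounds[i - 1] if i > 0 else 0.0  # assumes values >= 0
        return (lower, bounds[i])

    return interval(lo_idx), interval(hi_idx)


LES = [6.399999999999999e-08, 6.399999999999999e-07, 7.167999999999999e-06,
       8.191999999999999e-05, 0.0009175039999999999, 0.010485759999999998,
       0.11744051199999998, math.inf]
END = dict(zip(LES, [0, 0, 12193, 15369, 27038, 27085, 27086, 27086]))
START = dict(zip(LES, [0, 0, 12188, 15364, 27028, 27075, 27076, 27076]))

print(min_max_bounds(START, END))
```

Each returned pair is an interval (lower, upper] that contains the extreme value; nothing narrower can be derived from the counters alone.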
>>>> I don't think there's any point in trying to make an estimate of the 
>>>> actual value; these values are, by definition, outliers, so even if your 
>>>> data points fitted a nice distribution, these ones would be at the ends of 
>>>> the curve and subject to high error.
>>>>
>>>> Your LLM answer is essentially what it says in the documentation 
>>>> <https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile>
>>>>  
>>>> for histogram_quantile:
>>>>
>>>> *You can use histogram_quantile(0, v instant-vector) to get the 
>>>> estimated minimum value stored in a histogram.*
>>>>
>>>> *You can use histogram_quantile(1, v instant-vector) to get the 
>>>> estimated maximum value stored in a histogram.*
>>>>
>>>> I thought it was worth testing. Here is a metric from my home 
>>>> prometheus server, running 2.53.4:
>>>>
>>>> *go_gc_pauses_seconds_bucket*
>>>> =>
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="6.399999999999999e-08"} 0
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="6.399999999999999e-07"} 0
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="7.167999999999999e-06"} 12193
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="8.191999999999999e-05"} 15369
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="0.0009175039999999999"} 27038
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="0.010485759999999998"} 27085
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="0.11744051199999998"} 27086
>>>> go_gc_pauses_seconds_bucket{instance="localhost:9090", 
>>>> job="prometheus", le="+Inf"} 27086
>>>>
>>>> *go_gc_pauses_seconds_bucket - go_gc_pauses_seconds_bucket offset 10m*
>>>> =>
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="6.399999999999999e-08"} 0
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="6.399999999999999e-07"} 0
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="7.167999999999999e-06"} 5
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="8.191999999999999e-05"} 5
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.0009175039999999999"} 10
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.010485759999999998"} 10
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.11744051199999998"} 10
>>>> {instance="localhost:9090", job="prometheus", le="+Inf"} 10
>>>>
>>>> *rate(go_gc_pauses_seconds_bucket[10m])*
>>>> =>
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="6.399999999999999e-08"} 0
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="6.399999999999999e-07"} 0
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="7.167999999999999e-06"} 0.007407407407407408
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="8.191999999999999e-05"} 0.007407407407407408
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.0009175039999999999"} 0.014814814814814815
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.010485759999999998"} 0.014814814814814815
>>>> {instance="localhost:9090", job="prometheus", 
>>>> le="0.11744051199999998"} 0.014814814814814815
>>>> {instance="localhost:9090", job="prometheus", le="+Inf"} 0.014814814814814815
>>>>
>>>> Those exponential bucket boundaries in scientific notation aren't very 
>>>> readable, but you can see that:
>>>> * the lowest response time must have been somewhere 
>>>> between 6.399999999999999e-07 and 7.167999999999999e-06
>>>> * the highest response time must have been somewhere between 
>>>> 8.191999999999999e-05 and 0.0009175039999999999
>>>>  
>>>> Here are the answers from the formula the LLM suggested:
>>>>
>>>>
>>>> *histogram_quantile(0, rate(go_gc_pauses_seconds_bucket[10m]))*
>>>> =>
>>>> {instance="localhost:9090", job="prometheus"} *NaN*
>>>>
>>>> *histogram_quantile(1, rate(go_gc_pauses_seconds_bucket[10m]))*
>>>> =>
>>>> {instance="localhost:9090", job="prometheus"} *0.0009175039999999999*
>>>>
>>>> The lower boundary of "NaN" is not useful at all (possibly this is a 
>>>> bug?), but I found I could get a value by specifying a very low, but 
>>>> non-zero, quantile:
>>>>
>>>>
>>>> *histogram_quantile(0.000000001, rate(go_gc_pauses_seconds_bucket[10m]))*
>>>> =>
>>>> {instance="localhost:9090", job="prometheus"} *6.40000013056e-07*
>>>>
>>>> Those values *do* sit between the boundaries given:
>>>>
>>>> >>> 6.399999999999999e-07 < 6.40000013056e-07 <= 7.167999999999999e-06
>>>> True
>>>> >>> 8.191999999999999e-05 < 0.0009175039999999999 <= 0.0009175039999999999
>>>> True
>>>>
>>>> In fact, the "minimum" answer is very close to the lower edge of the 
>>>> relevant bucket, and the "maximum" is the upper edge of the relevant 
>>>> bucket.
>>>>
>>>> Therefore, these are not the *actual* minimum and maximum request 
>>>> times. In effect, they are saying "the minimum request time was *more 
>>>> than* 6.399999999999999e-07, and the maximum request time was *no more 
>>>> than* 0.0009175039999999999".  But that's as good as you can get with 
>>>> a histogram.
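
Why the answers land on or next to bucket edges can be seen from a simplified sketch of the linear interpolation that the histogram_quantile documentation describes. The function below is an illustrative Python model, not the Prometheus source; it skips the documented special cases (for example a first bucket with a non-positive upper bound) and assumes the lowest bucket starts at 0. The input is the rate() output quoted above.

```python
import math


def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count), sorted, ending at +Inf."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count - prev
    if in_bucket == 0:
        return math.nan  # interpolation would be 0/0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev) / in_bucket


BUCKETS = [
    (6.399999999999999e-08, 0.0),
    (6.399999999999999e-07, 0.0),
    (7.167999999999999e-06, 0.007407407407407408),
    (8.191999999999999e-05, 0.007407407407407408),
    (0.0009175039999999999, 0.014814814814814815),
    (0.010485759999999998, 0.014814814814814815),
    (0.11744051199999998, 0.014814814814814815),
    (math.inf, 0.014814814814814815),
]

for q in (0, 0.000000001, 1):
    print(q, histogram_quantile(q, BUCKETS))
```

In this model a tiny non-zero quantile interpolates a tiny distance above the lower edge of the first non-empty bucket, and quantile 1 interpolates all the way to the upper edge of the last non-empty bucket. Quantile 0 selects the empty first bucket, where the interpolation divides zero by zero; that is one plausible reading of the NaN above, offered as an assumption rather than a confirmed explanation.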
>>>>
>>>> On Wednesday, 18 June 2025 at 18:17:15 UTC+1 tejaswini vadlamudi wrote:
>>>>
>>>>> Including answer from Gen-AI:
>>>>>
>>>>> | Description | PromQL Query | Notes |
>>>>> |---|---|---|
>>>>> | Minimum request duration (1m) | histogram_quantile(0, sum by (le) (rate(http_request_duration_seconds_bucket[1m]))) | Fast but may be noisy or return NaN if low traffic. Good for near-real-time. |
>>>>> | Maximum request duration (1m) | histogram_quantile(1, sum by (le) (rate(http_request_duration_seconds_bucket[1m]))) | Same as above, for longest duration estimate. |
>>>>> | Minimum request duration (5m) | histogram_quantile(0, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) | More stable, smoother estimate over a slightly longer window. |
>>>>> | Maximum request duration (5m) | histogram_quantile(1, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) | Recommended when traffic is bursty or histogram series are sparse. |
>>>>>
>>>>> Please confirm whether the above answer is reliable.
>>>>>
>>>>> On Wednesday, June 18, 2025 at 3:23:54 PM UTC+2 tejaswini vadlamudi 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m using Prometheus to monitor request durations via a histogram 
>>>>>> metric, e.g., http_request_duration_seconds_bucket. I would like to 
>>>>>> query:
>>>>>>
>>>>>>    - The minimum time taken by a request
>>>>>>    - The maximum time taken by a request
>>>>>>
>>>>>> …over a given time range (say, the last 1h or 24h).
>>>>>>
>>>>>> I understand that histogram buckets give cumulative counts of 
>>>>>> requests below certain durations, but I’m not sure how to extract the 
>>>>>> actual min or max values of request durations during a time window.
>>>>>>
>>>>>> Is this possible directly via PromQL? Or is there a recommended 
>>>>>> workaround (e.g., recording rules, external processing, or using 
>>>>>> histogram_quantile() in a specific way)?
>>>>>>
>>>>>> Thanks in advance for any guidance!
>>>>>>
>>>>>> Br,
>>>>>> Teja
>>>>>>
>>>>>
