On Wed, Jun 22, 2022 at 2:26 AM Jeremy Collette <jeremy.collett...@gmail.com>
wrote:

> Hi Stuart and sup (email appears to be truncated),
>
> Thanks for your responses.
>
> >  *The use of timestamps in metrics is not something that should be used
> except in some very specific cases.*
>
> Stuart, after reading through your reply I did some research and talked
> with my team to understand why we are emitting metrics with timestamps. It
> seems that we are using gauges incorrectly in an effort to simplify our
> exporter. For example, we are using gauges for HTTP request duration, where
> each sample value is a request duration in milliseconds. My colleagues were
> under the assumption that if we had multiple requests to the same endpoint
> during the scraping interval, we would need to expose two separate gauge
> samples or we would have data loss. For example, if we only kept the last
> latency value, we could miss an intermediate value from the same scraping
> interval with a higher latency, one that might have triggered an alerting
> rule. This is why we are exporting
> metrics with timestamps: to expose multiple gauge samples with the same
> labels in the same scrape response.
>
> After reading some more Prometheus documentation, this appears to be poor
> practice. Instead, I now understand that we should be using a histogram or
> summary in this scenario.
>

Definitely use a Histogram. Summaries are, IMO, legacy: they cannot be
aggregated across multiple instances, whereas histograms can. It's also best
practice to record all event durations in seconds. Don't worry, Prometheus
stores samples as float64, so you won't lose any meaningful precision. You
can convert back to nanoseconds, microseconds, milliseconds, or even years
later in Grafana.
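
For illustration, here's a minimal sketch using the Go client
(client_golang); the metric name, label, and buckets are invented for the
example rather than taken from your exporter:

    package main

    import (
        "net/http"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var requestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds.",
            Buckets: prometheus.DefBuckets, // 5ms .. 10s
        },
        []string{"path"},
    )

    func handler(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        defer func() {
            // Observe in seconds; every request lands in a bucket, so a
            // latency spike between scrapes still shows up, no timestamps needed.
            requestDuration.WithLabelValues(r.URL.Path).Observe(time.Since(start).Seconds())
        }()
        w.Write([]byte("ok"))
    }

    func main() {
        http.HandleFunc("/", handler)
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":8080", nil)
    }

Every scrape then returns the bucket counts accumulated so far, and you can
query latency percentiles with histogram_quantile() in PromQL.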


>
> > *You also mention something about your exporter caching things until
> they are scraped, which also sounds like something that is not advisable.*
>
> Our metric samples are based on events that are emitted by partner team
> components. To collect these samples, we built an exporter that listens for
> partner team events and caches them. Upon being scraped, these events are
> turned into metrics that are ingested into Prometheus. We purged metrics
> after they were scraped because each one was emitted with a distinct
> timestamp. However, if we implement support for histograms and summaries
> as I mentioned above, we can drop the timestamps and continuously expose
> all metrics, taking the last known sample as the value.
>
> > *For this use case, it's likely what they want is Prometheus in agent
> mode, which uses remote write, which can buffer and catch up.*
>
> sup, we require local Alerting / Querying to be available in our
> Prometheus instance. I believe "Agent" mode does not support this.
>

Yes, that's correct. Agent mode cannot run queries or rules. It's meant to
be as lightweight as possible, acting purely as a forwarder.

My second guess was going to be that you had events that should be
aggregated into a histogram, along the lines of the sketch below. :-)
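
A rough sketch of what that could look like with the Go client; the
PartnerEvent type, channel, and metric names are invented for the example.
Each incoming event is observed into a histogram as it arrives, so the
/metrics endpoint just serves the aggregated state and nothing needs to be
cached per event:

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // PartnerEvent is a hypothetical stand-in for whatever the partner
    // team components actually emit.
    type PartnerEvent struct {
        Endpoint        string
        DurationSeconds float64
    }

    var eventDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "partner_request_duration_seconds",
            Help:    "Duration of partner requests, taken from events.",
            Buckets: prometheus.DefBuckets,
        },
        []string{"endpoint"},
    )

    func consume(events <-chan PartnerEvent) {
        // The histogram keeps the aggregate, so no per-event caching and
        // no timestamps are needed at scrape time.
        for ev := range events {
            eventDuration.WithLabelValues(ev.Endpoint).Observe(ev.DurationSeconds)
        }
    }

    func main() {
        events := make(chan PartnerEvent)
        go consume(events)
        // Scrapes return whatever has been aggregated so far.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":8080", nil)
    }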


>
>
> Cheers,
>
> Jeremy
>
> On Saturday, June 18, 2022 at 4:29:42 AM UTC-7 sup...@gmail.com wrote:
>
>> For this use case, it's likely what they want is Prometheus in agent
>> mode, which uses remote write, which can buffer and catch up.
>>
>> On Sat, Jun 18, 2022, 1:13 PM Stuart Clark <stuart...@jahingo.com> wrote:
>>
>>> On 14/06/2022 18:32, Jeremy Collette wrote:
>>>
>>> Hello,
>>>
>>> We have written a custom exporter that exposes metrics with explicit
>>> timestamps, which Prometheus periodically scrapes. In the case where
>>> Prometheus becomes temporarily unavailable, these metric samples will be
>>> cached in the exporter until they are scraped, causing affected metrics to
>>> age.
>>>
>>> I understand that if a metric is older than a certain threshold, it will
>>> be rejected by Prometheus with the message:  "Error on ingesting samples
>>> that are too old or are too far into the future".
>>>
>>> I'm trying to understand if there are any guarantees surrounding the
>>> ingestion of historical metrics. Is there some metric sample age that is
>>> guaranteed to be recent enough to be ingested? For example, are samples
>>> with timestamps within the last hour always going to be considered recent?
>>> Within the last five minutes?
>>>
>>> According to this previous thread: Error on ingesting samples that are
>>> too old
>>> <https://groups.google.com/g/prometheus-users/c/rKJYm6naEow/m/zylud_J4AAAJ>,
>>> MR seems to indicate that metrics as old as 1 second can be dropped due to
>>> being too old. Is this interpretation correct? If so, is there any way to
>>> ensure metrics with timestamps won't be dropped for being too old?
>>>
>>> Adding timestamps to metrics is not something that should be done
>>> except in some very specific cases. The main use case for adding a
>>> timestamp is when you are scraping metrics into Prometheus that have been
>>> sourced from another existing metrics system (for example, the
>>> CloudWatch Exporter). You also mention something about your exporter
>>> caching things until they are scraped, which also sounds like something
>>> that is not advisable. The action of the exporter shouldn't really be
>>> changing depending on the requests being received (or not received).
>>>
>>> An exporter is expected to return the various metrics that reflect
>>> "now", in the same way that a directly instrumented application would be
>>> expected to return the current state of the metrics being maintained in
>>> memory. For a simple exporter the normal mechanism is for a request to be
>>> received which then triggers some mechanism to generate the metrics. For
>>> example, with something like the MySQL Exporter, a request would trigger
>>> a query on the connected database, and the resulting information would be
>>> converted into Prometheus metrics and returned. In some situations the
>>> process to fetch information from the underlying system can be quite
>>> resource intensive or slow. In that case a common design is to decouple the
>>> information fetching process from the request handling process. One example
>>> is to perform the information fetching process on a periodic timer, with
>>> the information fetched then stored in memory. The request process then
>>> reads and returns that information - returning the same values for every
>>> request until the next cycle of the information fetching process. In none
>>> of these standard scenarios would you expect timestamps to be attached to
>>> the returned metrics.
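>>>
>>> As a rough sketch of that periodic pattern in Go (the metric name and
>>> the fetch function here are invented for the example): a background loop
>>> refreshes a cached gauge on a timer, and every scrape simply returns
>>> whatever is currently in memory, with no timestamps attached.
>>>
>>>     package main
>>>
>>>     import (
>>>         "net/http"
>>>         "time"
>>>
>>>         "github.com/prometheus/client_golang/prometheus"
>>>         "github.com/prometheus/client_golang/prometheus/promauto"
>>>         "github.com/prometheus/client_golang/prometheus/promhttp"
>>>     )
>>>
>>>     var queueDepth = promauto.NewGauge(prometheus.GaugeOpts{
>>>         Name: "myapp_queue_depth",
>>>         Help: "Queue depth as of the last fetch cycle.",
>>>     })
>>>
>>>     // fetchQueueDepth stands in for a slow or expensive query against
>>>     // the underlying system.
>>>     func fetchQueueDepth() float64 {
>>>         return 42
>>>     }
>>>
>>>     func main() {
>>>         go func() {
>>>             for range time.Tick(30 * time.Second) {
>>>                 queueDepth.Set(fetchQueueDepth())
>>>             }
>>>         }()
>>>         http.Handle("/metrics", promhttp.Handler())
>>>         http.ListenAndServe(":8080", nil)
>>>     }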
>>>
>>> It would be good to hear a bit more about what you are trying to do, as
>>> it is highly likely that the use of timestamps in your use case is probably
>>> not the right option and they should just be dropped.
>>>
>>> --
>>> Stuart Clark
>>>
