Re: [prometheus-users] Best option for short-lived jobs instead pushgateway?

Brian Candler Sun, 13 Mar 2022 01:28:41 -0800

If what you're interested in is the total number of download jobs and the 
total number of downloaded bytes - and not which particular job downloaded 
how many bytes - then you could use statsd_exporter.  It's like 
pushgateway, but it can add values to a counter, rather than just replacing 
values.  Then prometheus can scrape the statsd counter.  This works in many 
more scenarios, including when multiple download jobs occur between a pair 
of scrapes.
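
As a rough sketch of the crawler side: each job could send counter 
increments over the StatsD line protocol (`name:value|c`) just before it 
exits.  The metric names and helper functions below are made up for 
illustration, and the sketch assumes statsd_exporter is listening on its 
default UDP port 9125 on the same host with no custom mapping config; 
adjust to your setup.

```python
import socket


def format_counter(name: str, value: int) -> str:
    """Build one StatsD counter line, e.g. 'crawler_jobs_total:1|c'."""
    if value < 0:
        raise ValueError("counter increments must be non-negative")
    return f"{name}:{value}|c"


def report_job(downloaded_bytes: int, host: str = "127.0.0.1", port: int = 9125) -> list:
    """Send this job's increments as UDP datagrams; return the lines sent."""
    lines = [
        # Hypothetical metric names: one increment per finished job,
        # plus the bytes that this job downloaded.
        format_counter("crawler_jobs_total", 1),
        format_counter("crawler_downloaded_bytes_total", downloaded_bytes),
    ]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    return lines


if __name__ == "__main__":
    # Call once, right before the crawler process shuts down.
    report_job(downloaded_bytes=123456)
```

The exporter adds each increment to a running total, so on the Prometheus 
side you would query the aggregate with something like 
`rate(crawler_downloaded_bytes_total[5m])` rather than looking at 
individual jobs.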


What you *don't* want is a separate prometheus timeseries per download job; 
there lies cardinality explosion and the problems you've already identified 
about where the timeseries "starts" and "ends".  It also makes it very hard 
to do aggregate calculations for reporting.

If you do need to report individually on each job and its number of 
downloaded bytes, then you're better off using an event logging system such 
as Loki or Elasticsearch.
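
For that per-job case, a minimal sketch is to have each crawler print one 
structured log line when it finishes, and let whatever log shipper you 
already run forward it.  The field names and the `job_finished_event` 
helper are invented for illustration; nothing here is specific to Loki or 
Elasticsearch.

```python
import json
from datetime import datetime, timezone


def job_finished_event(job_id: str, downloaded_bytes: int, finished_at: datetime) -> str:
    """Serialise one finished crawl as a single JSON line."""
    record = {
        "event": "crawl_finished",
        "job_id": job_id,
        "downloaded_bytes": downloaded_bytes,
        # Normalise to UTC so events from different hosts compare cleanly.
        "finished_at": finished_at.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Emit one line on stdout just before the crawler exits.
    print(job_finished_event("job-42", 123456, datetime.now(timezone.utc)))
```

Each job then becomes one event that you can filter and sum in the logging 
system, without creating a new timeseries per job.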

On Saturday, 12 March 2022 at 21:13:52 UTC matt...@prometheus.io wrote:

> Prometheus does not really deal in single points. Many queries won't work. 
> You can record the finished crawl as an event, in a system of your choice 
> that handles events (any database, or log aggregators).
>
> Or, if your crawlers live for a while, treat them as "long" running.  Make 
> them expose metrics continuously using the appropriate client library, and 
> have Prometheus discover them as they come and go. The limitation here is 
> how fast you churn through instance labels, and what the cardinality 
> overall is. If a crawler lives for hours, that's going to work fine; 
> minutes, maybe; seconds, probably not.
>
> If you have a way of identifying *successions* of crawlers, you could use 
> relabeling to model these as "instances" that just happen to be different 
> containers over time. For example, if a given container crawls a specific 
> category of … somethings (even if the "category" is only a sharding key), 
> and later another container will do the same thing, you can relabel that 
> category into the instance label, making sure not to have any other "per 
> crawler container" labels that blow up the cardinality. This way, even 
> though the individual crawler process is short-lived, you treat a slightly 
> higher level as the "instance". This very much depends on the specifics of 
> your crawling process though, which you did not specify.
>
> /MR
>
> On Sat, Mar 12, 2022 at 8:54 PM Lucas Lobosque <lucas.l...@sled.com.br> 
> wrote:
>
>> Hi, I have 0 to many crawlers running at a given time, where each crawler 
>> is a docker container. I have a lot of metrics related to crawling, but 
>> let's stick to downloaded bytes. 
>>
>> Metrics are sent just before shutting down the process.
>>
>> I want to use prometheus + grafana to build dashboards and alerts for 
>> this metric. I thought that pushgateway was perfect for my use case here, 
>> since it acts as a proxy to aggregate and expose metrics from short-lived 
>> processes.
>>
>> However, I noticed that once the job finishes, the value of the 
>> downloaded bytes for that crawler in that job never goes down; it keeps the 
>> value as a line, instead of keeping it as a single data point.
>>
>> I came across an issue on pushgateway concluding that this behavior is by 
>> design, and will not change: 
>> https://github.com/prometheus/pushgateway/issues/19
>>
>> So, for my specific use case, what should I use to aggregate metrics 
>> from these different jobs, in a way that data points are generated only 
>> while the job is alive, and not forever?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/3ca487ed-2e10-495d-b5f1-e5c32e9ef48bn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/3ca487ed-2e10-495d-b5f1-e5c32e9ef48bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

