If what you're interested in is the total number of download jobs and the total number of downloaded bytes - and not which particular job downloaded how many bytes - then you could use statsd_exporter. It's like pushgateway, but it adds incoming values to a counter rather than just replacing the stored value. Prometheus then scrapes that counter from statsd_exporter. This works in many more scenarios, including when several download jobs finish between one scrape and the next.
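Here is a rough sketch of the crawler side. It assumes statsd_exporter is reachable at 127.0.0.1 on its default StatsD UDP port 9125. The metric names `crawler_jobs_completed` and `crawler_downloaded_bytes` and the helpers `format_counter` / `report_crawl` are hypothetical placeholders, not anything statsd_exporter requires. Each crawler sends its totals as plain StatsD counter increments (`name:value|c`) just before it exits:

```python
import socket

# Assumed address of statsd_exporter; 9125 is its default StatsD UDP port.
# Adjust to wherever the exporter runs in your deployment.
STATSD_ADDR = ("127.0.0.1", 9125)


def format_counter(name: str, value: int) -> bytes:
    """Encode one StatsD counter increment, e.g. b"crawler_downloaded_bytes:512|c"."""
    if value < 0:
        raise ValueError("counter increments must be non-negative")
    return f"{name}:{value}|c".encode("ascii")


def report_crawl(downloaded_bytes: int, addr=STATSD_ADDR) -> None:
    """Send one finished crawl's totals; the exporter adds them to its counters.

    No per-job label is attached, so every crawler feeds the same two series
    instead of creating a new timeseries per job.
    """
    packets = [
        format_counter("crawler_jobs_completed", 1),  # hypothetical metric name
        format_counter("crawler_downloaded_bytes", downloaded_bytes),  # hypothetical
    ]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in packets:
            sock.sendto(packet, addr)
```

A crawler would call something like `report_crawl(downloaded_bytes=total)` once, right before shutting down. The exporter adds each increment to a running counter, so two crawlers that finish between the same pair of scrapes both contribute. In PromQL, `increase()` over the exposed counter then gives the bytes downloaded in a window. The exact exposed name depends on the exporter's mapping configuration. UDP is fire-and-forget, so a lost packet silently drops that job's contribution.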
What you *don't* want is a separate prometheus timeseries per download job. That way lies cardinality explosion, plus the problems you've already identified about where the timeseries "starts" and "ends". It also makes aggregate calculations for reporting very hard. If you do need to report individually on each job and its number of downloaded bytes, then you're better off using an event logging system such as Loki or Elasticsearch.

On Saturday, 12 March 2022 at 21:13:52 UTC matt...@prometheus.io wrote:

> Prometheus does not really deal in single points. Many queries won't work.
> You can record the finished crawl as an event, in a system of your choice
> that handles events (any database, or log aggregators).
>
> Or, if your crawlers live for a while, treat them as "long" running. Make
> them expose metrics continuously using the appropriate client library, and
> have Prometheus discover them as they come and go. The limitation here is
> how fast you churn through instance labels, and what the cardinality
> overall is. If a crawler lives for hours, that's going to work fine;
> minutes, maybe; seconds, probably not.
>
> If you have a way of identifying *successions* of crawlers, you could use
> relabeling to model these as "instances" that just happen to be different
> containers over time. For example, if a given container crawls a specific
> category of … somethings (even if the "category" is only a sharding key),
> and later another container will do the same thing, you can relabel that
> category into the instance label, making sure not to have any other "per
> crawler container" labels that blow up the cardinality. This way, even
> though the individual crawler process is short-lived, you treat a slightly
> higher level as the "instance". This very much depends on the specifics of
> your crawling process though, which you did not specify.
>
> /MR
>
> On Sat, Mar 12, 2022 at 8:54 PM Lucas Lobosque <lucas.l...@sled.com.br>
> wrote:
>
>> Hi, I have 0 to many crawlers running at a given time, where each crawler
>> is a docker container. I have a lot of metrics related to crawling, but
>> let's stick to downloaded bytes.
>>
>> Metrics are sent just before shutting down the process.
>>
>> I want to use prometheus + grafana to build dashboards and alerts for
>> this metric. I thought that pushgateway was perfect for my use case here,
>> since it acts as a proxy to aggregate and expose metrics from short-lived
>> processes.
>>
>> However, I noticed that once the job finishes, the value of the
>> downloaded bytes for that crawler in that job never goes down; it keeps the
>> value as a line, instead of keeping it as a single data point.
>>
>> I came across an issue on pushgateway concluding that this behavior is by
>> design, and will not change:
>> https://github.com/prometheus/pushgateway/issues/19
>>
>> So, for my specific use case, what should I use to aggregate metrics
>> from these different jobs, in a way that data points are generated only
>> while the job is alive, and not forever?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/3ca487ed-2e10-495d-b5f1-e5c32e9ef48bn%40googlegroups.com