On Tuesday, 28 February 2023 at 00:45:36 UTC Christoph Anton Mitterer wrote:
> I want to use Prometheus merely for monitoring a few hundred nodes
> (thus it seems a bit overkill to have something like Cortex, which
> sounds like a system for a really large number of nodes) at the
> university

Thanos may be simpler. Although I've not used it myself, it looks like it can be deployed incrementally, starting with the sidecars.

> ... though as indicated before, we'd need both:
> - detailed data for like the last week or perhaps two
> - far less detailed data for much longer terms (like several years)

I can offer a couple more options:

(1) Use two servers with federation:
    - server 1 does the scraping and keeps the detailed data for 2 weeks
    - server 2 scrapes server 1 at a lower interval, using the federation endpoint

(2) Use recording rules to generate lower-resolution copies of the primary timeseries - but then you'd still have to remote-write them to a second server to get the longer retention, since retention can't be set at the timeseries level.

Either case makes querying more awkward. If you don't want separate dashboards for near-term and long-term data, then it might work to stick promxy in front of them.

Apart from saving disk space (and disks are really, really cheap these days), I suspect the main benefit you're looking for is faster queries when running over long time periods. Indeed, I believe Thanos creates downsampled timeseries for exactly this reason, whilst still continuing to retain all the full-resolution data as well.

> Right now my Prometheus server runs in a medium-sized VM, but when I
> visualise via Grafana and select a time span of a month, it already
> takes considerable time (like 10-15s) to render the graph.

Ah right, then that is indeed your concern.

> Is this expected?

That depends. What PromQL query does your graph use? How many timeseries does it touch? What's your scrape interval? Is your VM backed by SSDs?

For example, I have a very low performance (Celeron N2820, SATA SSD, 8GB RAM) test box at home. I scrape data at 15 second intervals.
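Going back to option (1) for a moment: a minimal sketch of the scrape job that server 2 could run against server 1's /federate endpoint might look like the following. The hostname `server1`, the job name, and the 5m interval are my assumptions, not something from your setup; and in practice you'd want to narrow the `match[]` selector rather than pull every series.

```yaml
# Sketch of a federation scrape job on server 2 (the long-retention server).
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 5m       # lower resolution than server 1's scrapes
    honor_labels: true        # keep the original job/instance labels
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~".+"}'  # assumption: federate everything; narrow this
    static_configs:
      - targets: ['server1:9090']
```

With `honor_labels: true`, the series arrive on server 2 with the same labels they have on server 1, so the same dashboard queries work against either server.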
Prometheus is running in an LXD container, alongside many other LXD containers. The query

    rate(ifHCInOctets{instance="gw2",ifName="pppoe-out2"}[2m])

run over a 30-day range takes less than a second - but that only touches one timeseries. (With 2-hour chunks, I would expect a 30-day period to read 360 chunks for a single timeseries.) It's possible, though, that when I tested it, the relevant data was already cached in RAM.

If you are using something like a Grafana dashboard, then you should determine exactly what queries it's issuing. Enabling the query log <https://prometheus.io/docs/guides/query-log/> can also help you identify the slowest-running queries.

Another suggestion: running netdata <https://github.com/netdata/netdata> within the VM will give you performance metrics at 1-second intervals, which can help identify what's happening during those 10-15 seconds: e.g. are you bottlenecked on CPU, or disk I/O, or something else.

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/43cea4c0-31a8-4dd6-8d98-3fed327ccf39n%40googlegroups.com.