Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-03-01 Thread Ben Kochie
On Thu, Mar 2, 2023 at 4:57 AM Christoph Anton Mitterer wrote:

> On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
> >
> > Debian release cycles are too slow for the pace of Prometheus
> > development.
>
> It's rather simple to pull the version from Debian unstable, if one
> needs to, and that seems pretty current.
>
>
> > You'd be better off running Prometheus using podman, or deploying
> > official binaries with Ansible[0].
>
> Well, I guess views on how software should be distributed differ.
>
> The "traditional" system of having distributions has many advantages
> and is IMO a core reason for the success of Linux and OpenSource.
>
> All "modern" alternatives like flatpaks, snaps, and similar repos are
> IMO especially security wise completely inadequate (especially the fact
> that there is no trusted intermediate (like the distribution) which
> does some basic maintenance.
>

And I didn't say to use those. I said to use our official OCI container
image or release binaries.


>
> It's anyway not possible here because of security policy reasons.
>

That allows you to pull from unstable? :confused-pikachu:


>
>
> >
> > No, but it depends on your queries. Without seeing what you're
> > graphing there's no way to tell. Your queries could be complex or
> > inefficient. Kinda like writing slow SQL queries.
>
> As mentioned already in the other thread, so far I only do what:
> https://grafana.com/grafana/dashboards/1860-node-exporter-full/
> does.
>
>
> > There are ways to speed up graphs for specific things, for example
> > you can use recording rules to pre-render parts of the queries.
> >
> > For example, if you want to graph node CPU utilization you can have a
> > recording rule like this:
> >
> > groups:
> >   - name: node_exporter
> >     interval: 60s
> >     rules:
> >       - record: instance:node_cpu_utilization:ratio_rate1m
> >         expr: >
> >           avg without (cpu) (
> >             sum without (mode) (
> >               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
> >             )
> >           )
> >
> > This will give you a single metric per node that will be faster to
> > render over longer periods of time. It also effectively down-samples
> > by only recording one point per minute.
>
> But will dashboards like Node Exporter Full automatically use such a rule?
> And if so... will they (or rather Prometheus) use the real time series
> (with full resolution) when needed?
>

Nope. That dashboard is meant to be generic, not efficient. It's a nice
demo, but not something I use or recommend other than to get ideas.


>
> If so, then the idea would be to create such a rule for every metric
> I'm interested in and that is slow, right?
>
>
>
> > Also "Medium sized VM" doesn't give us any indication of how much CPU
> > or memory you have. Prometheus uses page cache for database access.
> > So maybe your system is lacking enough memory to effectively cache
> > the data you're accessing.
>
> Right now it's 2 (virtual CPUs) with 4.5 GB RAM... I'd guess it might
> need more CPU?
>

Maybe not CPU right now. What do the metrics say? ;-)


>
> Previously I suspected IO to be the reason, and while in fact IO is
> slow (the backend seems to deliver only ~100MB/s)... there seems to be
> nearly no IO at all while waiting for the "slow graph" (which is Node
> Exporter Full's "CPU Basic" panel), e.g. when selecting the last 30 days.
>
> Kinda surprising... does Prometheus read its TSDB really that
> efficiently?
>

Without seeing more of what's going on in your system, it's hard to say.
You have adequate CPU and memory for 40 nodes. You'll probably want about
2x what you have for 300 nodes.

From what I can tell so far, downsampling isn't going to fix your
performance problem. Something else is going on.


>
>
> Could it be a problem when Grafana runs on another VM? Though
> there didn't seem to be any network bottleneck... and I guess Grafana
> just always accesses Prometheus via TCP, so there should be no further
> positive caching effect when both run on the same node?
>

No, not likely a problem. I have seen much larger installs running without
problem.


>
>
> > No, we've talked about having variable retention times, but nobody
> > has implemented this. It's possible to script this via the DELETE
> > endpoint[1]. It would be easy enough to write a cron job that deletes
> > specific metrics older than X, but I haven't seen this packaged into
> > a simple tool. I would love to see something like this created.
> >
> > [1]:
> > https://prometheus.io/docs/prometheus/latest/querying/api/#delete-
> > series
>
> Does it make sense to open a feature request ticket for that?
>
>
There already are tons of issues about this. The problem is nobody wants to
write the code and maintain it. Prometheus is an open source project, not a
company.


> I mean it would solve at least my storage "issue" (well it's not really
> a showstopper... as it was mentioned one could simply buy a big 

Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-03-01 Thread Christoph Anton Mitterer
On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
> 
> Debian release cycles are too slow for the pace of Prometheus
> development.

It's rather simple to pull the version from Debian unstable, if one
needs to, and that seems pretty current.


> You'd be better off running Prometheus using podman, or deploying
> official binaries with Ansible[0].

Well, I guess views on how software should be distributed differ.

The "traditional" system of having distributions has many advantages
and is IMO a core reason for the success of Linux and OpenSource.

All "modern" alternatives like flatpaks, snaps, and similar repos are
IMO especially security wise completely inadequate (especially the fact
that there is no trusted intermediate (like the distribution) which
does some basic maintenance.

It's anyway not possible here because of security policy reasons.


> 
> No, but it depends on your queries. Without seeing what you're
> graphing there's no way to tell. Your queries could be complex or
> inefficient. Kinda like writing slow SQL queries.

As mentioned already in the other thread, so far I only do what:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
does.


> There are ways to speed up graphs for specific things, for example
> you can use recording rules to pre-render parts of the queries.
> 
> For example, if you want to graph node CPU utilization you can have a
> recording rule like this:
> 
> groups:
>   - name: node_exporter
>     interval: 60s
>     rules:
>       - record: instance:node_cpu_utilization:ratio_rate1m
>         expr: >
>           avg without (cpu) (
>             sum without (mode) (
>               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
>             )
>           )
> 
> This will give you a single metric per node that will be faster to
> render over longer periods of time. It also effectively down-samples
> by only recording one point per minute.

But will dashboards like Node Exporter Full automatically use such a rule?
And if so... will they (or rather Prometheus) use the real time series
(with full resolution) when needed?

If so, then the idea would be to create such a rule for every metric
I'm interested in and that is slow, right?



> Also "Medium sized VM" doesn't give us any indication of how much CPU
> or memory you have. Prometheus uses page cache for database access.
> So maybe your system is lacking enough memory to effectively cache
> the data you're accessing.

Right now it's 2 (virtual CPUs) with 4.5 GB RAM... I'd guess it might
need more CPU?

Previously I suspected IO to be the reason, and while in fact IO is
slow (the backend seems to deliver only ~100MB/s)... there seems to be
nearly no IO at all while waiting for the "slow graph" (which is Node
Exporter Full's "CPU Basic" panel), e.g. when selecting the last 30 days.

Kinda surprising... does Prometheus read its TSDB really that
efficiently?


Could it be a problem when Grafana runs on another VM? Though
there didn't seem to be any network bottleneck... and I guess Grafana
just always accesses Prometheus via TCP, so there should be no further
positive caching effect when both run on the same node?


> No, we've talked about having variable retention times, but nobody
> has implemented this. It's possible to script this via the DELETE
> endpoint[1]. It would be easy enough to write a cron job that deletes
> specific metrics older than X, but I haven't seen this packaged into
> a simple tool. I would love to see something like this created.
> 
> [1]: 
> https://prometheus.io/docs/prometheus/latest/querying/api/#delete-
> series 

Does it make sense to open a feature request ticket for that?

I mean it would solve at least my storage "issue" (well it's not really
a showstopper... as it was mentioned one could simply buy a big cheap
HDD/SSD).

And could, via the same approach, something be made that downsamples
data from longer ago?


Both together would really give quite some flexibility.

For metrics where old data is "boring" one could just delete
everything older than e.g. 2 weeks, while keeping full details for that
time.

For metrics where one is interested in larger time ranges, but where
sample resolution doesn't matter so much, one could downsample it...
like everything older than 2 weeks... then even more for everything
older than 6 months, then even more for everything older than 1 year...
and so on.

For few metrics where full resolution data is interesting over a really
long time span, one could just keep it.



> > Seems at least quite big to me... that would - assuming all days can be
> > compressed roughly to that (which isn't certain of course) - mean for one
> > year one needs ~250 GB for those 40 nodes, or about 6.25 GB per node
> > (just for the node exporter data at a 15s interval).
> 
> Without seeing a full meta.json and the size of the files in one dir,
> it's hard to say exactly if this is good or bad. It depends a bit on
> how 

Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-03-01 Thread Christoph Anton Mitterer
Hey Brian

On Tue, 2023-02-28 at 00:27 -0800, Brian Candler wrote:
> 
> I can offer a couple more options:
> 
> (1) Use two servers with federation.
> - server 1 does the scraping and keeps the detailed data for 2 weeks
> - server 2 scrapes server 1 at lower interval, using the federation
> endpoint

I had thought about that as well. Though it feels a bit "ugly".


> (2) Use recording rules to generate lower-resolution copies of the
> primary timeseries - but then you'd still have to remote-write them
> to a second server to get the longer retention, since this can't be
> set at timeseries level.

I had (very briefly) read about the recording rules (merely just that
they exist ^^) ... but wouldn't these give me a new name for the
metric?

If so, I'd need to adapt e.g.
https://grafana.com/grafana/dashboards/1860-node-exporter-full/ to use
the metrics generated by the recording rules,... which again seems
quite some maintenance effort.

Plus, as you even wrote below, I'd need users to use different
dashboards, AFAIU, one where the detailed data is used, one where the
downsampled data is used.
Sure that would work as a workaround, but is of course not really a
good solution, as one would rather want to "seamlessly" move from the
detailed to less-detailed data.


> Either case makes the querying more awkward.  If you don't want
> separate dashboards for near-term and long-term data, then it might
> work to stick promxy in front of them.

Which would however make the setup more complex again.


> Apart from saving disk space (and disks are really, really cheap
> these days), I suspect the main benefit you're looking for is to get
> faster queries when running over long time periods.  Indeed, I
> believe Thanos creates downsampled timeseries for exactly this
> reason, whilst still continuing to retain all the full-resolution
> data as well.

I guess I may have to look into that, and how complex its setup would be.



> That depends.  What PromQL query does your graph use? How many
> timeseries does it touch? What's your scrape interval?

So far I've just been playing with the ones from:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
So all queries in that dashboard and all the time series it uses.

Interval is 15s.


> Is your VM backed by SSDs?

I think it's a Ceph cluster that the supercomputing centre uses for
that, but I have no idea what that runs on. Probably HDDs.


> Another suggestion: running netdata within the VM will give you
> performance metrics at 1 second intervals, which can help identify
> what's happening during those 10-15 seconds: e.g. are you
> bottlenecked on CPU, or disk I/O, or something else.

Good idea, thanks.


Thanks,
Chris.



Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-28 Thread Ben Kochie
On Tue, Feb 28, 2023 at 1:45 AM Christoph Anton Mitterer wrote:

> Hi Stuart, Julien and Ben,
>
> Hope you don't mind that I answer all three replies in one... don't
> wanna spam the list ;-)
>

Thanks!


>
>
>
> On Tue, 2023-02-21 at 07:31 +, Stuart Clark wrote:
> > Prometheus itself cannot do downsampling, but other related projects
> > such as Cortex & Thanos have such features.
>
> Uhm, I see. Unfortunately neither is packaged for Debian. Plus it seems
> to make the overall system even more complex.
>

I do not recommend using Debian packages for Prometheus. Debian release
cycles are too slow for the pace of Prometheus development. Every release
brings new improvements. In the last year we've made improvements to memory
use and query performance. Debian also ignores the Go source vendor
versions we provide, which leads to bugs.

You'd be better off running Prometheus using podman, or deploying official
binaries with Ansible[0].

[0]: https://github.com/prometheus-community/ansible
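
For reference, a minimal sketch of what such a deployment could look like with
the collection from [0] (the host group, version pin, and exact role/variable
names are illustrative and may differ between collection versions):

# playbook.yml -- sketch only; assumes the prometheus.prometheus collection
# from [0] has been installed, e.g. via ansible-galaxy
- hosts: monitoring
  become: true
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_version: "2.42.0"   # example pin, pick a current release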


>
> I want to use Prometheus merely for monitoring a few hundred nodes (thus it
> seems a bit overkill to have something like Cortex, which sounds like a
> system for really large numbers of nodes) at the university, though as
> indicated before, we'd need both:
> - detailed data for like the last week or perhaps two
> - far less detailed data for much longer terms (like several years)
>
> Right now my Prometheus server runs in a medium sized VM, but when I
> visualise via Grafana and select a time span of a month, it already
> takes considerable time (like 10-15s) to render the graph.
>
> Is this expected?
>

No, but it depends on your queries. Without seeing what you're graphing
there's no way to tell. Your queries could be complex or inefficient. Kinda
like writing slow SQL queries.

There are ways to speed up graphs for specific things, for example you can
use recording rules to pre-render parts of the queries.

For example, if you want to graph node CPU utilization you can have a
recording rule like this:

groups:
  - name: node_exporter
    interval: 60s
    rules:
      - record: instance:node_cpu_utilization:ratio_rate1m
        expr: >
          avg without (cpu) (
            sum without (mode) (
              rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
            )
          )

This will give you a single metric per node that will be faster to render
over longer periods of time. It also effectively down-samples by only
recording one point per minute.
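
If you go down this route for more of the dashboard's heavy panels, further
rules can follow the same pattern. A sketch (the rule names and label filters
are illustrative, and the exact expressions in that dashboard may differ):

groups:
  - name: node_exporter_extra
    interval: 60s
    rules:
      # memory utilization per instance, based on MemAvailable
      - record: instance:node_memory_utilization:ratio
        expr: >
          1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
      # filesystem usage per mountpoint, ignoring tmpfs/ramfs
      - record: instance:node_filesystem_usage:ratio
        expr: >
          1 - (
            node_filesystem_avail_bytes{fstype!~"tmpfs|ramfs"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|ramfs"}
          )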

Also "Medium sized VM" doesn't give us any indication of how much CPU or
memory you have. Prometheus uses page cache for database access. So maybe
your system is lacking enough memory to effectively cache the data you're
accessing.


>
>
>
>
> On Tue, 2023-02-21 at 11:45 +0100, Julien Pivotto wrote:
> > We would love to have this in the future but it would require careful
> > planning and a design document.
>
> So native support is not on the near horizon?
>
> And I guess it's really not possible to "simply" ( ;-) ) have different
> retention times for different metrics?
>

No, we've talked about having variable retention times, but nobody has
implemented this. It's possible to script this via the DELETE endpoint[1].
It would be easy enough to write a cron job that deletes specific metrics
older than X, but I haven't seen this packaged into a simple tool. I would
love to see something like this created.

[1]:
https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series
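
For the record, a rough sketch of what such a cron job could look like, written
here as an Ansible cron task since that is easy to roll out (the schedule, the
metric selector, and the 14-day cutoff are just examples, and the TSDB admin API
has to be enabled with --web.enable-admin-api for this to work):

# illustrative only -- not an official tool
- name: prune old high-resolution node CPU samples
  ansible.builtin.cron:
    name: prometheus-prune-node-cpu
    special_time: daily
    job: >-
      curl -s -X POST
      "http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=node_cpu_seconds_total&end=$(date -d '14 days ago' +%s)"
      && curl -s -X POST "http://localhost:9090/api/v1/admin/tsdb/clean_tombstones"

The second call compacts away the tombstones so the disk space is actually
reclaimed.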


>
>
>
>
> On Tue, 2023-02-21 at 15:52 +0100, Ben Kochie wrote:
> > This is mostly unnecessary in Prometheus because it uses compression
> > in the TSDB samples. What would take up a lot of space in an RRD file
> > takes up very little space in Prometheus.
>
> Well right now I scrape only the node-exporter data from 40 hosts at a
> 15s interval plus the metrics from prometheus itself.
> I'm doing this on a test install since the 21st of February.
> Retention time is still at its default.
>
> That gives me:
> # du --apparent-size -l -c -s --si /var/lib/prometheus/metrics2/*
> 68M    /var/lib/prometheus/metrics2/01GSST2X0KDHZ0VM2WEX0FPS2H
> 481M   /var/lib/prometheus/metrics2/01GSVQWH7BB6TDCEWXV4QFC9V2
> 501M   /var/lib/prometheus/metrics2/01GSXNP1T77WCEM44CGD7E95QH
> 485M   /var/lib/prometheus/metrics2/01GSZKFK53BQRXFAJ7RK9EDHQX
> 490M   /var/lib/prometheus/metrics2/01GT1H90WKAHYGSFED5W2BW49Q
> 487M   /var/lib/prometheus/metrics2/01GT3F2SJ6X22HFFPFKMV6DB3B
> 498M   /var/lib/prometheus/metrics2/01GT5CW8HNJSGFJH2D3ADGC9HH
> 490M   /var/lib/prometheus/metrics2/01GT7ANS5KDVHVQZJ7RTVNQQGH
> 501M   /var/lib/prometheus/metrics2/01GT98FETDR3PN34ZP59Y0KNXT
> 172M   /var/lib/prometheus/metrics2/01GT9X2BPN51JGB6QVK2X8R3BR
> 60M    /var/lib/prometheus/metrics2/01GTAASP91FSFGBBH8BBN2SQDJ
> 60M    /var/lib/prometheus/metrics2/01GTAHNDG070WXY8WGDVS22D2Y
> 171M   /var/lib/prometheus/metrics2/01GTAHNHQ587CQVGWVDAN26V8S
> 102M   /var/lib/prometheus/metrics2/chunks_head
> 

Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-28 Thread Brian Candler
On Tuesday, 28 February 2023 at 00:45:36 UTC Christoph Anton Mitterer wrote:

I want to use Prometheus merely for monitoring a few hundred nodes (thus it 
seems a bit overkill to have something like Cortex, which sounds like a 
system for really large numbers of nodes) at the university


Thanos may be simpler. Although I've not used it myself, it looks like it 
can be deployed incrementally starting with the sidecars.

 

, though as 
indicated before, we'd need both: 
- detailed data for like the last week or perhaps two 
- far less detailed data for much longer terms (like several years)


I can offer a couple more options:

(1) Use two servers with federation.
- server 1 does the scraping and keeps the detailed data for 2 weeks
- server 2 scrapes server 1 at lower interval, using the federation endpoint

(2) Use recording rules to generate lower-resolution copies of the primary 
timeseries - but then you'd still have to remote-write them to a second 
server to get the longer retention, since this can't be set at timeseries 
level.
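
For the archives, a rough sketch of what those two options could look like in
prometheus.yml (the host names, interval, and metric selector are made up; for
option (2) the long-term server also needs --web.enable-remote-write-receiver):

# (1) on server 2: scrape server 1's federation endpoint at a coarser interval
scrape_configs:
  - job_name: federate-longterm
    scrape_interval: 5m
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"instance:.*"}'   # e.g. only pre-aggregated recording rules
    static_configs:
      - targets: ['server1.example.org:9090']

# (2) on server 1: remote-write only the recorded series to the long-term server
remote_write:
  - url: http://server2.example.org:9090/api/v1/write
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'instance:.*'
        action: keep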

Either case makes the querying more awkward.  If you don't want separate 
dashboards for near-term and long-term data, then it might work to stick 
promxy in front of them.
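
In case it helps, promxy's config (from memory, so treat this as a sketch and
check its README for the actual schema) essentially just lists the servers it
should fan queries out to:

promxy:
  server_groups:
    - static_configs:
        - targets: ['server1.example.org:9090']   # short-term, full resolution
    - static_configs:
        - targets: ['server2.example.org:9090']   # long-term / downsampled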

Apart from saving disk space (and disks are really, really cheap these 
days), I suspect the main benefit you're looking for is to get faster 
queries when running over long time periods.  Indeed, I believe Thanos 
creates downsampled timeseries for exactly this reason, whilst still 
continuing to retain all the full-resolution data as well.

Right now my Prometheus server runs in a medium sized VM, but when I 
visualise via Grafana and select a time span of a month, it already 
takes considerable time (like 10-15s) to render the graph.


Ah right, then that is indeed your concern.
 

Is this expected?


That depends.  What PromQL query does your graph use? How many timeseries 
does it touch? What's your scrape interval?  Is your VM backed by SSDs?

For example, I have a very low performance (Celeron N2820, SATA SSD, 8GB 
RAM) test box at home.  I scrape data at 15 second intervals. Prometheus is 
running in an lxd container, alongside many other lxd containers.  The 
query:

rate(ifHCInOctets{instance="gw2",ifName="pppoe-out2"}[2m])

run over a 30 day range takes less than a second - but that only touches 
one timeseries. (With 2-hour chunks, I would expect a 30 day period to read 
360 chunks, for a single timeseries).  But it's possible that when I tested 
it, it already had the relevant data cached in RAM.

If you are doing something like a Grafana dashboard, then you should 
determine exactly what queries it's doing. Enabling the query log can also 
help you identify the slowest running queries.
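
The query log is switched on in the global section of prometheus.yml and picked
up with a config reload (the path below is just an example):

global:
  scrape_interval: 15s
  # log every query with timing information; apply via SIGHUP / config reload
  query_log_file: /var/lib/prometheus/query.log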

Another suggestion: running netdata 
within the VM will give you performance metrics at 1 second intervals, 
which can help identify what's happening during those 10-15 seconds: e.g. 
are you bottlenecked on CPU, or disk I/O, or something else.



Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-27 Thread Christoph Anton Mitterer
Hi Stuart, Julien and Ben,

Hope you don't mind that I answer all three replies in one... don't
wanna spam the list ;-)



On Tue, 2023-02-21 at 07:31 +, Stuart Clark wrote:
> Prometheus itself cannot do downsampling, but other related projects 
> such as Cortex & Thanos have such features.

Uhm, I see. Unfortunately neither is packaged for Debian. Plus it seems
to make the overall system even more complex.

I want to use Prometheus merely for monitoring a few hundred nodes (thus it
seems a bit overkill to have something like Cortex, which sounds like a
system for really large numbers of nodes) at the university, though as
indicated before, we'd need both:
- detailed data for like the last week or perhaps two
- far less detailed data for much longer terms (like several years)

Right now my Prometheus server runs in a medium sized VM, but when I
visualise via Grafana and select a time span of a month, it already
takes considerable time (like 10-15s) to render the graph.

Is this expected?




On Tue, 2023-02-21 at 11:45 +0100, Julien Pivotto wrote:
> We would love to have this in the future but it would require careful
> planning and a design document.

So native support is not on the near horizon?

And I guess it's really not possible to "simply" ( ;-) ) have different
retention times for different metrics?




On Tue, 2023-02-21 at 15:52 +0100, Ben Kochie wrote:
> This is mostly unnecessary in Prometheus because it uses compression
> in the TSDB samples. What would take up a lot of space in an RRD file
> takes up very little space in Prometheus.

Well right now I scrape only the node-exporter data from 40 hosts at a
15s interval plus the metrics from prometheus itself.
I'm doing this on a test install since the 21st of February.
Retention time is still at its default.

That gives me:
# du --apparent-size -l -c -s --si /var/lib/prometheus/metrics2/*
68M    /var/lib/prometheus/metrics2/01GSST2X0KDHZ0VM2WEX0FPS2H
481M   /var/lib/prometheus/metrics2/01GSVQWH7BB6TDCEWXV4QFC9V2
501M   /var/lib/prometheus/metrics2/01GSXNP1T77WCEM44CGD7E95QH
485M   /var/lib/prometheus/metrics2/01GSZKFK53BQRXFAJ7RK9EDHQX
490M   /var/lib/prometheus/metrics2/01GT1H90WKAHYGSFED5W2BW49Q
487M   /var/lib/prometheus/metrics2/01GT3F2SJ6X22HFFPFKMV6DB3B
498M   /var/lib/prometheus/metrics2/01GT5CW8HNJSGFJH2D3ADGC9HH
490M   /var/lib/prometheus/metrics2/01GT7ANS5KDVHVQZJ7RTVNQQGH
501M   /var/lib/prometheus/metrics2/01GT98FETDR3PN34ZP59Y0KNXT
172M   /var/lib/prometheus/metrics2/01GT9X2BPN51JGB6QVK2X8R3BR
60M    /var/lib/prometheus/metrics2/01GTAASP91FSFGBBH8BBN2SQDJ
60M    /var/lib/prometheus/metrics2/01GTAHNDG070WXY8WGDVS22D2Y
171M   /var/lib/prometheus/metrics2/01GTAHNHQ587CQVGWVDAN26V8S
102M   /var/lib/prometheus/metrics2/chunks_head
21k    /var/lib/prometheus/metrics2/queries.active
427M   /var/lib/prometheus/metrics2/wal
5,0G   total

Not sure whether I understood meta.json correctly (haven't found
documentation for minTime/maxTime) but I guess that the big ones
correspond to 64800s?

Seems at least quite big to me... that would - assuming all days can be
compressed roughly to that (which isn't certain of course) - mean for one
year one needs ~250 GB for those 40 nodes, or about 6.25 GB per node
(just for the node exporter data at a 15s interval).

Does that sound reasonable/expected?



> What's actually more
> difficult is doing all the index loads for this long period of time.
> But Prometheus uses mmap to opportunistically access the data on
> disk.

And is there anything that can be done to improve that? Other than
simply using some fast NVMe or so?



Thanks,
Chris.



Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-21 Thread Ben Kochie
This is mostly unnecessary in Prometheus because it uses compression in the
TSDB samples. What would take up a lot of space in an RRD file takes up
very little space in Prometheus.

A basic nearline 20TB HDD can easily store 600,000 series for 10 years at
full 15s resolution.

This is possible because the average sample point size in Prometheus is
about 1.5 bytes per sample. So 1.5 bytes * 5760 samples/day * 365 days * 10
years =~ 30MiB.

So for your example, looking up the data for a single metric over a long
period of time is still pretty cheap. What's actually more difficult is
doing all the index loads for this long period of time. But Prometheus uses
mmap to opportunistically access the data on disk.

On Tue, Feb 21, 2023 at 4:29 AM Christoph Anton Mitterer wrote:

> Hey.
>
> I wondered whether one can do with Prometheus something similar to what is
> possible with systems using RRD (e.g. Ganglia).
>
> Depending on the kind of metrics, like for those from the node exporter,
> one may want a very high sample resolution (and thus short scraping
> interval) for like the last 2 days,... but the further one goes back the
> less interesting that data becomes, at least in that resolution (ever
> looked at how much IO a server had 2 years ago per 15s)?
>
> What one may however want is a rough overview of these metrics for those
> time periods longer ago, e.g. in order to see some trends.
>
>
> For other values, e.g. the total used disk space on a shared filesystem or
> maybe a tape library, one may not need such high resolution for the last 2
> days, but therefore want the data (with low sample resolution, e.g. 1
> sample per day) going back much longer, like the last 10 years.
>
>
> With Ganglia/RRD one would then simply use multiple RRDs, each for
> different time spans and with different resolutions... and RRD would
> interpolate its samples accordingly.
>
>
> Can anything like this be done with Prometheus? Or is that completely out
> of scope?
>
>
> I saw that one can set the retention period, but that seems to affect
> everything.
>
> So even if I have e.g. my low resolution tape library total size, which I
> could scrape only every hour or so, ... it wouldn't really help me.
> In order to keep data for that like the last 10 years, I'd need to set the
> retention time to that.
>
> But then the high resolution samples like from the node exporter would
> also be kept that long (with full resolution).
>
>
> Thanks,
> Chris.
>


Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-21 Thread Julien Pivotto
On 21 Feb 07:31, Stuart Clark wrote:
> On 21/02/2023 03:29, Christoph Anton Mitterer wrote:
> > Hey.
> > 
> > I wondered whether one can do with Prometheus something similar to what is
> > possible with systems using RRD (e.g. Ganglia).
> > 
> > Depending on the kind of metrics, like for those from the node exporter,
> > one may want a very high sample resolution (and thus short scraping
> > interval) for like the last 2 days,... but the further one goes back the
> > less interesting that data becomes, at least in that resolution (ever
> > looked at how much IO a server had 2 years ago per 15s)?
> > 
> > What one may however want is a rough overview of these metrics for those
> > time periods longer ago, e.g. in order to see some trends.
> > 
> > 
> > For other values, e.g. the total used disk space on a shared filesystem
> > or maybe a tape library, one may not need such high resolution for the
> > last 2 days, but therefore want the data (with low sample resolution,
> > e.g. 1 sample per day) going back much longer, like the last 10 years.
> > 
> > 
> > With Ganglia/RRD one would then simply use multiple RRDs, each for
> > different time spans and with different resolutions... and RRD would
> > interpolate its samples accordingly.
> > 
> > 
> > Can anything like this be done with Prometheus? Or is that completely
> > out of scope?
> > 
> > 
> > I saw that one can set the retention period, but that seems to affect
> > everything.
> > 
> > So even if I have e.g. my low resolution tape library total size, which
> > I could scrape only every hour or so, ... it wouldn't really help me.
> > In order to keep data for that like the last 10 years, I'd need to set
> > the retention time to that.
> > 
> > But then the high resolution samples like from the node exporter would
> > also be kept that long (with full resolution).
> > 
> Prometheus itself cannot do downsampling, but other related projects such as
> Cortex & Thanos have such features.


We would love to have this in the future but it would require careful
planning and a design document.


-- 
Julien Pivotto
@roidelapluie



Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-20 Thread Stuart Clark

On 21/02/2023 03:29, Christoph Anton Mitterer wrote:

Hey.

I wondered whether one can do with Prometheus something similar to what 
is possible with systems using RRD (e.g. Ganglia).


Depending on the kind of metrics, like for those from the node 
exporter, one may want a very high sample resolution (and thus short 
scraping interval) for like the last 2 days,... but the further one 
goes back the less interesting that data becomes, at least in that 
resolution (ever looked at how much IO a server had 2 years ago per 15s)?


What one may however want is a rough overview of these metrics for 
those time periods longer ago, e.g. in order to see some trends.



For other values, e.g. the total used disk space on a shared 
filesystem or maybe a tape library, one may not need such high 
resolution for the last 2 days, but therefore want the data (with low 
sample resolution, e.g. 1 sample per day) going back much longer, like 
the last 10 years.



With Ganglia/RRD one would then simply use multiple RRDs, each for 
different time spans and with different resolutions... and RRD would 
interpolate its samples accordingly.



Can anything like this be done with Prometheus? Or is that completely 
out of scope?



I saw that one can set the retention period, but that seems to affect 
everything.


So even if I have e.g. my low resolution tape library total size, 
which I could scrape only every hour or so, ... it wouldn't really 
help me.
In order to keep data for that like the last 10 years, I'd need to set 
the retention time to that.


But then the high resolution samples like from the node exporter would 
also be kept that long (with full resolution).


Prometheus itself cannot do downsampling, but other related projects 
such as Cortex & Thanos have such features.


--
Stuart Clark



[prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-20 Thread Christoph Anton Mitterer
Hey.

I wondered whether one can do with Prometheus something similar to what is 
possible with systems using RRD (e.g. Ganglia).

Depending on the kind of metrics, like for those from the node exporter, 
one may want a very high sample resolution (and thus short scraping 
interval) for like the last 2 days,... but the further one goes back the 
less interesting that data becomes, at least in that resolution (ever 
looked at how much IO a server had 2 years ago per 15s)?

What one may however want is a rough overview of these metrics for those 
time periods longer ago, e.g. in order to see some trends.


For other values, e.g. the total used disk space on a shared filesystem or 
maybe a tape library, one may not need such high resolution for the last 2 
days, but therefore want the data (with low sample resolution, e.g. 1 
sample per day) going back much longer, like the last 10 years.


With Ganglia/RRD one would then simply use multiple RRDs, each for 
different time spans and with different resolutions... and RRD would 
interpolate its samples accordingly.


Can anything like this be done with Prometheus? Or is that completely out 
of scope?


I saw that one can set the retention period, but that seems to affect 
everything.

So even if I have e.g. my low resolution tape library total size, which I 
could scrape only every hour or so, ... it wouldn't really help me.
In order to keep data for that like the last 10 years, I'd need to set the 
retention time to that.

But then the high resolution samples like from the node exporter would also 
be kept that long (with full resolution).
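
As an aside on the "every hour or so" part: scrape intervals can already be set
per job, it is only the retention that is global. A sketch (job names and
targets are made up; note that intervals much longer than a few minutes collide
with the default 5m staleness handling, so true "one sample per day" scraping
isn't really workable in plain Prometheus):

scrape_configs:
  - job_name: node              # high resolution, short-term interest
    scrape_interval: 15s
    static_configs:
      - targets: ['node01.example.org:9100']   # made-up target
  - job_name: tape_library      # slow-moving totals
    scrape_interval: 5m
    static_configs:
      - targets: ['tapelib.example.org:9116']  # made-up target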


Thanks,
Chris.
