Hey, I'm using Prometheus v2.29.1. My scrape interval is 15 seconds, and I'm 
measuring RAM with "container_memory_working_set_bytes" (the metric commonly 
used to check Kubernetes pod memory usage).
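
For reference, the query I'm using looks roughly like this (the pod label 
value here is a placeholder for my setup; yours will differ):

    container_memory_working_set_bytes{pod=~"prometheus.*", container!=""}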

Using "Status" in the Prometheus web UI, I see the following Head Stats:

Number of Series  7644889
Number of Chunks  8266039
Number of Label Pairs  9968
Like I mentioned above, we're averaging about *8,257 metrics per node*, and we 
have around 300 targets now, which makes our total around 2,100,000 metrics.

*Are you monitoring Kubernetes pods by any chance?* I'm not monitoring any 
pods; I scrape certain nodes that expose custom metrics. Prometheus itself 
runs in a pod rather than on a node of its own, and the resources assigned to 
that pod are exclusive to it.
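
To see where the extra series beyond my expected ~2 million are coming from, 
I'm going to break the head down by metric name, along the lines of the query 
Brian shows below, just grouped differently:

    topk(10, count by (__name__, job) ({__name__=~".+"}))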

On Thursday, 10 February 2022 at 00:20:04 UTC-8 Brian Candler wrote:

> What prometheus version? How often are you polling? How are you measuring 
> the RAM utilisation?
>
> Let me give you a comparison.  I have a prometheus instance here which is 
> polling 161 node_exporter targets, 38 snmp_exporter targets, 46 
> blackbox_exporter targets, and a handful of others, with a 1 minute scrape 
> interval. It's running inside an lxd container, and uses a grand total of 
> *2.5GB RAM* (as reported by "free" inside the container, "used" column).  The 
> entire physical server has 16GB of RAM, and is running a bunch of other 
> monitoring tools in other containers as well.  The physical host has 9GB of 
> available RAM (as reported by "free" on the host, "available" column).
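>
> Prometheus also reports its own resident memory, so if it is scraping itself 
> you can cross-check the figure from "free" with something like this (the job 
> name "prometheus" here assumes the default self-scrape config):
>
>     process_resident_memory_bytes{job="prometheus"}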
>
> This is with prometheus-2.33.0, under Ubuntu 18.04, although I haven't 
> noticed significantly higher RAM utilisation with older versions of 
> prometheus.
>
> Using "Status" in the Prometheus web UI, I see the following Head Stats:
>
> Number of Series  525141
> Number of Chunks  525141
> Number of Label Pairs  15305
>
> I can use a relatively expensive query to count the individual metrics at 
> the current instant in time (takes a few seconds):
>     count by (job) ({__name__=~".+"})
>
> This shows 391,863 metrics for node(*), 99,175 metrics for snmp, 23,138 
> metrics for haproxy (keepalived), and roughly 10,000 other metrics in total.
>
> (*) Given that there are 161 node targets, that's an average of 2433 
> metrics per node (from node_exporter).
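>
> If you'd rather let PromQL do that arithmetic, something along these lines 
> should give the same per-target average (assuming the job is labelled 
> "node", as above):
>
>     count({job="node"}) / count(up{job="node"})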
>
> In summary, I find prometheus to be extremely frugal in its use of RAM, 
> and therefore if you're getting OOM problems then there must be something 
> different about your system.
>
> Are you monitoring Kubernetes pods by any chance?  Is there a lot of churn 
> in those pods (i.e. pods being created and destroyed)?  If you generate 
> large numbers of short-lived timeseries, that will require a lot more 
> memory.  The Head Stats figures are the place to start.
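>
> As a rough way to gauge churn, Prometheus exposes its own TSDB metrics; 
> comparing how quickly new head series are created against the current head 
> size is a quick sanity check (treat this as a sketch rather than a precise 
> churn measure):
>
>     rate(prometheus_tsdb_head_series_created_total[1h])
>     prometheus_tsdb_head_series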
>
> Aside: a week or two ago, there was an isolated incident where this server 
> started using more CPU and RAM.  Memory usage graphs showed the RAM growing 
> steadily over a period of about 5 hours; at that point, it was under so 
> much memory pressure I couldn't log in to diagnose, and was forced to 
> reboot.  However since node_exporter is only returning the overall RAM on 
> the host, not per-container, I can't tell which of the many containers 
> running on that host was the culprit.
>
> [image: ram.png]
> This server is also running victoriametrics, nfsen, loki, smokeping, 
> oxidized, netdisco, nagios, and some other bits and bobs - so it could have 
> been any one of those.  In fact, given that Ubuntu does various daily 
> housecleaning activities at 06:25am, it could have been any of those as 
> well.
>
