Hi all, I've been investigating Prometheus memory utilization over the last couple of days.
Based on *pprof* output, I see a lot of memory used by the *getOrSet* function, but according to the docs that is just creating new series, so I'm not sure what I can do about it. Pprof "top" output: https://pastebin.com/bAF3fGpN

To figure out whether there are any metrics I can remove, I also ran ./tsdb analyze (output here: https://pastebin.com/twsFiuRk). Some metrics do have higher cardinality than others, but the difference is not massive.

With ~100 nodes, Prometheus uses around 15 GB of RAM, and we're seeing an *average of 8257 metrics per node*. We expect to grow to around 200 nodes, which will push memory usage much higher.

Present situation: the Prometheus containers were restarted due to OOM, so I have fewer targets right now (~6). That's probably why the numbers below look low, but the metrics pulled per target are the same. I was trying to recognize a pattern. Some relevant metrics:

process_resident_memory_bytes{instance="localhost:9090", job="prometheus"} 1536786432
go_memstats_alloc_bytes{instance="localhost:9090", job="prometheus"} 908149496

Apart from distributing the load over multiple Prometheus servers, are there any alternatives?

TIA,
Shubham
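
P.S. In case it's useful, this is roughly the PromQL I've been using (besides tsdb analyze) to look at per-metric-name cardinality. It's only a sketch: the topk limit of 10 is arbitrary, and the first query is expensive on a large head block.

    # top 10 metric names by number of series currently in memory
    topk(10, count by (__name__)({__name__=~".+"}))

    # total number of series held in the head block
    prometheus_tsdb_head_series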

