Hi all, 

I've been investigating Prometheus memory utilization over the last couple 
of days.

Based on *pprof* output, I see a lot of memory attributed to the 
*getOrSet* function, but according to the docs it's only used for creating 
new series, so I'm not sure what I can do about it.
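For context, heavy allocation in *getOrSet* usually points at series churn, i.e. new series being created on every scrape (for example from labels whose values keep changing). A rough way to check, assuming you sample the counter *prometheus_tsdb_head_series_created_total* twice some minutes apart (the two values below are hypothetical placeholders):

```python
# Back-of-envelope series-churn check. The two counter samples are
# hypothetical; replace them with real readings of
# prometheus_tsdb_head_series_created_total taken 10 minutes apart.
sample_t0 = 4_100_000   # counter value at time t0 (hypothetical)
sample_t1 = 4_160_000   # counter value 10 minutes later (hypothetical)
interval_s = 10 * 60

churn_per_s = (sample_t1 - sample_t0) / interval_s
print(f"series created per second: {churn_per_s:.1f}")
# A sustained high rate means the head block keeps allocating new
# series, which is exactly where getOrSet shows up in pprof.
```

A sustained low rate would instead suggest the memory is simply the steady-state cost of the existing series.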


Pprof "top" output: 
https://pastebin.com/bAF3fGpN

Also, to figure out whether there are any metrics I can remove, I ran 
*./tsdb analyze* on the data directory *(output here: 
https://pastebin.com/twsFiuRk)*

Some metrics do have higher cardinality than others, but the difference is 
not massive.

With ~100 nodes, Prometheus uses around 15 GB of RAM.

We're averaging *8257 metrics per node*.
We expect to grow to around 200 nodes, which will push our RAM usage 
through the roof.
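To put numbers on that, here is a rough per-series cost estimate from the figures above (assumptions: 100 nodes, 8257 metrics per node, 15 GiB RSS, and treating metrics-per-node as series-per-node, which ignores extra label combinations):

```python
nodes = 100
series_per_node = 8257          # "average metrics per node" above
ram_bytes = 15 * 2**30          # ~15 GiB observed

total_series = nodes * series_per_node
bytes_per_series = ram_bytes / total_series
print(f"total series:     {total_series:,}")
print(f"bytes per series: {bytes_per_series / 1024:.1f} KiB")

# Linear extrapolation to 200 nodes (assumes same metrics per node):
projected = 200 * series_per_node * bytes_per_series
print(f"projected at 200 nodes: {projected / 2**30:.1f} GiB")
```

So under these assumptions the per-series cost works out to roughly 19 KiB, and doubling the node count simply doubles the bill to ~30 GiB, consistent with the "through the roof" projection.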

Present situation:
The Prometheus containers were restarted due to OOM, so I have fewer 
targets now (~6). That's probably why the numbers below look low, but the 
set of metrics pulled per target is the same.
I was trying to recognize the pattern.

Some metrics: 

process_resident_memory_bytes{instance="localhost:9090", job="prometheus"} 1536786432
go_memstats_alloc_bytes{instance="localhost:9090", job="prometheus"} 908149496
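Converting those two samples for readability (plain arithmetic, no assumptions beyond the values above):

```python
rss = 1536786432          # process_resident_memory_bytes
go_alloc = 908149496      # go_memstats_alloc_bytes

print(f"RSS:     {rss / 2**30:.2f} GiB")
print(f"Go heap: {go_alloc / 2**30:.2f} GiB")
# The gap between RSS and the live Go heap can include GC headroom and
# memory-mapped TSDB data, so RSS alone may overstate the working set.
```

That puts RSS at about 1.43 GiB against roughly 0.85 GiB of live Go heap for the current reduced target count.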

Apart from distributing the load over multiple Prometheus nodes, are there 
any alternatives?



TIA,
Shubham

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a74b38ab-ee70-46c6-bd5c-563aede095f4n%40googlegroups.com.
