Hi all, I've been investigating Prometheus memory utilization over the last couple of days.
Based on *pprof* output, I see a lot of memory used by the *getOrSet* function, but according to the docs that is just creating new series, so I'm not sure what I can do about it. Pprof "top" output: https://pastebin.com/bAF3fGpN

To figure out whether there are any metrics I can remove, I also ran ./tsdb analyze (output here: https://pastebin.com/twsFiuRk). Some metrics do have higher cardinality than others, but the difference is not massive.

With ~100 nodes, Prometheus uses around 15 GB of RAM, and we're seeing an *average of 8257 metrics per node*. We expect to grow to around 200 nodes, which will push memory usage much higher.

Present situation: the Prometheus containers were restarted due to OOM, so I have fewer targets right now (~6). That's probably why the numbers below look low, but the metrics pulled per target are the same. I was trying to recognize a pattern. Some relevant metrics:

process_resident_memory_bytes{instance="localhost:9090", job="prometheus"} 1536786432
go_memstats_alloc_bytes{instance="localhost:9090", job="prometheus"} 908149496

Apart from distributing the load over multiple Prometheus servers, are there any alternatives?

TIA,
Shubham
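
P.S. In case it's useful, this is roughly the PromQL I've been using (besides tsdb analyze) to look at per-metric-name cardinality. It's only a sketch: the topk limit of 10 is arbitrary, and the first query is expensive on a large head block.

    # top 10 metric names by number of series currently in memory
    topk(10, count by (__name__)({__name__=~".+"}))

    # total number of series held in the head block
    prometheus_tsdb_head_series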

