On Fri, Mar 5, 2021 at 12:10 AM Dhruv Patel <dhruvpatel5...@gmail.com> wrote:
> Hi Folks,
>
> We are seeing an issue in our current Prometheus setup where we are not
> able to ingest beyond 22 million metrics/min. We have run several load
> tests at 25 million, 29 million and 35 million, but the ingestion rate
> remains constant at around the same 22 million metrics/min. Moreover, CPU
> usage is around 70% and more than 50% of memory is still available.
> Looking at this, it feels like we are not hitting resource limits but
> rather something to do with lock contention.
>
> *Prometheus Version:* 2.9.1
> *Host Shape:* x7-enclave-104 (a bare metal host with 104 processor
> units). More info can be obtained in the screenshots below.
> *Memory Info:*
>               total  used  free  shared  buff/cache  available
> Mem:           754G   88G  528G     67M        136G       719G
> Swap:          1.0G    0B  1.0G
> Total:         755G   88G  529G
>
> We also ran some profiling during our load tests at 20 million, 22
> million and 25 million and saw an increase in the time spent in
> runtime.mallocgc, which leads to increased usage of runtime.futex.
> Somehow we are not able to figure out what is causing the lock
> contention. I have attached our profiling results at the different load
> test levels in case that's useful. Any ideas on what could be causing the
> high time taken in runtime.mallocgc?

Prometheus is written in Go. The runtime.mallocgc function is called every
time Prometheus allocates a new object on the heap, so it looks like
Prometheus 2.9.1 allocates a lot during the load test. runtime.futex is the
Go runtime's wrapper around the Linux futex system call, which the runtime
uses to block and wake threads on its internal locks, including those taken
during object allocation and garbage collection. Time spent there grows when
many threads contend for the same locks. It looks like the Go runtime used
to build Prometheus 2.9.1 does not scale well for programs with frequent
object allocations running on systems with many CPU cores.
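
To check whether the allocator itself stops scaling on a given host and Go
version, a small standalone program can help. The following is only a
minimal sketch, not Prometheus code: the sample and slot types and the
allocWorkload function are hypothetical names made up for illustration, and
the workload (many goroutines allocating small objects) is an assumption
about what the ingestion path roughly looks like to the allocator.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// sample is a hypothetical small object (24 bytes) standing in for the
// many small heap allocations made by an allocation-heavy workload.
type sample struct {
	ts    int64
	value float64
	ref   uint64
}

// slot holds one worker's most recent allocation. The padding keeps each
// slot on its own 64-byte cache line so workers do not false-share.
type slot struct {
	p *sample
	_ [56]byte
}

// allocWorkload starts `workers` goroutines, each allocating `perWorker`
// small heap objects. It returns the elapsed wall time and the number of
// heap allocations reported by the runtime over that interval.
func allocWorkload(workers, perWorker int) (time.Duration, uint64) {
	var before, after runtime.MemStats
	slots := make([]slot, workers)

	runtime.ReadMemStats(&before)
	start := time.Now()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				// Storing the pointer in a heap-allocated slice makes the
				// object escape, so every iteration goes through mallocgc.
				slots[w].p = &sample{ts: int64(i), value: float64(i)}
			}
		}(w)
	}
	wg.Wait()

	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(slots)
	return elapsed, after.Mallocs - before.Mallocs
}

func main() {
	fmt.Println("Go runtime:", runtime.Version(), "CPUs:", runtime.NumCPU())
	const perWorker = 1000000
	// Compare a single worker against one worker per CPU.
	for _, workers := range []int{1, runtime.NumCPU()} {
		elapsed, mallocs := allocWorkload(workers, perWorker)
		rate := float64(mallocs) / elapsed.Seconds()
		fmt.Printf("workers=%d mallocs=%d elapsed=%v rate=%.0f allocs/s\n",
			workers, mallocs, elapsed, rate)
	}
}
```

If the allocation rate with one worker per CPU is not much higher than with
a single worker, that would be consistent with contention inside the
allocator rather than in Prometheus-specific code. Building the same program
with different Go versions gives a rough comparison between runtimes.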

The allocator scalability should be improved in Go 1.15. From the release
notes: "Allocation of small objects now performs much better at high core
counts, and has lower worst-case latency"
<https://tip.golang.org/doc/go1.15#runtime>. So I recommend repeating the
load test on the latest available version of Prometheus, which is hopefully
built with at least Go 1.15. See
https://github.com/prometheus/prometheus/releases .

Additionally, you can run the load test on VictoriaMetrics and compare its
scalability with Prometheus. See
https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter .

--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CAPbKnmC5W-Q_Y5krMZNK-tnJsNUbjxcX2Cebqncrzq%3DQy%2BSa_Q%40mail.gmail.com.