Hello, I'm not "a Prometheus dev", but this is something I am interested in. Could you open a PR with the benchmark and the fix? I'll help out with reviewing.
Thanks,
Giedrius

On Saturday, 4 January 2025 at 09:52:01 UTC+2 Rafał Dowgird wrote:

> Dear and Esteemed Prometheus developers,
>
> I'd like to discuss with you a performance problem with Pushgateway,
> namely that the complexity of adding n metrics can become quadratic
> (O(n^2)). Details follow.
>
> We have a mixed push/scrape system where Pushgateway handles some of the
> metrics that come from batch jobs. While migrating some jobs to
> Pushgateway, we hit a performance bottleneck. We worked around it by
> sharding Pushgateway. Still, the sharded setup is more complex and the
> amount of data wasn't that big, so we investigated the Pushgateway side of
> things.
>
> It seems that the root of the problem is that every push operation causes
> the hashes of all metrics already in the database to be recalculated.
> This is how the consistency check logic works at present.
>
> I have created a simple benchmark to isolate and demonstrate the problem:
> https://github.com/dowgird/pushgateway/commit/e0629ecb999c2f22cf098c87c78fc71cd0414733
>
> The output demonstrates that successive pushes of metrics get linearly
> slower:
>
> I: 100 elapsed:220.379138ms diff:220.379138ms
> I: 200 elapsed:505.576881ms diff:285.197743ms
> I: 300 elapsed:841.153205ms diff:335.576324ms
> .
> .
> .
> I: 2700 elapsed:21.806380441s diff:1.391117119s
> I: 2800 elapsed:23.229272852s diff:1.422892411s
> I: 2900 elapsed:24.674250223s diff:1.444977371s
>
> A possible fix doesn't look very complicated algorithmically (memoizing the
> hashes should work). Code-wise it's a bit more complex, which is part of
> why I'm writing this message. I can contribute the fix, but it would
> require some discussion of the client API.
>
> The other part is that I understand from the documentation and from
> discussions in GitHub issues that Pushgateway is not meant to be high
> performance.
> That said, I still think it would be beneficial to remove this particular
> performance bottleneck; other people seem to be hitting it too
> (https://github.com/prometheus/pushgateway/issues/643 might be caused by
> this).
>
> Would you be open to accepting a fix for this issue?
>
> --
> Rafał

