Hello,

I'm not "a Prometheus dev" but this is something that I am interested in. 
Could you open up a PR with the benchmark and the fix? I'll help out with 
reviewing.

Thanks,
Giedrius

On Saturday, 4 January 2025 at 09:52:01 UTC+2 Rafał Dowgird wrote:

> Dear esteemed Prometheus developers,
>
> I'd like to discuss a performance problem with Pushgateway, namely that 
> the time complexity of adding n metrics can become quadratic (O(n^2)). 
> Details follow.
>
> We have a mixed push/scrape system where Pushgateway handles some of the 
> metrics which come from batch jobs. While migrating some jobs to 
> Pushgateway we hit a performance bottleneck. We worked around this by 
> sharding Pushgateway. Still, the sharded setup is more complex, and the 
> amount of data wasn't that big, so we investigated the Pushgateway side of 
> things.
>
> It seems that the root of the problem is that every push operation causes 
> recalculation of hashes for all metrics already existing in the database. 
> This is how the consistency check logic works at present.
>
> I have created a simple benchmark to isolate/demonstrate the problem: 
> https://github.com/dowgird/pushgateway/commit/e0629ecb999c2f22cf098c87c78fc71cd0414733
>
> The output demonstrates that each successive interval of pushes takes 
> roughly linearly longer (the "diff" column), so the total elapsed time 
> grows quadratically:
>
> I: 100 elapsed:220.379138ms diff:220.379138ms
> I: 200 elapsed:505.576881ms diff:285.197743ms
> I: 300 elapsed:841.153205ms diff:335.576324ms
> .
> .
> .
> I: 2700 elapsed:21.806380441s diff:1.391117119s
> I: 2800 elapsed:23.229272852s diff:1.422892411s
> I: 2900 elapsed:24.674250223s diff:1.444977371s
>
> A possible fix doesn't look very complicated algorithmically (memoizing 
> the hashes should work). Code-wise it's a bit more complex, which is part 
> of why I'm writing this message. I can contribute the fix, but this would 
> require some discussion of the client API.
>
> The other part is that I understand from the documentation and from 
> discussions in GitHub issues that Pushgateway is not meant to be 
> high-performance. That said, I still think it would be beneficial to 
> remove this particular bottleneck, since other people seem to be hitting 
> it (https://github.com/prometheus/pushgateway/issues/643 might be caused 
> by this).
>
> Would you be open to accepting a fix for this issue?
>
> --
> Rafał
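
For illustration, the memoization idea mentioned in the quoted message can
be sketched roughly as below. This is not Pushgateway code: the group and
store types, the hashGroup function, and the counter are all hypothetical
names made up for this sketch, and the real consistency check is more
involved. The sketch only shows how rehashing every stored entry on each
push gives quadratic work overall, while caching the hashes keeps it linear.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// group is a hypothetical stand-in for one pushed metric group.
type group struct {
	labels map[string]string
}

// hashCalls counts how often the hash function runs.
var hashCalls int

// hashGroup hashes the sorted label pairs of a group with FNV-1a.
func hashGroup(g group) uint64 {
	hashCalls++
	keys := make([]string, 0, len(g.labels))
	for k := range g.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(g.labels[k]))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

// store is a hypothetical in-memory store of pushed groups.
type store struct {
	groups []group
	seen   map[uint64]bool // memoized hashes of the stored groups
}

// pushNaive rehashes every stored group on each push (assumed to mirror
// the pattern described in the message), then checks the new group.
func (s *store) pushNaive(g group) bool {
	seen := make(map[uint64]bool, len(s.groups))
	for _, old := range s.groups {
		seen[hashGroup(old)] = true
	}
	if seen[hashGroup(g)] {
		return false
	}
	s.groups = append(s.groups, g)
	return true
}

// pushMemoized hashes only the new group and reuses the cached hashes.
func (s *store) pushMemoized(g group) bool {
	h := hashGroup(g)
	if s.seen[h] {
		return false
	}
	s.seen[h] = true
	s.groups = append(s.groups, g)
	return true
}

// countHashCalls pushes n distinct groups and returns the number of
// hash computations that were needed.
func countHashCalls(n int, memoized bool) int {
	hashCalls = 0
	s := &store{seen: make(map[uint64]bool)}
	for i := 0; i < n; i++ {
		g := group{labels: map[string]string{"job": fmt.Sprintf("job-%d", i)}}
		if memoized {
			s.pushMemoized(g)
		} else {
			s.pushNaive(g)
		}
	}
	return hashCalls
}

func main() {
	for _, n := range []int{100, 200, 400} {
		fmt.Printf("n=%d naive=%d memoized=%d\n",
			n, countHashCalls(n, false), countHashCalls(n, true))
	}
}
```

In this toy model, pushing n distinct groups costs n(n+1)/2 hash
computations without the cache and n with it, so doubling n roughly
quadruples the naive count. A real fix would also have to invalidate or
update cached hashes when a group is replaced or deleted, which this
sketch leaves out.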

