Hi Community,

I've investigated the issue further and found the following:

1. Read perf dropped due to lock contention in the Prometheus
TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker threads were blocked
on TimeWindowQuantiles.insert() while the test was running.

2. Write perf dropped because of high CPU usage from Prometheus Summary
metrics. CommitProcessor CPU usage increased by more than 50% when
Prometheus was enabled compared to disabled (from 46% to 80% with 4 CPUs,
and from 63% to 99% with 12 CPUs).
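
Put another way, the relative increase in CommitProcessor CPU usage when going from Prometheus disabled to enabled works out to:

```latex
\frac{80\% - 46\%}{46\%} \approx 74\% \quad (4\ \text{CPUs}),
\qquad
\frac{99\% - 63\%}{63\%} \approx 57\% \quad (12\ \text{CPUs})
```

i.e. roughly 35 percentage points of extra CPU in both configurations.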

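To make the contention pattern in point 1 more concrete, here is a minimal, hypothetical Java sketch. The class and method names (SharedQuantileContention, SharedWindow, run) are invented for illustration; this is not the actual Prometheus TimeWindowQuantiles or ZooKeeper code. Several worker threads record values into one shared object whose insert() is synchronized, so only one thread can record at a time and the others wait on the same monitor. In the real estimator the work done while holding the lock is likely heavier than a single list append, which would make the serialization worse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not Prometheus or ZooKeeper code.
public class SharedQuantileContention {

    // Stand-in for a shared windowed-quantile recorder (name is made up).
    public static final class SharedWindow {
        private final List<Double> samples = new ArrayList<>();

        // Every caller must acquire this object's monitor. While one thread
        // is inside insert(), all other callers wait (BLOCKED in a thread dump).
        public synchronized void insert(double value) {
            samples.add(value);
        }

        public synchronized int size() {
            return samples.size();
        }
    }

    // Starts `workers` threads that each insert `perWorker` values into ONE
    // shared window, then waits for them to finish.
    public static SharedWindow run(int workers, int perWorker) {
        SharedWindow window = new SharedWindow();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                for (int i = 0; i < perWorker; i++) {
                    window.insert(i * 0.001); // every recording funnels through one lock
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return window;
    }

    public static void main(String[] args) {
        SharedWindow window = run(4, 100_000);
        System.out.println("recorded " + window.size() + " samples");
    }
}
```

Running main prints "recorded 400000 samples"; a thread dump taken while it runs may show some of the workers BLOCKED on the SharedWindow monitor, similar to the CommitProcWorker threads described in point 1.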

Prometheus integration is a great feature; however, its negative performance
impact is very significant. I wonder if anyone has any thoughts on how to
reduce it.



Thanks,


Li


On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com> wrote:

> Hi,
>
> I would like to reach out to the community to see if anyone has some
> insights or experience with the performance impact of enabling Prometheus
> metrics.
>
> I have done load comparison tests with Prometheus enabled vs. disabled and
> found that performance drops by about 40%-60% for both read and write
> operations (i.e. getData, getChildren and createNode).
>
> The load test was done with ZooKeeper 3.7 on a cluster of 5 participants
> and 5 observers; each ZK server had a 10G heap and 4 CPUs, and 500
> concurrent users were sending requests.
>
> The performance impact is quite significant. I wonder if this is expected,
> and what we can do to keep ZK performing the same while leveraging the new
> Prometheus metrics feature.
>
> Best,
>
> Li
