Would it help to keep per-thread metrics that are either reported independently or merged at the end of a reporting period?
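
To make the idea concrete, here is a rough sketch of what I have in mind, not a patch. Every name in it (PerThreadLatencyRecorder, Buffer, record, drainSorted, quantile) is hypothetical; none of it is ZooKeeper or Prometheus client API. It assumes exact quantiles over one reporting period are acceptable in place of a sliding time window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical recorder: each worker thread appends to its own buffer,
// and a reporter merges all buffers once per reporting period.
final class PerThreadLatencyRecorder {

    // Samples owned by a single thread. The monitor is uncontended on the
    // hot path; it is only shared briefly with the reporter during drain().
    private static final class Buffer {
        private long[] values = new long[1024];
        private int size;

        synchronized void add(long v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
        }

        synchronized long[] drain() {
            long[] out = Arrays.copyOf(values, size);
            size = 0;
            return out;
        }
    }

    // All buffers ever created, so the reporter can find them.
    private final List<Buffer> buffers = new CopyOnWriteArrayList<>();

    // Lazily creates and registers one buffer per recording thread.
    private final ThreadLocal<Buffer> local = ThreadLocal.withInitial(() -> {
        Buffer b = new Buffer();
        buffers.add(b);
        return b;
    });

    // Hot path: touches only the calling thread's buffer.
    void record(long value) {
        local.get().add(value);
    }

    // Reporter path: empties every buffer and returns the merged, sorted samples.
    long[] drainSorted() {
        List<long[]> parts = new ArrayList<>();
        int total = 0;
        for (Buffer b : buffers) {
            long[] p = b.drain();
            parts.add(p);
            total += p.length;
        }
        long[] merged = new long[total];
        int pos = 0;
        for (long[] p : parts) {
            System.arraycopy(p, 0, merged, pos, p.length);
            pos += p.length;
        }
        Arrays.sort(merged);
        return merged;
    }

    // Nearest-rank quantile over an already sorted array; 0 if it is empty.
    static long quantile(long[] sorted, double q) {
        if (sorted.length == 0) {
            return 0L;
        }
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }
}
```

The worker threads never share a lock with each other, which is the contention point reported for TimeWindowQuantiles.insert(). The trade-off is memory: a buffer grows until the next drain, so a real version would need a cap or a compact sketch per thread.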

On Mon, Apr 26, 2021 at 8:51 PM Li Wang <li4w...@gmail.com> wrote:

> Hi Community,
>
> I've done further investigation on the issue and found the following:
>
> 1. The performance of the read operations decreased due to lock contention
> in the Prometheus TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker
> threads were blocked on the TimeWindowQuantiles.insert() API while the test
> was running.
>
> 2. The performance of the write operations decreased because of the high CPU
> usage from the Prometheus Summary type of metrics. The CPU usage of
> CommitProcessor increased about 50% when Prometheus was enabled compared
> to disabled (from 46% to 80% with 4 CPUs, from 63% to 99% with 12 CPUs).
>
> Prometheus integration is a great feature; however, the negative performance
> impact is very significant. I wonder if anyone has any thoughts on how to
> reduce the performance impact.
>
> Thanks,
>
> Li
>
> On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to reach out to the community to see if anyone has some
> > insights or experience with the performance impact of enabling Prometheus
> > metrics.
> >
> > I have done load comparison tests for Prometheus enabled vs disabled and
> > found that performance is reduced by about 40%-60% for both read and write
> > operations (i.e. getData, getChildren and createNode).
> >
> > The load test was done with ZooKeeper 3.7, a cluster size of 5 participants
> > and 5 observers, each ZK server with a 10G heap and 4 CPUs, and 500
> > concurrent users sending requests.
> >
> > The performance impact is quite significant. I wonder if this is expected
> > and what we can do to have ZK perform the same while leveraging the new
> > Prometheus metrics feature.
> >
> > Best,
> >
> > Li