Awesome work Li, indeed. I’m happy to review the PR. Thanks for the contribution.
Andor

> On 2021. May 4., at 3:49, Li Wang <li4w...@gmail.com> wrote:
>
> Hi,
>
> Thanks Ted and Enrico again for the inputs and discussions. I would like to
> share some updates from my side.
>
> 1. The main issue is that the Prometheus summary computation is expensive and
> locked/synchronized. To reduce the perf impact, I changed the
> PrometheusLabelledSummary.add() operation to be async, so it is handled by
> separate threads rather than the main thread.
>
> 2. I ran benchmarks against the changes. The performance of read and write
> operations is significantly improved:
>
> a) For the read operation, the avg latency is reduced by about 50% and the
> avg throughput is increased by about 95%.
> b) For the write operation, the avg latency is reduced by about 28% and the
> avg throughput is increased by about 20%.
>
> 3. I opened a JIRA for the issue and can submit a PR for the change:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-4289
>
> Please let me know if you have any thoughts or questions.
>
> Thanks,
>
> Li
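To make Li's change concrete: below is a minimal sketch of the async approach described above, assuming a single background thread that drains observations off the request path. The class name AsyncSummary and its exact shape are illustrative only; the real change is the one attached to ZOOKEEPER-4289.

    import io.prometheus.client.Summary;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical wrapper, not the actual ZOOKEEPER-4289 patch: request
    // threads enqueue the sample and return immediately, while the expensive,
    // lock-guarded quantile insert runs on a dedicated background thread.
    class AsyncSummary {
        private final Summary summary;
        private final ExecutorService reporter = Executors.newSingleThreadExecutor();

        AsyncSummary(Summary summary) {
            this.summary = summary;
        }

        void add(double value) {
            // Summary.observe() synchronizes internally (TimeWindowQuantiles),
            // so running it here keeps that contention off the request threads.
            reporter.execute(() -> summary.observe(value));
        }
    }

A production version would likely also want a bounded queue and a drop or rejection policy, so that a slow reporter thread cannot grow the backlog without limit.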
> On Wed, Apr 28, 2021 at 11:32 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
>> Il giorno mer 28 apr 2021 alle ore 21:48 Li Wang <li4w...@gmail.com> ha
>> scritto:
>>>
>>> Hi Enrico,
>>>
>>> Please see my comments inline.
>>>
>>> Thanks,
>>>
>>> Li
>>>
>>> On Mon, Apr 26, 2021 at 10:42 PM Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>>
>>>> Sorry for top posting.
>>>>
>>>> I have never seen Prometheus have such a significant impact in other
>>>> applications.
>>>
>>> Please see my reply in the previous response.
>>>
>>>> The first things that come to mind:
>>>> - collect a couple of dumps with some perf tool and dig into the problem
>>>
>>> Heap dumps, thread dumps and CPU profiling with Prometheus enabled vs
>>> disabled have been collected. That's how we found the issue.
>>>
>>>> - verify that we have the latest version of the Prometheus client
>>>
>>> ZK uses 0.9.0 and the latest version of the Prometheus client is 0.10.0.
>>> I checked the change logs, and it doesn't look like there are major
>>> changes between these two: https://github.com/prometheus/client_java/releases
>>> We can upgrade the version and see if it makes any difference.
>>>
>>>> - tune the few knobs we have in the Prometheus client
>>>
>>> What are the knobs we can tune? Can you provide more info on this?
>>
>> Unfortunately there is nothing we can tune directly through the zookeeper
>> configuration file.
>>
>> In the ZK code we currently have only some basic configuration of
>> summaries, and we are using mostly the default values:
>>
>> https://github.com/apache/zookeeper/blob/11c07921c15e2fb7692375327b53f26a583b77ca/zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java#L340
>>
>> There is no way to fine-tune the buckets and the rest of those structures.
>> We should work on looking for optimal values.
>>
>>>> In Apache Pulsar and in Apache BookKeeper we have some customizations in
>>>> the Prometheus metrics collectors; we could take a look and port those to
>>>> ZooKeeper (initially I worked on the Prometheus integration based on the
>>>> use cases I have with Pulsar, BookKeeper and other systems that already
>>>> use Prometheus, but here in ZooKeeper we are using the basic Prometheus
>>>> client).
>>>
>>> Are there any customizations for minimizing the performance impact? Is
>>> there anything that we can port to ZK to help the issue?
>>
>> Yes, there is much we can port. This is the Prometheus metrics provider
>> in BookKeeper:
>>
>> https://github.com/apache/bookkeeper/tree/master/bookkeeper-stats-providers/prometheus-metrics-provider/src/main/java/org/apache/bookkeeper/stats/prometheus
>>
>> Enrico
>>
>>>> Enrico
>>>>
>>>> Il Mar 27 Apr 2021, 06:35 Ted Dunning <ted.dunn...@gmail.com> ha scritto:
>>>>
>>>>> Batching metrics reporting is very similar to option (c), but with
>>>>> locking like option (a). That can usually be made faster by passing a
>>>>> reference to the metrics accumulator to the reporting thread, which can
>>>>> do the batch update without locks. This usually requires ping-pong
>>>>> metrics accumulators, so that a thread can accumulate in one accumulator
>>>>> for a bit, pass that to the merge thread and switch to using the
>>>>> alternate accumulator. Since all threads typically report at the same
>>>>> time, this avoids a stampede on the global accumulator.
>>>>>
>>>>> On Mon, Apr 26, 2021 at 9:30 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>
>>>>>> Batching metrics reporting can help. For example, in the
>>>>>> CommitProcessor, increasing maxCommitBatchSize helps improve the
>>>>>> performance of write operations.
>>>>>>
>>>>>> On Mon, Apr 26, 2021 at 9:21 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, I am thinking of handling metrics reporting in a separate
>>>>>>> thread, so it doesn't impact the "main" thread.
>>>>>>>
>>>>>>> Not sure about the idea of merging at the end of a reporting period.
>>>>>>> Can you elaborate a bit on it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Li
>>>>>>>
>>>>>>> On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning <ted.dunn...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Would it help to keep per-thread metrics that are either reported
>>>>>>>> independently or merged at the end of a reporting period?
>>>>>>>>
>>>>>>>> On Mon, Apr 26, 2021 at 8:51 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Community,
>>>>>>>>>
>>>>>>>>> I've done further investigation on the issue and found the following:
>>>>>>>>>
>>>>>>>>> 1. The perf of the read operation decreased due to lock contention
>>>>>>>>> in the Prometheus TimeWindowQuantiles APIs. 3 out of 4
>>>>>>>>> CommitProcWorker threads were blocked on the
>>>>>>>>> TimeWindowQuantiles.insert() API while the test was running.
>>>>>>>>>
>>>>>>>>> 2. The perf of the write operation decreased because of the high
>>>>>>>>> CPU usage from the Prometheus Summary type of metrics. The CPU usage
>>>>>>>>> of the CommitProcessor increased about 50% with Prometheus enabled
>>>>>>>>> compared to disabled (46% vs 80% with 4 CPUs, 63% vs 99% with 12 CPUs).
>>>>>>>>>
>>>>>>>>> Prometheus integration is a great feature; however, the negative
>>>>>>>>> performance impact is very significant. I wonder if anyone has any
>>>>>>>>> thoughts on how to reduce the perf impact.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Li
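One possible shape for the ping-pong accumulators Ted describes upthread, sketched here purely for illustration (this is not ZooKeeper or BookKeeper code): writer threads record into whichever side is currently active, and the reporting thread flips sides before merging, so the drain never contends with the writers that were blocking on TimeWindowQuantiles.insert() in the findings above.

    import java.util.concurrent.atomic.LongAdder;

    // Illustrative ping-pong accumulator: two sides, one active at a time.
    class PingPongAccumulator {
        private final LongAdder[] sides = { new LongAdder(), new LongAdder() };
        private volatile int active = 0;

        // Called by many request threads; LongAdder keeps this nearly
        // contention-free even under heavy write traffic.
        void record(long value) {
            sides[active].add(value);
        }

        // Called by the single reporting thread once per reporting period.
        // Flipping first leaves the drained side (almost) quiescent; a sample
        // racing the flip just lands in the next period's total, which is
        // acceptable for metrics.
        long drainForReport() {
            int idle = active;
            active = 1 - idle;
            return sides[idle].sumThenReset();
        }
    }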
>>>>>>>>> On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to reach out to the community to see if anyone has
>>>>>>>>>> some insights or experience with the performance impact of enabling
>>>>>>>>>> Prometheus metrics.
>>>>>>>>>>
>>>>>>>>>> I have done load comparison tests for Prometheus enabled vs
>>>>>>>>>> disabled and found that performance is reduced by about 40%-60% for
>>>>>>>>>> both read and write operations (i.e. getData, getChildren and
>>>>>>>>>> createNode).
>>>>>>>>>>
>>>>>>>>>> The load test was done with ZooKeeper 3.7 and a cluster of 5
>>>>>>>>>> participants and 5 observers; each ZK server had a 10G heap and 4
>>>>>>>>>> CPUs, with 500 concurrent users sending requests.
>>>>>>>>>>
>>>>>>>>>> The performance impact is quite significant. I wonder if this is
>>>>>>>>>> expected and what we can do to have ZK perform the same while
>>>>>>>>>> leveraging the new Prometheus metrics feature.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Li
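For reference, the summary "knobs" Enrico mentions upthread correspond to the Prometheus Java client's Summary builder. A hedged sketch of what tuning them could look like; the metric name and values here are made up, but each tracked quantile and each age bucket adds work inside the Summary's synchronized time-window machinery, so trimming them is one way to shrink the cost:

    import io.prometheus.client.Summary;

    // Illustrative configuration only; not the values ZooKeeper ships with.
    Summary readLatency = Summary.build()
            .name("example_read_latency")      // hypothetical metric name
            .help("read request latency (ms)")
            .quantile(0.5, 0.05)               // median, 5% tolerated error
            .quantile(0.99, 0.001)             // p99, tighter error bound
            .maxAgeSeconds(60)                 // rotating 60s time window
            .ageBuckets(5)                     // fewer buckets = cheaper rotation
            .register();

    readLatency.observe(42.0);                 // the call that contends under load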