Hi Andor, Ted, Enrico,

Thanks again for your discussions and inputs. The changes have been submitted:
https://github.com/apache/zookeeper/pull/1698
Would you mind reviewing the PR?

Best,
Li

On Wed, May 5, 2021 at 10:00 AM Li Wang <li4w...@gmail.com> wrote:

> Thanks Andor!
>
> Li
>
> On Tue, May 4, 2021 at 2:00 PM Andor Molnar <an...@apache.org> wrote:
>
>> Awesome work Li, indeed.
>> I’m happy to review the PR. Thanks for the contribution.
>>
>> Andor
>>
>> > On 2021. May 4., at 3:49, Li Wang <li4w...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > Thanks Ted and Enrico again for the inputs and discussions. I would like
>> > to share some updates from my side.
>> >
>> > 1. The main issue is that the Prometheus summary computation is expensive
>> > and locked/synchronized. To reduce the perf impact, I changed the
>> > PrometheusLabelledSummary.add() operation to be async, so it is handled
>> > by separate threads rather than by the main thread.
>> >
>> > 2. I ran benchmarks against the changes. The performance of read and
>> > write operations is significantly improved:
>> >
>> > a) For read operations, the avg latency is reduced by about 50% and the
>> > avg throughput is increased by about 95%.
>> > b) For write operations, the avg latency is reduced by about 28% and the
>> > avg throughput is increased by about 20%.
>> >
>> > 3. I opened a JIRA for the issue and can submit a PR for the change.
>> >
>> > https://issues.apache.org/jira/browse/ZOOKEEPER-4289
>> >
>> > Please let me know if you have any thoughts or questions.
>> >
>> > Thanks,
>> >
>> > Li
>> >
>> > On Wed, Apr 28, 2021 at 11:32 PM Enrico Olivelli <eolive...@gmail.com>
>> > wrote:
>> >
>> >> On Wed, Apr 28, 2021 at 9:48 PM Li Wang <li4w...@gmail.com> wrote:
>> >>>
>> >>> Hi Enrico,
>> >>>
>> >>> Please see my comments inline.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Li
>> >>>
>> >>> On Mon, Apr 26, 2021 at 10:42 PM Enrico Olivelli <eolive...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Sorry for top posting.
>> >>>>
>> >>>> I have never seen in other applications that Prometheus has such a
>> >>>> significant impact.
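For readers following the thread, the "async add()" change described in point 1 above can be sketched roughly as below. This is a minimal illustration, not the actual PR code: the class name `AsyncSummarySketch` and the plain `DoubleConsumer` standing in for the real Prometheus summary are made up so the sketch stays self-contained. Request threads enqueue a sample and return immediately; a single background thread is the only one that touches the internally locked summary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleConsumer;

// Sketch of asynchronous summary recording: the hot path only enqueues,
// so request threads never block on TimeWindowQuantiles.insert().
public class AsyncSummarySketch {
    private final BlockingQueue<Double> queue = new ArrayBlockingQueue<>(4096);

    public AsyncSummarySketch(DoubleConsumer underlyingSummary) {
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    // Only this thread pays the cost of the summary insert.
                    underlyingSummary.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "metrics-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    // Hot path: O(1) enqueue. If the queue is full, drop the sample
    // rather than stall a request-processing thread.
    public void add(double value) {
        queue.offer(value);
    }

    // Tiny self-check: three async adds all reach the underlying sink.
    public static boolean demo() {
        CountDownLatch latch = new CountDownLatch(3);
        AsyncSummarySketch s = new AsyncSummarySketch(v -> latch.countDown());
        s.add(1.0);
        s.add(2.0);
        s.add(3.0);
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

The trade-off is that metrics become slightly stale and samples can be dropped under backpressure, which is usually acceptable for monitoring data.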
>> >>>
>> >>> Please see my reply in the previous response.
>> >>>
>> >>>> The first things that come to mind:
>> >>>> - collect a couple of dumps with some perf tool and dig into the
>> >>>> problem
>> >>>
>> >>> Heap dumps, thread dumps and CPU profiling with Prometheus enabled vs
>> >>> disabled have been collected. That's how we found the issue.
>> >>>
>> >>>> - verify that we have the latest version of the Prometheus client
>> >>>
>> >>> ZK uses 0.9.0 and the latest version of the Prometheus client is 0.10.0.
>> >>> I checked the changelogs, and it doesn't look like there are major
>> >>> changes between these two: https://github.com/prometheus/client_java/releases
>> >>> We can upgrade the version and see if it makes any difference.
>> >>>
>> >>>> - tune the few knobs we have in the Prometheus client
>> >>>
>> >>> What are the knobs we can tune? Can you provide more info on this?
>> >>
>> >> Unfortunately there is nothing we can tune directly with the zookeeper
>> >> configuration file.
>> >>
>> >> In ZK code we currently have only some basic configuration of
>> >> summaries, and we are using mostly the default values:
>> >>
>> >> https://github.com/apache/zookeeper/blob/11c07921c15e2fb7692375327b53f26a583b77ca/zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java#L340
>> >>
>> >> There is no way to fine-tune the buckets and all of the structures.
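For reference, the upstream simpleclient `Summary.Builder` does expose a few knobs at the client level (tracked quantiles with a rank-error tolerance, the rolling-window age, and the number of age buckets); what is missing is wiring them through ZooKeeper's configuration. A sketch of what tuning them looks like, with an illustrative metric name and values (this is a configuration fragment, not a recommendation):

```java
import io.prometheus.client.Summary;

public class SummaryKnobs {
    // Each quantile(q, epsilon) call adds one tracked quantile; a looser
    // epsilon means cheaper CKMS bookkeeping. maxAgeSeconds/ageBuckets
    // control the TimeWindowQuantiles rotation, which is where the lock
    // contention shows up.
    static final Summary READ_LATENCY = Summary.build()
            .name("example_read_latency")    // illustrative name
            .help("Read latency in ms (example)")
            .quantile(0.5, 0.05)             // p50, +/- 5% rank error
            .quantile(0.99, 0.01)            // p99, +/- 1% rank error
            .maxAgeSeconds(60)               // rolling 60s window
            .ageBuckets(5)                   // 5 rotating buckets
            .register();

    public static void main(String[] args) {
        READ_LATENCY.observe(12.3);
    }
}
```

Fewer tracked quantiles and looser tolerances reduce the per-insert cost, but they cannot remove the lock around the time window itself.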
>> >> we should work on looking for optimal values.
>> >>
>> >>>> In Apache Pulsar and in Apache BookKeeper we have some customizations
>> >>>> in the Prometheus metrics collectors; we could take a look and port
>> >>>> those to Zookeeper (initially I worked on the Prometheus integration
>> >>>> based on the use cases I have with Pulsar, BookKeeper and other
>> >>>> systems that already use Prometheus, but here in Zookeeper we are
>> >>>> using the basic Prometheus client).
>> >>>
>> >>> Are there any customizations for minimizing the performance impact? Is
>> >>> there anything that we can port to ZK to help the issue?
>> >>
>> >> Yes, there is much we can port. This is the Prometheus metrics
>> >> provider in BookKeeper:
>> >>
>> >> https://github.com/apache/bookkeeper/tree/master/bookkeeper-stats-providers/prometheus-metrics-provider/src/main/java/org/apache/bookkeeper/stats/prometheus
>> >>
>> >> Enrico
>> >>
>> >>>> Enrico
>> >>>>
>> >>>> On Tue, Apr 27, 2021 at 06:35 Ted Dunning <ted.dunn...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Batching metrics reporting is very similar to option (c) but with
>> >>>>> locking like option (a). That can usually be made faster by passing
>> >>>>> a reference to the metrics accumulator to the reporting thread,
>> >>>>> which can do the batch update without locks. It usually requires
>> >>>>> ping-pong metrics accumulators so that a thread can accumulate in
>> >>>>> one accumulator for a bit, pass that to the merge thread and switch
>> >>>>> to using the alternate accumulator. Since all threads typically
>> >>>>> report at the same time, this avoids a stampede on the global
>> >>>>> accumulator.
>> >>>>>
>> >>>>> On Mon, Apr 26, 2021 at 9:30 PM Li Wang <li4w...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Batching metrics reporting can help.
>> >>>>>> For example, in the CommitProcessor, increasing the
>> >>>>>> maxCommitBatchSize helps improve the performance of write
>> >>>>>> operations.
>> >>>>>>
>> >>>>>> On Mon, Apr 26, 2021 at 9:21 PM Li Wang <li4w...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Yes, I am thinking of handling metrics reporting in a separate
>> >>>>>>> thread, so it doesn't impact the "main" thread.
>> >>>>>>>
>> >>>>>>> Not sure about the idea of merging at the end of a reporting
>> >>>>>>> period. Can you elaborate a bit on it?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>> Li
>> >>>>>>>
>> >>>>>>> On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning <ted.dunn...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Would it help to keep per-thread metrics that are either
>> >>>>>>>> reported independently or are merged at the end of a reporting
>> >>>>>>>> period?
>> >>>>>>>>
>> >>>>>>>> On Mon, Apr 26, 2021 at 8:51 PM Li Wang <li4w...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Community,
>> >>>>>>>>>
>> >>>>>>>>> I've done further investigation on the issue and found the
>> >>>>>>>>> following:
>> >>>>>>>>>
>> >>>>>>>>> 1. The perf of the read operation was decreased due to the lock
>> >>>>>>>>> contention in the Prometheus TimeWindowQuantiles APIs. 3 out of
>> >>>>>>>>> 4 CommitProcWorker threads were blocked on the
>> >>>>>>>>> TimeWindowQuantiles.insert() API while the test was running.
>> >>>>>>>>>
>> >>>>>>>>> 2. The perf of the write operation was decreased because of the
>> >>>>>>>>> high CPU usage from the Prometheus Summary type of metrics. The
>> >>>>>>>>> CPU usage of the CommitProcessor increased by about 50% when
>> >>>>>>>>> Prometheus was enabled compared to disabled (46% vs 80% with 4
>> >>>>>>>>> CPUs, 63% vs 99% with 12 CPUs).
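Ted's ping-pong accumulator idea from earlier in the thread could look roughly like this. It is a deliberately simplified sketch with made-up names, and it glosses over the memory-visibility handshake a production version would need when the merge thread swaps a buffer out from under the owning thread:

```java
import java.util.concurrent.atomic.AtomicReference;

// Ping-pong accumulator sketch: the owning thread records into the
// current buffer without taking a lock; once per reporting period the
// merge thread swaps in a fresh buffer and batch-merges the retired one
// into the global (locked) metric, so the hot path and the summary
// never contend. NOTE: a record() racing with swap() needs a proper
// handoff protocol in a real implementation; this sketch ignores it.
public class PingPongAccumulator {
    public static final class Buffer {
        public long count;
        public double sum;
    }

    private final AtomicReference<Buffer> active =
            new AtomicReference<>(new Buffer());

    // Hot path: called only by the owning thread.
    public void record(double value) {
        Buffer b = active.get();
        b.count++;
        b.sum += value;
    }

    // Merge thread: retire the current buffer and start a fresh one.
    public Buffer swap() {
        return active.getAndSet(new Buffer());
    }
}
```

One accumulator per worker thread keeps the write path lock-free; the merge thread then performs a single batched update per period instead of one locked insert per request, which is exactly why all threads "reporting at the same time" no longer stampede the global accumulator.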
>> >>>>>>>>>
>> >>>>>>>>> Prometheus integration is a great feature, however the negative
>> >>>>>>>>> performance impact is very significant. I wonder if anyone has
>> >>>>>>>>> any thoughts on how to reduce the perf impact.
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>>
>> >>>>>>>>> Li
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> I would like to reach out to the community to see if anyone
>> >>>>>>>>>> has some insights or experience with the performance impact of
>> >>>>>>>>>> enabling Prometheus metrics.
>> >>>>>>>>>>
>> >>>>>>>>>> I have done load comparison tests for Prometheus enabled vs
>> >>>>>>>>>> disabled and found the performance is reduced by about 40%-60%
>> >>>>>>>>>> for both read and write operations (i.e. getData, getChildren
>> >>>>>>>>>> and createNode).
>> >>>>>>>>>>
>> >>>>>>>>>> The load test was done with Zookeeper 3.7, a cluster size of 5
>> >>>>>>>>>> participants and 5 observers, each ZK server with a 10G heap
>> >>>>>>>>>> and 4 CPUs, and 500 concurrent users sending requests.
>> >>>>>>>>>>
>> >>>>>>>>>> The performance impact is quite significant. I wonder if this
>> >>>>>>>>>> is expected and what we can do to have ZK perform the same
>> >>>>>>>>>> while leveraging the new Prometheus metrics feature.
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>>
>> >>>>>>>>>> Li
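On Ted's per-thread-metrics question: for counters and sums (though not quantiles), the JDK's striped adders already implement that pattern, with per-cell updates on the write path and a merge only when the reporter reads. A self-contained sketch (the class name is illustrative, not ZooKeeper code):

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Count and sum tracked with striped adders: writers update thread-local
// cells, and the scrape-time reader sums the cells. Quantiles are lost,
// but average latency survives and the hot path never takes a lock.
public class StripedSummary {
    private final LongAdder count = new LongAdder();
    private final DoubleAdder sum = new DoubleAdder();

    public void add(double value) {
        count.increment();
        sum.add(value);
    }

    public long getCount() { return count.sum(); }

    public double getSum() { return sum.sum(); }

    public double getAvg() {
        long c = count.sum();
        return c == 0 ? 0.0 : sum.sum() / c;
    }
}
```

This is the cheapest end of the design space discussed in the thread: if p99-style quantiles are required, something like the async or ping-pong approaches above is still needed, but for metrics where count/sum/avg are enough, striped adders remove the contention entirely.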