Awesome work indeed, Li.
I’m happy to review the PR. Thanks for the contribution.

Andor



> On 2021. May 4., at 3:49, Li Wang <li4w...@gmail.com> wrote:
> 
> Hi,
> 
> Thanks again, Ted and Enrico, for the input and discussion. I would like to
> share some updates from my side.
> 
> 1. The main issue is that the Prometheus summary computation is expensive and
> locked/synchronized. To reduce the perf impact, I changed the
> PrometheusLabelledSummary.add() operation to be async, so it is handled by a
> separate thread rather than the main thread (see the sketch after point 3
> below).
> 
> 2. I ran benchmarks against the changes. The perf of read and write
> operations is significantly improved.
> 
> a) For read operations, the avg latency is reduced by about 50% and avg
> throughput is increased by about 95%.
> b) For write operations, the avg latency is reduced by about 28% and avg
> throughput is increased by about 20%.
> 
> 3. I opened a JIRA for the issue and can submit a PR for the change.
> 
> https://issues.apache.org/jira/browse/ZOOKEEPER-4289
> 
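> For illustration, here is a minimal sketch of the async-add idea from point
> 1 (the wrapper class, executor choice, and names are hypothetical, not the
> actual PR code):
> 
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
> 
>     import io.prometheus.client.Summary;
> 
>     // Hypothetical wrapper: offload the expensive, synchronized
>     // Summary.observe() call to a background thread so the request
>     // path never blocks on quantile bookkeeping.
>     class AsyncSummary {
>         private final Summary summary;
>         private final ExecutorService reporter =
>             Executors.newSingleThreadExecutor();
> 
>         AsyncSummary(Summary summary) {
>             this.summary = summary;
>         }
> 
>         void add(long value) {
>             // The hot path only enqueues; observe() runs asynchronously.
>             reporter.execute(() -> summary.observe(value));
>         }
>     }
> 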
> Please let me know if you have any thoughts or questions.
> 
> Thanks,
> 
> Li
> 
> 
> 
> On Wed, Apr 28, 2021 at 11:32 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
> 
>> On Wed, Apr 28, 2021 at 21:48 Li Wang <li4w...@gmail.com> wrote:
>>> 
>>> Hi Enrico,
>>> 
>>> Please see my comments inline.
>>> 
>>> Thanks,
>>> 
>>> Li.
>>> 
>>> On Mon, Apr 26, 2021 at 10:42 PM Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>> 
>>>> Sorry for top posting.
>>>> 
>>>> I have never seen Prometheus have such a significant impact in other
>>>> applications.
>>> 
>>> 
>>> Please see my reply in the previous response.
>>> 
>>> 
>>>> 
>>>> The first things that come to mind:
>>>> - collect a couple of dumps with some perf tool and dig into the problem
>>> 
>>> 
>>> Heap dumps, thread dumps, and CPU profiles with Prometheus enabled vs.
>>> disabled have been collected. That's how we found the issue.
>>> 
>>> 
>>>> - verify that we are on the latest version of the Prometheus client
>>>> 
>>> 
>>> ZK uses 0.9.0, and the latest version of the Prometheus client is 0.10.0.
>>> I checked the change logs, and it doesn't look like there are major changes
>>> between the two: https://github.com/prometheus/client_java/releases We
>>> can upgrade the version and see if it makes any difference.
>>> 
>>>> - tune the few knobs we have in the Prometheus client
>>>> 
>>> 
>>> What are the knobs we can tune? Can you provide more info on this?
>> 
>> Unfortunately there is nothing we can tune directly through the ZooKeeper
>> configuration file.
>> 
>> In the ZK code we currently have only some basic configuration of the
>> summaries, and we mostly use the default values:
>> 
>> https://github.com/apache/zookeeper/blob/11c07921c15e2fb7692375327b53f26a583b77ca/zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java#L340
>> There is currently no way to fine-tune the buckets and the rest of the
>> structures.
>> 
>> We should work on finding optimal values.
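>>
>> For reference, the Prometheus Java client does expose a few knobs on
>> Summary; here is a hedged sketch (the quantiles, window length, and bucket
>> count below are illustrative values, not what ZK currently uses):
>> 
>>     import io.prometheus.client.Summary;
>> 
>>     public class SummaryKnobs {
>>         // Fewer tracked quantiles, a shorter window, or fewer age
>>         // buckets all reduce the cost of each observe() call.
>>         static final Summary LATENCY = Summary.build()
>>             .name("example_latency")
>>             .help("Example latency summary")
>>             .quantile(0.5, 0.05)    // median, 5% tolerated error
>>             .quantile(0.99, 0.001)  // p99, 0.1% tolerated error
>>             .maxAgeSeconds(60)      // sliding time window length
>>             .ageBuckets(5)          // rotating buckets per window
>>             .register();
>> 
>>         public static void main(String[] args) {
>>             LATENCY.observe(12.0);
>>         }
>>     }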
>> 
> 
> 
> 
> 
>> 
>> 
>>> 
>>>> 
>>>> In Apache Pulsar and in Apache BookKeeper we have some customizations in
>>>> the Prometheus metrics collectors; we could take a look and port those to
>>>> ZooKeeper. (Initially I worked on the Prometheus integration based on the
>>>> use cases I have with Pulsar, BookKeeper, and other systems that already
>>>> use Prometheus, but here in ZooKeeper we are using the basic Prometheus
>>>> client.)
>>>> 
>>> 
>>> Are there any customizations for minimizing the performance impact? Is
>>> there anything we can port to ZK to help with the issue?
>> 
>> Yes, there is much we can port. This is the Prometheus metrics
>> provider in BookKeeper:
>> 
>> https://github.com/apache/bookkeeper/tree/master/bookkeeper-stats-providers/prometheus-metrics-provider/src/main/java/org/apache/bookkeeper/stats/prometheus
>> 
>> Enrico
>> 
>>> 
>>> 
>>>> Enrico
>>>> 
>>>> On Tue, Apr 27, 2021, 06:35 Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>> 
>>>>> Batching metrics reporting is very similar to option (c) but with locking
>>>>> like option (a). That can usually be made faster by passing a reference to
>>>>> the metrics accumulator to the reporting thread, which can do the batch
>>>>> update without locks. This usually requires ping-pong metrics accumulators,
>>>>> so that a thread can accumulate in one accumulator for a bit, pass that to
>>>>> the merge thread, and switch to using the alternate accumulator. Since all
>>>>> threads typically report at the same time, this avoids a stampede on the
>>>>> global accumulator.
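>>>>>
>>>>> A minimal sketch of the ping-pong idea (hypothetical names; a real
>>>>> version would keep one pair of accumulators per worker thread):
>>>>> 
>>>>>     import java.util.concurrent.atomic.AtomicLong;
>>>>>     import java.util.concurrent.atomic.AtomicReference;
>>>>> 
>>>>>     // Hypothetical ping-pong accumulator: workers add to the active
>>>>>     // buffer; the merge thread swaps in a fresh buffer each period
>>>>>     // and drains the retired one without blocking the workers.
>>>>>     class PingPongAccumulator {
>>>>>         private final AtomicReference<AtomicLong> active =
>>>>>             new AtomicReference<>(new AtomicLong());
>>>>> 
>>>>>         void add(long delta) {          // hot path, lock-free
>>>>>             active.get().addAndGet(delta);
>>>>>         }
>>>>> 
>>>>>         long swapAndDrain() {           // merge thread, once per period
>>>>>             return active.getAndSet(new AtomicLong()).get();
>>>>>         }
>>>>>     }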
>>>>> 
>>>>> 
>>>>> On Mon, Apr 26, 2021 at 9:30 PM Li Wang <li4w...@gmail.com> wrote:
>>>>> 
>>>>>> Batching metrics reporting can help. For example, in the CommitProcessor,
>>>>>> increasing the maxCommitBatchSize helps improve the performance of write
>>>>>> operations.
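>>>>>>
>>>>>> For example, assuming the zookeeper.maxCommitBatchSize system property
>>>>>> (please verify the exact name for your ZK version):
>>>>>> 
>>>>>>     # Larger commit batches amortize per-request bookkeeping.
>>>>>>     -Dzookeeper.maxCommitBatchSize=500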
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 26, 2021 at 9:21 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>> 
>>>>>>> Yes, I am thinking of handling metrics reporting in a separate thread,
>>>>>>> so it doesn't impact the "main" thread.
>>>>>>> 
>>>>>>> I'm not sure about the idea of merging at the end of a reporting period.
>>>>>>> Can you elaborate a bit on it?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Li
>>>>>>> 
>>>>>>> On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning <
>> ted.dunn...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Would it help to keep per-thread metrics that are either reported
>>>>>>>> independently or merged at the end of a reporting period?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Apr 26, 2021 at 8:51 PM Li Wang <li4w...@gmail.com>
>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Community,
>>>>>>>>> 
>>>>>>>>> I've done further investigation of the issue and found the following:
>>>>>>>>> 
>>>>>>>>> 1. The perf of read operations decreased due to lock contention in the
>>>>>>>>> Prometheus TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker
>>>>>>>>> threads were blocked on the TimeWindowQuantiles.insert() API while the
>>>>>>>>> test was running.
>>>>>>>>> 
>>>>>>>>> 2. The perf of write operations decreased because of the high CPU
>>>>>>>>> usage of the Prometheus Summary type of metrics. The CPU usage of the
>>>>>>>>> CommitProcessor increased by about 50% with Prometheus enabled compared
>>>>>>>>> to disabled (46% vs 80% with 4 CPUs, 63% vs 99% with 12 CPUs).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Prometheus integration is a great feature; however, the negative
>>>>>>>>> performance impact is very significant. I wonder if anyone has any
>>>>>>>>> thoughts on how to reduce it.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Li
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I would like to reach out to the community to see if anyone has
>>>>>>>>>> insights into or experience with the performance impact of enabling
>>>>>>>>>> Prometheus metrics.
>>>>>>>>>> 
>>>>>>>>>> I have done load comparison tests with Prometheus enabled vs. disabled
>>>>>>>>>> and found that performance is reduced by about 40%-60% for both read
>>>>>>>>>> and write operations (i.e. getData, getChildren, and createNode).
>>>>>>>>>> 
>>>>>>>>>> The load test was done with ZooKeeper 3.7 and a cluster of 5
>>>>>>>>>> participants and 5 observers; each ZK server had a 10G heap and 4
>>>>>>>>>> CPUs, with 500 concurrent users sending requests.
>>>>>>>>>> 
>>>>>>>>>> The performance impact is quite significant. I wonder if this is
>>>>>>>>>> expected, and what we can do to keep ZK performing the same while
>>>>>>>>>> leveraging the new Prometheus metrics feature.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> 
>>>>>>>>>> Li
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
