[jira] [Created] (KAFKA-16129) Add integration test for KIP-977

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16129:
--

 Summary: Add integration test for KIP-977
 Key: KAFKA-16129
 URL: https://issues.apache.org/jira/browse/KAFKA-16129
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics. If the value 
*{{high}}* is set for the *{{level}}* key of the configured JSON(see below for 
example values), high fan-out tags(e.g. {*}{{partition}}{*})will be added to 
metrics specified by the {{*name*}} filter and will apply to all the topics 
that meet the conditions in the {{*filters*}} section. In the *{{low}}* 
settings, these tags will be assigned with an empty value. We elected to make 
it central so that this implementation can be generalized in the future either 
into a library, or allow other means for centralized control.

More details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics

 

The following 3 tests will be done for common metrics collectors: JMX, 
Prometheus, and OpenTelemetry.
 # The partition tag can be observed from metrics if high verbosity is used
 # The partition tag should result in an empty string or be filtered out by the 
metrics collector if default verbosity is used
 # Dynamically setting the verbosity can result in the behavior defined in the 
above tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16128) Use metrics.verbosity in throughput metrics

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16128:
--

 Summary: Use metrics.verbosity in throughput metrics
 Key: KAFKA-16128
 URL: https://issues.apache.org/jira/browse/KAFKA-16128
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

This task will link it to the following throughput metrics. Specifically, in 
the event level 'high' is used, the partition dimension should be added.

 
|Metrics Name|Meaning|
|{{MessagesInPerSec}}|Messages entered the partition, per second|
|{{BytesInPerSec}}|Bytes entered the partition, per second|
|{{BytesOutPerSec}}|Bytes retrieved from the partition, per second|
|{{BytesRejectedPerSec}}|Bytes exceeding max message size in a partition, per 
second|
|{{TotalProduceRequestsPerSec}}|Produce request count for a partition, per 
second|
|{{TotalFetchRequestsPerSec}}|Fetch request count for a partition, per second|
|{{FailedProduceRequestsPerSec}}|Failed to produce request count for a 
partition, per second|
|{{FailedFetchRequestsPerSec}}|Failed to fetch request count for a partition, 
per second|
|{{FetchMessageConversionsPerSec}}|Broker side conversions(de-compressions) for 
a partition, per second|
|{{ProduceMessageConversionsPerSec}}|Broker side conversions(compressions) for 
a partition, per second|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16127) Reset corresponding verbosity configs when topic is deleted

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16127:
--

 Summary: Reset corresponding verbosity configs when topic is 
deleted
 Key: KAFKA-16127
 URL: https://issues.apache.org/jira/browse/KAFKA-16127
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

When the topic is deleted, the corresponding topic config should be reset to 
default value because:
 # If the topic is automatically recreated, the verbosity from the previous 
generation may cause too much/too little metrics to be emitted
 # Too many unused configurations may cause user error when configuring this 
field. The typical workflow would be getting the value first, then adding or 
modifying the config to reflect the latest requirement. If we don't delete 
unused entries, it could only grow instead of evolve.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16125) Add parser for the metrics.verbosity config

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16125:
--

 Summary: Add parser for the metrics.verbosity config
 Key: KAFKA-16125
 URL: https://issues.apache.org/jira/browse/KAFKA-16125
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu
Assignee: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

The parser will have the following validation responsibilities:
 # Validate that the config conforms to JSON format.
 # Validate that the level field is configured correctly with a set level
 # Validate that the names field pattern is acceptable to 
{{java.util.regex-compatible}}
 # Validate that the filters field conforms to JSON map format
 # Validate that all the filter patterns are acceptable to 
{{java.util.regex-compatible}}
 # Validate that no other field exists in the config

After parsing and validating the config, the parser should:
 * Generate a config object which contains the level, names, and filter fields. 
There should exist a default verbosity level too.
 * Inside the object, the predicates should also be generated to allow fast 
querying of the rules
 * Overwrite/update the existing predicates if the config is updated

The config object should:
 * Determine the verbosity level supplied with the topic name and the metrics 
name
 * If not defined, fall back to the default config level



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16124) Create metrics.verbosity dynamic config for controlling metrics verbosity

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16124:
--

 Summary: Create metrics.verbosity dynamic config for controlling 
metrics verbosity
 Key: KAFKA-16124
 URL: https://issues.apache.org/jira/browse/KAFKA-16124
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu
Assignee: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

More details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka trunk test & build stability

2024-01-14 Thread Qichao Chu
Hi Divij and all,

Regarding the speeding up of the build & de-flaking tests, LinkedIn has
done some great work which we probably can borrow ideas from.
In the LinkedIn/Kafka repo, we can see one of their most recent PRs
 only took < 9 min(unit
test) + < 12 min (integration-test) + < 9 (code check) = < 30 min to finish
all the checks:

   1. Similar to what David(mumrah) has mentioned/experimented with, the
   LinkedIn team used GitHub Actions, which displayed the results in a cleaner
   way directly from GitHub.
   2. Each top-level package is checked separately to increase the
   concurrency. To further boost the speed for integration tests, the tests
   inside one package are divided into sub-groups(A-Z) based on their
   names(see this job
    for
   details).
   3. Once the tests are running at a smaller granularity with a decent
   runner, heavy integration tests are less likely to be flaky, and flaky
   tests are easier to catch.


--
Qichao


On Wed, Jan 10, 2024 at 2:57 PM Divij Vaidya 
wrote:

> Hey folks
>
> We seem to have a handle on the OOM issues with the multiple fixes
> community members made. In
> https://issues.apache.org/jira/browse/KAFKA-16052,
> you can see the "before" profile in the description and the "after" profile
> in the latest comment to see the difference. To prevent future recurrence,
> we have an ongoing solution at https://github.com/apache/kafka/pull/15101
> and after that we will start another once to get rid of mockito mocks at
> the end of every test suite using a similar extension. Note that this
> doesn't solve the flaky test problems in the trunk but it removes the
> aspect of build failures due to OOM (one of the many problems).
>
> To fix the flaky test problem, we probably need to run our tests in a
> separate CI environment (like Apache Beam does) instead of sharing the 3
> hosts that run our CI with many many other Apache projects. This assumption
> is based on the fact that the tests are less flaky when running on laptops
> / powerful EC2 machines. One of the avenues to get funding for these
> Kafka-only hosts is
>
> https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/
> . I will start the conversation on this one with AWS & Apache Infra in the
> next 1-2 months.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jan 9, 2024 at 9:21 PM Colin McCabe  wrote:
>
> > Sorry, but to put it bluntly, the current build setup isn't good enough
> at
> > partial rebuilds that build caching would make sense. All Kafka devs have
> > had the experience of needing to clean the build directory in order to
> get
> > a valid build. The scala code esspecially seems to have this issue.
> >
> > regards,
> > Colin
> >
> >
> > On Tue, Jan 2, 2024, at 07:00, Nick Telford wrote:
> > > Addendum: I've opened a PR with what I believe are the changes
> necessary
> > to
> > > enable Remote Build Caching, if you choose to go that route:
> > > https://github.com/apache/kafka/pull/15109
> > >
> > > On Tue, 2 Jan 2024 at 14:31, Nick Telford 
> > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Regarding building a "dependency graph"... Gradle already has this
> > >> information, albeit fairly coarse-grained. You might be able to get
> some
> > >> considerable improvement by configuring the Gradle Remote Build Cache.
> > It
> > >> looks like it's currently disabled explicitly:
> > >> https://github.com/apache/kafka/blob/trunk/settings.gradle#L46
> > >>
> > >> The trick is to have trunk builds write to the cache, and PR builds
> only
> > >> read from it. This way, any PR based on trunk should be able to cache
> > not
> > >> only the compilation, but also the tests from dependent modules that
> > >> haven't changed (e.g. for a PR that only touches the connect/streams
> > >> modules).
> > >>
> > >> This would probably be preferable to having to hand-maintain some
> > >> rules/dependency graph in the CI configuration, and it's quite
> > >> straight-forward to configure.
> > >>
> > >> Bonus points if the Remote Build Cache is readable publicly, enabling
> > >> contributors to benefit from it locally.
> > >>
> > >> Regards,
> > >> Nick
> > >>
> > >> On Tue, 2 Jan 2024 at 13:00, Lucas Brutschy  > .invalid>
> > >> wrote:
> > >>
> > >>> Thanks for all the work that has already been done on this in the
> past
> > >>> days!
> > >>>
> > >>> Have we considered running our test suite with
> > >>> -XX:+HeapDumpOnOutOfMemoryError and uploading the heap dumps as
> > >>> Jenkins build artifacts? This could speed up debugging. Even if we
> > >>> store them only for a day and do it only for trunk, I think it could
> > >>> be worth it. The heap dumps shouldn't contain any secrets, and I
> > >>> checked with the ASF infra team, and they are not concerned about the
> > >>> additional disk usage.
> > >>>
> > >>> Cheers,
> > >>> Lucas
> > >>>
> > >>> On Wed, Dec 27, 2023 at 2:25 PM Divij 

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-29 Thread Qichao Chu
Hi Vahid,

Thank you so much for the review and voting!

Best,
Qichao

On Wed, Nov 29, 2023 at 2:55 PM Vahid Hashemian  wrote:

> Hi Qichao,
>
> Thanks for answering my questions and updating the KIP accordingly.
>
> It looks good to me.
>
> --Vahid
>
>
> On Tue, Nov 28, 2023, 7:22 PM Qichao Chu  wrote:
>
> > Hi Vahid,
> >
> > Thank you for taking the time to review the KIP and asking great
> questions.
> >
> > The execution path mentioned is exactly how this KIP is going to
> function.
> > We believe this config, compared with many other configurations like
> quota,
> > will not pose significant alteration to the broker's functionality. Thus
> > overwriting the permanent config dynamically is acceptable. After
> thinking
> > about this topic again, I think your question makes a lot of sense and we
> > shouldn't override the permanent config. Permanent config usually
> > represents the normal operation condition, so the temporal config should
> be
> > removed after the cluster is restarted/re-provisioned to be 'clean'. This
> > also helps to prevent undesirable behavior and to reduce the risk
> involved
> > in operation. I have updated the KIP to mention this behavior.
> >
> > `Medium` level is not mentioned as we don't have any medium-level metrics
> > for now. This is for future extension. I have also updated the KIP to
> > reflect this information.
> >
> > Best,
> > Qichao
> >
> >
> >
> > On Tue, Nov 28, 2023 at 9:22 AM Vahid Hashemian 
> wrote:
> >
> > > Hi Qichao,
> > >
> > > Thanks for proposing this KIP. It'd be super valuable to have the
> ability
> > > to have those partition level metrics for Kafka topics.
> > >
> > > Sorry I'm late to the discussion. I just wanted to bring up a point
> > > for clarification and one question:
> > >
> > > Let's assume that a production cluster cannot afford to enable high
> > > verbosity on a permanent basis (at least not for all topics) due to
> > > performance concerns.
> > >
> > > Since this new config can be set dynamically, in case of an issue or
> > > investigation that warrants obtaining partition level metrics, one can
> > > simply enable high verbosity for select topic(s), temporarily collect
> > > metrics at partition level, and then change the config back to the
> > previous
> > > setting. Since the config values are not set incrementally, the
> operator
> > > would need to run a `describe` to get the existing config first, and
> then
> > > amend it to enable high verbosity for the topic(s) of interest.
> Finally,
> > > when the investigation concludes, the config has to be reverted to its
> > > permanent setting.
> > >
> > > If the above execution path makes sense, in case the operator forgets
> to
> > > take an inventory of the existing (permanent) config and simply
> > overwrites
> > > it, then that permanent config will be gone and not retrievable. Is
> this
> > > correct?
> > >
> > > We usually don't need to temporarily change broker configs and I see
> this
> > > config as one that can be temporarily changed. So keeping track of what
> > the
> > > value was before the change is rather important.
> > >
> > > Aside from this point, my question is: What's the impact of `medium`
> > > setting for `level`? I couldn't find it described in the KIP.
> > >
> > > Thanks!
> > > --Vahid
> > >
> > >
> > >
> > > On Mon, Nov 13, 2023 at 5:34 AM Divij Vaidya 
> > > wrote:
> > >
> > > > Thank you for updating the KIP Qichao.
> > > >
> > > > I don't have any more questions or suggestions. Looks good to move
> > > forward
> > > > from my perspective.
> > > >
> > > >
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu 
> > > > wrote:
> > > >
> > > > > Thank you again for the nice suggestions, Jorge!
> > > > > I will wait for Divij's response and move it to the vote stage once
> > the
> > > > > generic filter part reached concensus.
> > > > >
> > > > > Qichao Chu
> > > > > Software Engineer | Data - Kafka
> > > > > [image: Uber] <https://uber.com/>
> > > > >
> >

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-28 Thread Qichao Chu
Hi Vahid,

Thank you for taking the time to review the KIP and asking great questions.

The execution path mentioned is exactly how this KIP is going to function.
We believe this config, compared with many other configurations like quota,
will not pose significant alteration to the broker's functionality. Thus
overwriting the permanent config dynamically is acceptable. After thinking
about this topic again, I think your question makes a lot of sense and we
shouldn't override the permanent config. Permanent config usually
represents the normal operation condition, so the temporal config should be
removed after the cluster is restarted/re-provisioned to be 'clean'. This
also helps to prevent undesirable behavior and to reduce the risk involved
in operation. I have updated the KIP to mention this behavior.

`Medium` level is not mentioned as we don't have any medium-level metrics
for now. This is for future extension. I have also updated the KIP to
reflect this information.

Best,
Qichao



On Tue, Nov 28, 2023 at 9:22 AM Vahid Hashemian  wrote:

> Hi Qichao,
>
> Thanks for proposing this KIP. It'd be super valuable to have the ability
> to have those partition level metrics for Kafka topics.
>
> Sorry I'm late to the discussion. I just wanted to bring up a point
> for clarification and one question:
>
> Let's assume that a production cluster cannot afford to enable high
> verbosity on a permanent basis (at least not for all topics) due to
> performance concerns.
>
> Since this new config can be set dynamically, in case of an issue or
> investigation that warrants obtaining partition level metrics, one can
> simply enable high verbosity for select topic(s), temporarily collect
> metrics at partition level, and then change the config back to the previous
> setting. Since the config values are not set incrementally, the operator
> would need to run a `describe` to get the existing config first, and then
> amend it to enable high verbosity for the topic(s) of interest. Finally,
> when the investigation concludes, the config has to be reverted to its
> permanent setting.
>
> If the above execution path makes sense, in case the operator forgets to
> take an inventory of the existing (permanent) config and simply overwrites
> it, then that permanent config will be gone and not retrievable. Is this
> correct?
>
> We usually don't need to temporarily change broker configs and I see this
> config as one that can be temporarily changed. So keeping track of what the
> value was before the change is rather important.
>
> Aside from this point, my question is: What's the impact of `medium`
> setting for `level`? I couldn't find it described in the KIP.
>
> Thanks!
> --Vahid
>
>
>
> On Mon, Nov 13, 2023 at 5:34 AM Divij Vaidya 
> wrote:
>
> > Thank you for updating the KIP Qichao.
> >
> > I don't have any more questions or suggestions. Looks good to move
> forward
> > from my perspective.
> >
> >
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu 
> > wrote:
> >
> > > Thank you again for the nice suggestions, Jorge!
> > > I will wait for Divij's response and move it to the vote stage once the
> > > generic filter part reached concensus.
> > >
> > > Qichao Chu
> > > Software Engineer | Data - Kafka
> > > [image: Uber] <https://uber.com/>
> > >
> > >
> > > On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Hi Qichao,
> > > >
> > > > Thanks for updating the KIP, all updates look good to me.
> > > >
> > > > Looking forward to see this KIP moving forward!
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > >
> > > >
> > > > On Wed, 8 Nov 2023 at 08:55, Qichao Chu 
> > wrote:
> > > >
> > > > > Hi Divij,
> > > > >
> > > > > Thank you for the feedback. I updated the KIP to make it a little
> bit
> > > > more
> > > > > generic: filters will stay in an array instead of different
> top-level
> > > > > objects. In this way, if we need language filters in the future.
> The
> > > > logic
> > > > > relationship of filters is also added.
> > > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thank you for the review and great comments. Here is the reply for
> > each
> > > > of
> > > > > the suggestions:
> > > > >
> > &

Re: [VOTE] KIP-997: Partition-Level Throughput Metrics

2023-11-21 Thread Qichao Chu
Hi All,

It would be nice if we could have more people to review and vote for this
KIP.
Many thanks!

Qichao


On Mon, Nov 20, 2023 at 2:43 PM Qichao Chu  wrote:

> @Matthias: yeah it should be 977, sorry for the confusion.
> Btw, do you want to cast another binding vote for it?
>
> Best,
> Qichao Chu
>
>
> On Fri, Nov 17, 2023 at 12:45 AM Matthias J. Sax  wrote:
>
>> This is KIP-977, right? Not as the subject says.
>>
>> Guess we won't be able to fix this now. Hope it does not cause confusion
>> down the line...
>>
>>
>> -Matthias
>>
>> On 11/16/23 4:43 AM, Kamal Chandraprakash wrote:
>> > +1 (non-binding). Thanks for the KIP!
>> >
>> > On Thu, Nov 16, 2023 at 9:00 AM Satish Duggana <
>> satish.dugg...@gmail.com>
>> > wrote:
>> >
>> >> Thanks Qichao for the KIP.
>> >>
>> >> +1 (binding)
>> >>
>> >> ~Satish.
>> >>
>> >> On Thu, 16 Nov 2023 at 02:20, Jorge Esteban Quilcate Otoya
>> >>  wrote:
>> >>>
>> >>> Qichao, thanks again for leading this proposal!
>> >>>
>> >>> +1 (non-binding)
>> >>>
>> >>> Cheers,
>> >>> Jorge.
>> >>>
>> >>> On Wed, 15 Nov 2023 at 19:17, Divij Vaidya 
>> >> wrote:
>> >>>
>> >>>> +1 (binding)
>> >>>>
>> >>>> I was involved in the discussion thread for this KIP and support it
>> in
>> >> its
>> >>>> current form.
>> >>>>
>> >>>> --
>> >>>> Divij Vaidya
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Nov 15, 2023 at 10:55 AM Qichao Chu > >
>> >>>> wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I'd like to call a vote for KIP-977: Partition-Level Throughput
>> >> Metrics.
>> >>>>>
>> >>>>> Please take a look here:
>> >>>>>
>> >>>>>
>> >>>>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
>> >>>>>
>> >>>>> Best,
>> >>>>> Qichao Chu
>> >>>>>
>> >>>>
>> >>
>> >
>>
>


Re: [VOTE] KIP-997: Partition-Level Throughput Metrics

2023-11-19 Thread Qichao Chu
@Matthias: yeah it should be 977, sorry for the confusion.
Btw, do you want to cast another binding vote for it?

Best,
Qichao Chu


On Fri, Nov 17, 2023 at 12:45 AM Matthias J. Sax  wrote:

> This is KIP-977, right? Not as the subject says.
>
> Guess we won't be able to fix this now. Hope it does not cause confusion
> down the line...
>
>
> -Matthias
>
> On 11/16/23 4:43 AM, Kamal Chandraprakash wrote:
> > +1 (non-binding). Thanks for the KIP!
> >
> > On Thu, Nov 16, 2023 at 9:00 AM Satish Duggana  >
> > wrote:
> >
> >> Thanks Qichao for the KIP.
> >>
> >> +1 (binding)
> >>
> >> ~Satish.
> >>
> >> On Thu, 16 Nov 2023 at 02:20, Jorge Esteban Quilcate Otoya
> >>  wrote:
> >>>
> >>> Qichao, thanks again for leading this proposal!
> >>>
> >>> +1 (non-binding)
> >>>
> >>> Cheers,
> >>> Jorge.
> >>>
> >>> On Wed, 15 Nov 2023 at 19:17, Divij Vaidya 
> >> wrote:
> >>>
> >>>> +1 (binding)
> >>>>
> >>>> I was involved in the discussion thread for this KIP and support it in
> >> its
> >>>> current form.
> >>>>
> >>>> --
> >>>> Divij Vaidya
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Nov 15, 2023 at 10:55 AM Qichao Chu 
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to call a vote for KIP-977: Partition-Level Throughput
> >> Metrics.
> >>>>>
> >>>>> Please take a look here:
> >>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> >>>>>
> >>>>> Best,
> >>>>> Qichao Chu
> >>>>>
> >>>>
> >>
> >
>


[VOTE] KIP-997: Partition-Level Throughput Metrics

2023-11-15 Thread Qichao Chu
Hi all,

I'd like to call a vote for KIP-977: Partition-Level Throughput Metrics.

Please take a look here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics

Best,
Qichao Chu


Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-10 Thread Qichao Chu
Thank you again for the nice suggestions, Jorge!
I will wait for Divij's response and move it to the vote stage once the
generic filter part reached concensus.

Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi Qichao,
>
> Thanks for updating the KIP, all updates look good to me.
>
> Looking forward to see this KIP moving forward!
>
> Cheers,
> Jorge.
>
>
>
> On Wed, 8 Nov 2023 at 08:55, Qichao Chu  wrote:
>
> > Hi Divij,
> >
> > Thank you for the feedback. I updated the KIP to make it a little bit
> more
> > generic: filters will stay in an array instead of different top-level
> > objects. In this way, if we need language filters in the future. The
> logic
> > relationship of filters is also added.
> >
> > Hi Jorge,
> >
> > Thank you for the review and great comments. Here is the reply for each
> of
> > the suggestions:
> >
> > 1) The words describing the property are now updated to include more
> > details of the keys in the JSON. It also explicitly mentions the JSON
> > nature of the config now.
> > 2) The JSON entries should be non-conflict so the order is not relevant.
> If
> > there's conflict, the conflict resolution rules are stated in the KIP. To
> > make it more clear, ordering and duplication rules are updated in the
> > Restrictions section of the *level* property.
> > 3) Yeah we did take a look at the RecordingLevel config and it does not
> > work for this case. The RecodingLevel config does not offer the
> capability
> > of filtering and it has a drawback of needing to be added to all the
> future
> > sensors. To reduce the duplication, I propose we merge the RecordingLevel
> > to this more generic config in the future. Please take a look into the
> > *Using
> > the Existing RecordingLevel Config* section under *Rejected Alternatives*
> > for more details.
> > 4) This suggestion makes a lot of sense. My idea is to create a
> > table/form/doc in the documentation for the verbosity levels of all
> metric
> > series. If it's too verbose to be in the docs, I will update the KIP to
> > include this info. I will create a JIRA for this effort once the KIP is
> > approved.
> > 5) Sure we can expand to all other series, added to the KIP.
> > 6) Added a new section(*Working with the Configuration via CLI)* with the
> > user experience details
> > 7) Links are updated.
> >
> > Please take another look and let me know if you have any more concerns.
> >
> > Best,
> > Qichao Chu
> > Software Engineer | Data - Kafka
> > [image: Uber] <https://uber.com/>
> >
> >
> > On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi Qichao,
> > >
> > > Thanks for the KIP! This will be a valuable contribution and improve
> the
> > > tooling for troubleshooting.
> > >
> > > I have a couple of comments:
> > >
> > > 1. It's unclear from the `metrics.verbosity` description what the
> > supported
> > > values are. In the description mentions "If the value is high ... In
> the
> > > low settings" but I think it's referring to the `level` property
> > > specifically instead of the whole value that is now JSON. Could you
> > clarify
> > > this?
> > >
> > > 2. Could we state in which order the JSON entries are going to be
> > > evaluated? I guess the last entry wins if it overlaps previous values,
> > but
> > > better to make this explicit.
> > >
> > > 3. Kafka metrics library has a `RecordingLevel` configuration -- have
> we
> > > considered aligning these concepts and maybe reuse it instead of
> > > `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.
> > >
> > > 4. Not sure if within the scope of the KIP, but would be helpful to
> > > document the metrics with the verbosity level attached to the metrics.
> > > Maybe creating a JIRA ticket to track this would be enough if we can't
> > > cover it as part of the KIP.
> > >
> > > 5. Could we consider the following client-related metrics as well:
> > >   - BytesRejectedPerSec
> > >   - TotalProduceRequestsPerSec
> > >   - TotalFetchRequestsPerSec
> > >   - FailedProduceRequestsPerSec
> > >   - FailedFetchRequestsPerSec
> > >   - FetchMessageConversionsPerSec
> > >  

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-07 Thread Qichao Chu
Hi Divij,

Thank you for the feedback. I updated the KIP to make it a little bit more
generic: filters will stay in an array instead of different top-level
objects. In this way, if we need language filters in the future. The logic
relationship of filters is also added.

Hi Jorge,

Thank you for the review and great comments. Here is the reply for each of
the suggestions:

1) The words describing the property are now updated to include more
details of the keys in the JSON. It also explicitly mentions the JSON
nature of the config now.
2) The JSON entries should be non-conflict so the order is not relevant. If
there's conflict, the conflict resolution rules are stated in the KIP. To
make it more clear, ordering and duplication rules are updated in the
Restrictions section of the *level* property.
3) Yeah we did take a look at the RecordingLevel config and it does not
work for this case. The RecodingLevel config does not offer the capability
of filtering and it has a drawback of needing to be added to all the future
sensors. To reduce the duplication, I propose we merge the RecordingLevel
to this more generic config in the future. Please take a look into the *Using
the Existing RecordingLevel Config* section under *Rejected Alternatives*
for more details.
4) This suggestion makes a lot of sense. My idea is to create a
table/form/doc in the documentation for the verbosity levels of all metric
series. If it's too verbose to be in the docs, I will update the KIP to
include this info. I will create a JIRA for this effort once the KIP is
approved.
5) Sure we can expand to all other series, added to the KIP.
6) Added a new section(*Working with the Configuration via CLI)* with the
user experience details
7) Links are updated.

Please take another look and let me know if you have any more concerns.

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi Qichao,
>
> Thanks for the KIP! This will be a valuable contribution and improve the
> tooling for troubleshooting.
>
> I have a couple of comments:
>
> 1. It's unclear from the `metrics.verbosity` description what the supported
> values are. In the description mentions "If the value is high ... In the
> low settings" but I think it's referring to the `level` property
> specifically instead of the whole value that is now JSON. Could you clarify
> this?
>
> 2. Could we state in which order the JSON entries are going to be
> evaluated? I guess the last entry wins if it overlaps previous values, but
> better to make this explicit.
>
> 3. Kafka metrics library has a `RecordingLevel` configuration -- have we
> considered aligning these concepts and maybe reuse it instead of
> `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.
>
> 4. Not sure if within the scope of the KIP, but would be helpful to
> document the metrics with the verbosity level attached to the metrics.
> Maybe creating a JIRA ticket to track this would be enough if we can't
> cover it as part of the KIP.
>
> 5. Could we consider the following client-related metrics as well:
>   - BytesRejectedPerSec
>   - TotalProduceRequestsPerSec
>   - TotalFetchRequestsPerSec
>   - FailedProduceRequestsPerSec
>   - FailedFetchRequestsPerSec
>   - FetchMessageConversionsPerSec
>   - ProduceMessageConversionsPerSec
> Would be great to have these from day 1 instead of requiring a following
> KIP to extend this. Could be implemented in separate PRs if needed.
>
> 6. To make it clearer how the user experience would be, could we provide an
> example of:
> - how the broker configuration will be provided by default, and
> - how the CLI tooling would be used to change the configuration?
> - Maybe a couple of scenarios: adding a new metric config, a second one
> with overlapping values, and
> - describing the expected metrics to be mapped
>
> A couple of nits:
> - The first link "MessagesInPerSec metrics" is pointing to
> https://kafka.apache.org/documentation/#uses_metrics -- is this the
> correct
> reference? It doesn't seem too relevant.
> - Also, the link to ReplicaManager points to a line that has change
> already; better to have a permalink to a specific commit: e.g.
>
> https://github.com/apache/kafka/blob/edc7e10a745c350ad1efa9e4866370dc8ea0e034/core/src/main/scala/kafka/server/ReplicaManager.scala#L1218
>
> Cheers,
> Jorge.
>
> On Tue, 7 Nov 2023 at 17:06, Qichao Chu  wrote:
>
> > Hi Divij,
> >
> > It would be very nice if you could take a look at the recent changes,
> thank
> > you!
> > If there's no more required changes, shall we move to vote stage?
> >
> > Best,
> > Qichao Chu
> > Software Engineer

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-07 Thread Qichao Chu
Hi Divij,

It would be very nice if you could take a look at the recent changes, thank
you!
If there's no more required changes, shall we move to vote stage?

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Thu, Nov 2, 2023 at 12:06 AM Qichao Chu  wrote:

> Hi Divij,
>
> Thank you for the very quick response and the nice suggestions. I have
> updated the KIP with the following thoughts.
>
> 1. I checked the Java documentation and it seems the regex engine in utils
> <https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html>
>  is
> not 100% compatible with PCRE, though it is very close. I stated
> the Java implementation as the requirement since we are most likely to
> target a JVM language.
> 2. Agreed with the filter limitation. For now, let's keep it topic only.
> With that in mind, I feel we do have cases where a user wants to list many
> topics. Although regex is also possible, an array will make things faster.
> This makes me add two options for the topic filter.
> 3. It seems not many configs are using JSON, this was the intention for me
> to use a compound string. However since JSON is used widely in the project,
> and given the benefits you mentioned earlier, I tried to make the config a
> JSON array. The change is to make it compatible with multi-level settings.
>
> Let me know what you think. Many thanks!
>
> Best,
> Qichao Chu
> Software Engineer | Data - Kafka
> [image: Uber] <https://uber.com/>
>
>
> On Wed, Nov 1, 2023 at 9:43 PM Divij Vaidya 
> wrote:
>
>> Thank you for making the changes Qichao.
>>
>> We are now entering in the territory of defining a declarative schema for
>> filters. In the new input format, the type is string but we are imposing a
>> schema for the string and we should clearly call out the schema. You can
>> perhaps choose to adopt a schema such as below:
>>
>> metricLevel = High | Low (default: Low)
>> metricNameRegEx = regEx (default: .*)
>> nameOfDimension = string
>> dimensionRegEx = regEx
>> dimensionFilter = [=] (default: [])
>>
>> Final Value schema = "level"=$metricLevel, "name"=$metricNameRegEx,
>> $dimensionFilter
>>
>> Further we need to answer questions such as :
>> 1. which regEx format do we support (it should probably be Perl-compatible
>> regular expressions (PCRE) because Java's regEx is compatible with it)
>> 2. should we restrict the dimensionFilter to at max length 1 and value
>> "topic" only for now. Later when we want to expand, we can expand filters
>> for other dimensions as well such as partitions.
>> 3. if we are coming up with our stringified-schema, why not use json? It
>> would save us from building a parsing utility for the schema. (I like it
>> in
>> its current format but there is a case to be made for json as well)
>> 4. what happens when there are contradictory regEx rules, e.g. a topic
>> defined in high as well as low. It is generally solved by defining
>> precedence. In our case, we can choose that high has more precedence than
>> low.
>>
>> What do you think?
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Wed, Nov 1, 2023 at 2:07 PM Qichao Chu 
>> wrote:
>>
>> > Hi Divij,
>> >
>> > Thank you for the review and the great suggestions, again. I have
>> updated
>> > the corresponding content, can you take another look?
>> > Regarding the KIP-544 style regex, I have added it to the new property
>> too.
>> > It's expanded to include multiple sections for better future extension.
>> >
>> > Best,
>> > Qichao Chu
>> > Software Engineer | Data - Kafka
>> > [image: Uber] <https://uber.com/>
>> >
>> >
>> > On Mon, Oct 30, 2023 at 6:26 PM Divij Vaidya 
>> > wrote:
>> >
>> > > Hey *Qichao*
>> > >
>> > > Thank you for the update on the KIP. I like the idea of incremental
>> > > delivery and adding which metrics support this verbosity as a later
>> KIP.
>> > > But I also want to ensure that we wouldn't have to change the current
>> > > config when adding that in future. Hence, we need some discussion on
>> it
>> > in
>> > > the scope of the KIP.
>> > >
>> > > About the dynamic configuration:
>> > > Do we need to add the "default" mode? I am asking because it may
>> inhibit
>> > us
>> > > from adding the allowList option in future. Instead if we could
>> reph

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-01 Thread Qichao Chu
Hi Divij,

Thank you for the very quick response and the nice suggestions. I have
updated the KIP with the following thoughts.

1. I checked the Java documentation and it seems the regex engine in utils
<https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html>
is
not 100% compatible with PCRE, though it is very close. I stated
the Java implementation as the requirement since we are most likely to
target a JVM language.
2. Agreed with the filter limitation. For now, let's keep it topic only.
With that in mind, I feel we do have cases where a user wants to list many
topics. Although regex is also possible, an array will make things faster.
This makes me add two options for the topic filter.
3. It seems not many configs are using JSON, this was the intention for me
to use a compound string. However since JSON is used widely in the project,
and given the benefits you mentioned earlier, I tried to make the config a
JSON array. The change is to make it compatible with multi-level settings.

Let me know what you think. Many thanks!

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Wed, Nov 1, 2023 at 9:43 PM Divij Vaidya  wrote:

> Thank you for making the changes Qichao.
>
> We are now entering in the territory of defining a declarative schema for
> filters. In the new input format, the type is string but we are imposing a
> schema for the string and we should clearly call out the schema. You can
> perhaps choose to adopt a schema such as below:
>
> metricLevel = High | Low (default: Low)
> metricNameRegEx = regEx (default: .*)
> nameOfDimension = string
> dimensionRegEx = regEx
> dimensionFilter = [=] (default: [])
>
> Final Value schema = "level"=$metricLevel, "name"=$metricNameRegEx,
> $dimensionFilter
>
> Further we need to answer questions such as :
> 1. which regEx format do we support (it should probably be Perl-compatible
> regular expressions (PCRE) because Java's regEx is compatible with it)
> 2. should we restrict the dimensionFilter to at max length 1 and value
> "topic" only for now. Later when we want to expand, we can expand filters
> for other dimensions as well such as partitions.
> 3. if we are coming up with our stringified-schema, why not use json? It
> would save us from building a parsing utility for the schema. (I like it in
> its current format but there is a case to be made for json as well)
> 4. what happens when there are contradictory regEx rules, e.g. a topic
> defined in high as well as low. It is generally solved by defining
> precedence. In our case, we can choose that high has more precedence than
> low.
>
> What do you think?
>
> --
> Divij Vaidya
>
>
>
> On Wed, Nov 1, 2023 at 2:07 PM Qichao Chu  wrote:
>
> > Hi Divij,
> >
> > Thank you for the review and the great suggestions, again. I have updated
> > the corresponding content, can you take another look?
> > Regarding the KIP-544 style regex, I have added it to the new property
> too.
> > It's expanded to include multiple sections for better future extension.
> >
> > Best,
> > Qichao Chu
> > Software Engineer | Data - Kafka
> > [image: Uber] <https://uber.com/>
> >
> >
> > On Mon, Oct 30, 2023 at 6:26 PM Divij Vaidya 
> > wrote:
> >
> > > Hey *Qichao*
> > >
> > > Thank you for the update on the KIP. I like the idea of incremental
> > > delivery and adding which metrics support this verbosity as a later
> KIP.
> > > But I also want to ensure that we wouldn't have to change the current
> > > config when adding that in future. Hence, we need some discussion on it
> > in
> > > the scope of the KIP.
> > >
> > > About the dynamic configuration:
> > > Do we need to add the "default" mode? I am asking because it may
> inhibit
> > us
> > > from adding the allowList option in future. Instead if we could
> rephrase
> > > the config as: "metric.verbosity.high" which takes values as a regEx
> > > (default will be empty), then we wouldn't have to worry about
> > > future-proofness of this KIP. Notably this is an existing pattern used
> by
> > > KIP-544.
> > > Alternatively, if you choose to stick to the current configuration
> > pattern,
> > > please provide information on how this config will look like when we
> add
> > > allow listing in future.
> > >
> > > About the perf test:
> > > Motivation - The motivation of perf test is to provide users with a
> hint
> > on
> > > what perf penalty they can expect and whether default has degraded perf
> &

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-01 Thread Qichao Chu
Hi Divij,

Thank you for the review and the great suggestions, again. I have updated
the corresponding content, can you take another look?
Regarding the KIP-544 style regex, I have added it to the new property too.
It's expanded to include multiple sections for better future extension.

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Mon, Oct 30, 2023 at 6:26 PM Divij Vaidya 
wrote:

> Hey *Qichao*
>
> Thank you for the update on the KIP. I like the idea of incremental
> delivery and adding which metrics support this verbosity as a later KIP.
> But I also want to ensure that we wouldn't have to change the current
> config when adding that in future. Hence, we need some discussion on it in
> the scope of the KIP.
>
> About the dynamic configuration:
> Do we need to add the "default" mode? I am asking because it may inhibit us
> from adding the allowList option in future. Instead if we could rephrase
> the config as: "metric.verbosity.high" which takes values as a regEx
> (default will be empty), then we wouldn't have to worry about
> future-proofness of this KIP. Notably this is an existing pattern used by
> KIP-544.
> Alternatively, if you choose to stick to the current configuration pattern,
> please provide information on how this config will look like when we add
> allow listing in future.
>
> About the perf test:
> Motivation - The motivation of perf test is to provide users with a hint on
> what perf penalty they can expect and whether default has degraded perf
> (due to additional "empty" labels).
> Dimensions of the test could be - scrape interval, utilization of broker
> (no traffic vs. heavy traffic), number of partitions (small/200 to
> large/2k).
> Things to collect during perf test - number of mbeans registered with JMX,
> CPU, heap utilization
> Expected results - As long as we can prove that there is no additional
> usage (significant) of CPU or heap after this change for the "default
> mode", we should be good. For the "high" mode, we should document the
> expected increase for users but it is not a blocker to implement this KIP.
>
>
> *Kirk*, I have tried to clarify the expectation on performance, does that
> address your question earlier? Also, I am happy with having a Kafka level
> dynamic config that we can use to filter our metric/dimensionality since we
> have a precedence at KIP-544. Hence, my suggestion to push this filtering
> to metric library can be ignored.
>
> --
> Divij Vaidya
>
>
>
> On Sat, Oct 28, 2023 at 11:37 AM Qichao Chu 
> wrote:
>
> > Hello Everyone,
> >
> > Can I ask for some feedback regarding KIP-977
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> > >
> > ?
> >
> > Best,
> > Qichao Chu
> > Software Engineer | Data - Kafka
> > [image: Uber] <https://uber.com/>
> >
> >
> > On Mon, Oct 16, 2023 at 7:34 PM Qichao Chu  wrote:
> >
> > > Hi Divij and Kirk,
> > >
> > > Thank you both for providing the valuable feedback and sorry for the
> > > delay. I have just updated the KIP to address the comments.
> > >
> > >1. Instead of using a topic-level control, global verbosity control
> > >makes more sense if we want to extend it in the future. It would be
> > very
> > >difficult if we want to apply the topic allowlist everywhere
> > >2. Also, the topic allowlist was not dynamic which makes everything
> > >quite complex, especially for the topic lifecycle management. By
> > using the
> > >dynamic global config, debugging could be easier, and management of
> > the
> > >config is also made easier.
> > >3. More details are included in the test section.
> > >
> > > One thing that still misses is the performance numbers. I will get it
> > > ready with our internal clusters and share out soon.
> > >
> > > Many thanks for the review!
> > > Qichao
> > >
> > > On Tue, Sep 12, 2023 at 8:31 AM Kirk True  wrote:
> > >
> > >> Oh, and does metrics.partition.level.reporting.topics allow for regex?
> > >>
> > >> > On Sep 12, 2023, at 8:26 AM, Kirk True  wrote:
> > >> >
> > >> > Hi Qichao,
> > >> >
> > >> > Thanks for the KIP!
> > >> >
> > >> > Divij—questions/comments inline...
> > >> >
> > >> >> On Sep 11, 2023, at 4:32 AM, Divij Vaidya  >
> > >> wro

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-10-28 Thread Qichao Chu
Hello Everyone,

Can I ask for some feedback regarding KIP-977
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics>
?

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Mon, Oct 16, 2023 at 7:34 PM Qichao Chu  wrote:

> Hi Divij and Kirk,
>
> Thank you both for providing the valuable feedback and sorry for the
> delay. I have just updated the KIP to address the comments.
>
>1. Instead of using a topic-level control, global verbosity control
>makes more sense if we want to extend it in the future. It would be very
>difficult if we want to apply the topic allowlist everywhere
>2. Also, the topic allowlist was not dynamic which makes everything
>quite complex, especially for the topic lifecycle management. By using the
>dynamic global config, debugging could be easier, and management of the
>config is also made easier.
>3. More details are included in the test section.
>
> One thing that still misses is the performance numbers. I will get it
> ready with our internal clusters and share out soon.
>
> Many thanks for the review!
> Qichao
>
> On Tue, Sep 12, 2023 at 8:31 AM Kirk True  wrote:
>
>> Oh, and does metrics.partition.level.reporting.topics allow for regex?
>>
>> > On Sep 12, 2023, at 8:26 AM, Kirk True  wrote:
>> >
>> > Hi Qichao,
>> >
>> > Thanks for the KIP!
>> >
>> > Divij—questions/comments inline...
>> >
>> >> On Sep 11, 2023, at 4:32 AM, Divij Vaidya 
>> wrote:
>> >>
>> >> Thank you for the proposal Qichao.
>> >>
>> >> I agree with the motivation here and understand the tradeoff here
>> >> between observability vs. increased metric dimensions (metric fan-out
>> >> as you say in the KIP).
>> >>
>> >> High level comments:
>> >>
>> >> 1. I would urge you to consider the extensibility of the proposal for
>> >> other types of metrics. Tomorrow, if we want to selectively add
>> >> "partition" dimension to another metric, would we have to modify the
>> >> code where each metric is emitted? Alternatively, could we abstract
>> >> out this config in a "Kafka Metrics" library. The code provides all
>> >> information about this library and this library can choose which
>> >> dimensions it wants to add to the final metrics that are emitted based
>> >> on declarative configuration.
>> >
>> > I’d agree with this if it doesn’t place a burden on the callers. Are
>> there any potential call sites that don’t have the partition information
>> readily available?
>> >
>> >> 2. Can we offload the handling of this dimension filtering to the
>> >> metric framework? Have you explored whether prometheus or other
>> >> libraries provide the ability to dynamically change dimensions
>> >> associated with metrics?
>> >
>> > I’m not familiar with the downstream metrics providers’ capabilities.
>> This is a greatest common denominator scenario, right? We’d have to be
>> reasonable sure that the heavily used providers *all* support such dynamic
>> filtering.
>> >
>> > Also—and correct me as needed as I’m not familiar with the area—if we
>> relegate partition filtering to a lower layer, we’d still need to store the
>> metric data in memory until it’s flushed, yes? If so, is that overhead of
>> any concern?
>> >
>> >> Implementation level comments:
>> >>
>> >> 1. In the test plan section, please mention what kind of integ and/or
>> >> unit tests will be added and what they will assert. As an example, you
>> >> can add a section, "functionality tests", which would assert that new
>> >> metric config is being respected and another section, "performance
>> >> tests", which could be a system test and assert that no regression
>> >> caused wrt resources occupied by metrics from one version to another.
>> >> 2. Please mention why or why not are we considering dynamically
>> >> setting the configuration (i.e. without a broker restart)? I would
>> >> imagine that the ability to dynamically configure for a specific topic
>> >> will be very useful especially to debug production situations that you
>> >> mention in the motivation.
>> >
>> > +1
>> >
>> >> 3. You mention that we want to start with metrics closely related to
>> >> producer &am

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-10-16 Thread Qichao Chu
;> config is read-only). We will be in a situation where some brokers are
> >> sending metrics with a new "partition" dimension and some brokers are
> >> sending metrics with no partition dimension. Is that acceptable to JMX
> >> / prometheus collectors? Would it break them? Please clarify how
> >> upgrades will work in the compatibility section.
> >> 5. Could you please quantify (with an experiment) the expected perf
> >> impact of adding the partition dimension? This could be done as part
> >> of "test plan" section and would serve as a data point for users to
> >> understand the potential impact if they decide to turn it on.
> >
> > Is there some guidance on the level of precision and detail expected
> when providing the performance numbers in the KIP?
> >
> > This notion of proving out the performance impact is important, I agree.
> Anecdotally, there was another KIP I was following for which performance
> numbers were requested, as is reasonable. But that caused the KIP to go a
> bit sideways as a result because it wasn’t able to get consensus on a) the
> different scenarios to test, and b) the quantitative goal for each. I’m not
> really sure the rigo(u)r that’s expected at this stage in the development
> of a new feature.
> >
> > Thanks,
> > Kirk
> >
> >>
> >> --
> >> Divij Vaidya
> >>
> >>
> >> On Sat, Sep 9, 2023 at 8:18 PM Qichao Chu 
> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> Although this has been discussed many times, I would like to start a
> new
> >>> discussion regarding the introduction of partition-level throughput
> >>> metrics. Please review the KIP and I'm eager to know everyone's
> thoughts:
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> >>>
> >>> TL;DR: The KIP proposes to add partition-level throughput metrics and
> a new
> >>> configuration to control the fan-out rate.
> >>>
> >>> Thank you all for the review and have a nice weekend!
> >>>
> >>> Best,
> >>> Qichao
>
>


[DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-09-09 Thread Qichao Chu
Hi All,

Although this has been discussed many times, I would like to start a new
discussion regarding the introduction of partition-level throughput
metrics. Please review the KIP and I'm eager to know everyone's thoughts:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics

TL;DR: The KIP proposes to add partition-level throughput metrics and a new
configuration to control the fan-out rate.

Thank you all for the review and have a nice weekend!

Best,
Qichao


[jira] [Created] (KAFKA-15447) Partition Level Throughput Metrics

2023-09-09 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-15447:
--

 Summary: Partition Level Throughput Metrics
 Key: KAFKA-15447
 URL: https://issues.apache.org/jira/browse/KAFKA-15447
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Reporter: Qichao Chu
Assignee: Qichao Chu


Please find the details at 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-977+Partition-Level+Throughput+Metrics]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Feedback of using the 'confluent-kafka-go' for reliablity and performance

2023-08-15 Thread Qichao Chu
Hi Kenneth,

Recently we noticed that there's an issue in the Sarama client's broker
address management.
Although the Sarama client fetches the latest broker information and
stores it in the brokers
<https://github.com/IBM/sarama/blob/main/client.go#L155C6-L155C6> list, the
seedbrokers
<https://github.com/IBM/sarama/blob/main/client.go#L151C5-L151C5> list is
never refreshed and it's always the first broker being used
<https://github.com/IBM/sarama/blob/main/client.go#L765C11-L765C11>. This
has caused two problems for us:

   1. The broker hostname might be re-used and the same hostname may be
   pointing to a broker in another cluster. This will make consumers read from
   the low watermark for a topic with the same name in the wrong cluster.
   2. The client may completely fail the topic discovery and be stuck in
   group rebalance and unable to consume messages.

We are wondering if similar problems are spotted in your usage.

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Mon, Jul 31, 2023 at 10:57 AM Kenneth Eversole
 wrote:

> Hello,
>
> Xinli,
>
> I can only speak for Cloudflare, but we use various versions of the Sarama
> Go Client for various production workloads and they seem to function quite
> well.
>
> I am curious what your reasoning for wanting to replace Sarama?
>
> Kenneth Eversole
>
> On Fri, Jul 28, 2023 at 1:58 PM Xinli shang 
> wrote:
>
> > Hi all,
> >
> > We are considering replacing the Sarama Go client with confluent-kafka-go
> > <https://github.com/confluentinc/confluent-kafka-go> which is supported
> > and
> > seems promising. Regarding reliability and performance, does anybody want
> > to share the experience? Whether or not it is the experience in
> production
> > or testing, they are all appreciated.
> >
> > --
> > Xinli Shang
> >
>


Re: Request permission to contribute

2023-07-30 Thread Qichao Chu
Hi Bruno,

Is it ok for you to set me up for contribution too? I would like to create
a KIP.
My account ID in Confluence and JIRA is ex172000 and my email is
ex172...@gmail.com.

Thank you in advance!

Best,
Qichao Chu
Software Engineer | Data - Kafka
[image: Uber] <https://uber.com/>


On Wed, Jul 26, 2023 at 12:41 AM Bruno Cadonna  wrote:

> Hi Taras,
>
> I set you up for the Apache Kafka wiki. Let me know if you still miss
> permissions.
>
> Thank you for your interest in Apache Kafka!
>
> Best,
> Bruno
>
> On 7/25/23 9:49 AM, Taras Ledkov wrote:
> > Hi Guozhang,
> >
> > Thanks for your attention.
> >
> > I'm a contributor of other Apache project (Ignite).
> > Now I don't have permissions to create a page in `Apache Kafka` space on
> wiki (confluence) [1]
> >
> > [1]
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>