Thanks Sagar for writing this PR.

I thought twice about the options that have been proposed in
https://issues.apache.org/jira/browse/KAFKA-13152, and feel that at the
moment it's simpler to just distribute the configured total bytes evenly
across threads. My rationale is that right now we have a static tasks ->
threads mapping, and hence each partition is only fetched by a single
thread / consumer at a given time. If in the future we break that static
mapping and make it dynamic, we would no longer be able to do this even
distribution. Instead we would have dedicated threads that only poll from
the consumer, and those threads would be responsible for checking the
config and pausing non-empty partitions once the aggregated buffered bytes
go beyond the threshold. But since at that point we would only be changing
how the config is implemented behind the scenes, not the config itself, we
would not need another KIP for it.
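
To make that concrete, here is a rough sketch of what I have in mind (the
class and names are made up for illustration, not the actual Streams
internals): each thread gets an even share of the configured total, and
pauses its non-empty partitions once its aggregated buffered bytes exceed
that share.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical helper, not actual Streams code: enforce an even share
    // of the configured total buffered bytes on a single stream thread.
    public class BufferSizeEnforcer {
        private final long maxBytesPerThread;

        public BufferSizeEnforcer(final long configuredTotalBytes, final int numThreads) {
            // even distribution of the configured total across threads
            this.maxBytesPerThread = configuredTotalBytes / numThreads;
        }

        // bufferedBytesPerPartition would be maintained as records are
        // added to / drained from each partition's buffer
        public void maybePause(final Consumer<?, ?> consumer,
                               final Map<TopicPartition, Long> bufferedBytesPerPartition) {
            final long totalBuffered = bufferedBytesPerPartition.values().stream()
                .mapToLong(Long::longValue)
                .sum();
            if (totalBuffered > maxBytesPerThread) {
                // pause only the partitions that still have buffered data
                final Set<TopicPartition> nonEmpty = new HashSet<>();
                for (final Map.Entry<TopicPartition, Long> entry : bufferedBytesPerPartition.entrySet()) {
                    if (entry.getValue() > 0) {
                        nonEmpty.add(entry.getKey());
                    }
                }
                consumer.pause(nonEmpty);
            }
        }
    }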

Some more comments:

1. We need to discuss the default value of this new config a bit.
Personally I think we should be conservative and lean toward a large
value, so that it would not cause any perf regression compared with the
old configs, especially with a large topology and a large number of
partitions.
2. I looked at the existing metrics, and we do not have corresponding
sensors for this. How about also adding a task-level metric indicating the
current total aggregated buffered bytes (see the sketch after this list)?
The reason I do not suggest this metric at the per-thread level is that in
the future we may break the static tasks -> threads mapping.
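
For 2. above, a minimal sketch of what such a task-level sensor could look
like, using the existing org.apache.kafka.common.metrics API (the metric /
group names and the task id here are just placeholders):

    import java.util.Collections;
    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Value;

    public class TaskBufferMetricSketch {
        public static void main(final String[] args) {
            final Metrics metrics = new Metrics();

            // one sensor per task, tagged with the (placeholder) task id
            final Sensor sensor = metrics.sensor("input-buffer-bytes-total");
            sensor.add(
                metrics.metricName(
                    "input-buffer-bytes-total",
                    "stream-task-metrics",
                    "The total bytes currently aggregated in the task's input buffers",
                    Collections.singletonMap("task-id", "0_1")),
                new Value());  // Value reports the last recorded value

            // record the aggregate whenever the task's buffered bytes change
            final long currentBufferedBytes = 1024L;  // placeholder
            sensor.record(currentBufferedBytes);
        }
    }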

[optional] As an orthogonal thought, I'm wondering if maybe we could
rename the other config, "*cache.max.bytes.buffering*", to
"statestore.cache.max.bytes" (via deprecation of course), piggy-backed on
this KIP? I would like to hear others' thoughts.
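
If we did go that route, the rename could look roughly like the following
(a sketch only, assuming we keep the old name as a deprecated alias with
the same type and default, which is 10 MB today):

    import org.apache.kafka.common.config.ConfigDef;

    public class ConfigRenameSketch {
        // Hypothetical: the new name plus the old one kept as a deprecated
        // alias with the same semantics and default.
        static final ConfigDef CONFIG = new ConfigDef()
            .define("statestore.cache.max.bytes",
                    ConfigDef.Type.LONG,
                    10 * 1024 * 1024L,
                    ConfigDef.Importance.MEDIUM,
                    "Maximum number of bytes to be used for caching across all state stores")
            .define("cache.max.bytes.buffering",
                    ConfigDef.Type.LONG,
                    10 * 1024 * 1024L,
                    ConfigDef.Importance.MEDIUM,
                    "(Deprecated) Use statestore.cache.max.bytes instead.");
    }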


Guozhang



On Sun, Aug 22, 2021 at 9:29 AM Sagar <sagarmeansoc...@gmail.com> wrote:

> Hi All,
>
> I would like to start a discussion on the following KIP:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
>
> Thanks!
> Sagar.
>


-- 
-- Guozhang
