Hey Hudeqi,

I also took some time to read through the linked PR, where you and Chris
had an informative discussion.

Both there and in this discussion thread, it seems to me that the
consensus is to narrow the scope of the KIP to lowering the default value
of the segment.bytes config for the offsets topic. This will keep future
workers from having a long boot-up time. IMO while this might not seem like
a high-impact thing, the configs that we are talking about here are
advanced ones which new Connect users might not immediately look into. So,
if they end up in a situation where there's a 23-min worker startup time,
then it might not be an overall good experience for them.
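
To make the reduced scope concrete, here is a rough sketch of how I read it:
apply a smaller default for segment.bytes when the user has not set one, and
only log a warning (without overriding) when the user has configured a larger
value. The class and method names are hypothetical and not taken from the
Connect code base, and the 50MB figure is just the value mentioned in this
thread, not an agreed default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of Kafka Connect.
public class OffsetsTopicSegmentBytes {
    private static final Logger LOG =
            Logger.getLogger(OffsetsTopicSegmentBytes.class.getName());

    static final String SEGMENT_BYTES = "segment.bytes";

    // Assumption: 50MB, the value reported as used in production in this thread.
    static final long PROPOSED_DEFAULT_SEGMENT_BYTES = 50L * 1024 * 1024;

    /**
     * Returns the topic settings to use when creating the internal topic.
     * A missing segment.bytes gets the proposed default; a user-supplied value
     * is always kept, with a warning if it exceeds the proposed default.
     * A non-numeric value makes Long.parseLong throw NumberFormatException.
     */
    static Map<String, String> effectiveTopicSettings(Map<String, String> userSettings) {
        Map<String, String> result = new HashMap<>(userSettings);
        String configured = userSettings.get(SEGMENT_BYTES);
        if (configured == null) {
            result.put(SEGMENT_BYTES, Long.toString(PROPOSED_DEFAULT_SEGMENT_BYTES));
        } else if (Long.parseLong(configured) > PROPOSED_DEFAULT_SEGMENT_BYTES) {
            // Warn only; the user-configured value is not overridden.
            LOG.warning(String.format(
                    "%s=%s is larger than the suggested %d bytes; "
                            + "worker startup may be slow for a large offsets topic",
                    SEGMENT_BYTES, configured, PROPOSED_DEFAULT_SEGMENT_BYTES));
        }
        return result;
    }

    public static void main(String[] args) {
        // No user value: the proposed default is applied.
        System.out.println(effectiveTopicSettings(new HashMap<>()));

        // Larger user value: kept as-is, a warning is logged.
        Map<String, String> user = new HashMap<>();
        user.put(SEGMENT_BYTES, "1073741824");
        System.out.println(effectiveTopicSettings(user));
    }
}
```

Note that this only affects topics created after the change; it says nothing
about existing offsets topics, which is where Greg's concern comes in.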

Regarding the point Greg mentioned, we will have to think about getting
around it. The approach you suggested seems unclean to me. Since you have
been testing with this config in your cluster and you already have a large
offsets topic, have you noticed any discrepancies in the in-memory state
across workers in your cluster? Would it be possible for you to test that?
That might be a good starting point to understand how we want to fix this.
Ideally we should have some kind of point of view (or even a potential fix)
on this before we go about implementing this change.
WDYT?

Thanks!
Sagar.

On Mon, Aug 14, 2023 at 6:09 PM hudeqi <16120...@bjtu.edu.cn> wrote:

> Bumping this discussion thread.
>
> best,
> hudeqi
>
> "hudeqi" <16120...@bjtu.edu.cn> wrote:
> > Sorry for getting back to you so late, Yash Mayya, Greg Harris, and
> Sagar. I was not getting email reminders and missed your replies.
> >
> > Thank you for your thoughts and suggestions; I learned a lot. I will
> give my thoughts and answers together:
> > 1. The 50MB default is the configuration I actually used in production
> to solve this problem, and the results were better (see the description of
> the Jira:
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-15086?filter=allopenissues.
> ). In fact, I think an even smaller value may be better, so I did not
> adopt the same default as __consumer_offsets, but I don't know what the
> best default is. Secondly, in production I also set the 50MB default
> through ConfigDef#defineInternal, and a warning is logged if the value
> configured by the user is greater than the default. The only difference
> from what you described is that I overwrite the user-configured value with
> the default. (Chris Egerton rejected this point in
> https://github.com/apache/kafka/pull/13852; you all agree that the
> user-configured value should not be overridden directly, and I now agree
> with this.)
> > 2. I think the potential bug Greg mentioned, which may lead to
> inconsistent state between workers, is a great point. It is true that we
> cannot directly change the configuration of existing internal topics.
> Perhaps a more awkward, manual approach is to first confirm that the
> active segments of all current partitions are relatively small, then stop
> all Connect instances, change the topic configuration, and finally start
> the instances again.
> >
> > To sum up, could the scope of the KIP be reduced to: only set the
> default value of 'segment.bytes' for the internal topics, and log a warning
> when the user configures a larger value? What do you think? If there's a
> better way, I'm all ears.
> >
> > best,
> > hudeqi
>
