Thank you, I will make it an issue.

Luke Chen <show...@gmail.com> 于2022年3月26日周六 16:03写道:

> Hi Feiyan,
>
> It did look like a bug. Could you open a bug in JIRA here
> <
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13735?filter=allopenissues
> >
> ?
> One thing I don't fully understand from the unit test is that, you have
> output "error distributions: $tpsWithoutResize".
> I thought the bug is about the unevenly partition distribution after
> resizing fetch thread, what do you mean by "error distribution"?
> Could you please also elaborate in JIRA ticket?
>
> Also, welcome to submit a PR to it, since you've already had fix and test
> ready! :)
>
> Thank you.
> Luke
>
>
>
> On Sat, Mar 26, 2022 at 2:56 PM Feiyan Yu <dbtr...@gmail.com> wrote:
>
> > Howdy!
> >
> > I found that the method of resizing fetching thread had one potential bug
> > which lead to imbalanced partitions distribution for each thread after
> > changing thread number on dynamic configuration.
> >
> > To figure it out, I added a primitive unit test method "testResize" in
> > "AbstractFetcherThreadTest" to simulate the replica fetchers resizing
> > process from 10 threads to 60 threads with originally 10 topics and 100
> > partitions each. I designed the test to show that after the resizing
> > process, all partitions should be redistributed correctly based on the
> new
> > thread number. However, the test failed because when I tried to compare
> the
> > "fetcherThreadMap" with the fetcherId for each topic-partition, the
> > fetcherId mismatched! The unit test I added is in this commit (
> >
> https://github.com/yufeiyan1220/kafka/commit/eb99b7499b416cdeb44c2ccd3ea55a1e38ea3d60
> ),
> > and the standard output of the unit test showed in attachment.
> >
> > I doubt that maybe it is because the method "addFetcherForPartitions"
> > which maybe adds some new fetchers to "fetcherThreadMap" called in the
> > block of iterating the "fetcherThreadMap", and the iterator ignore some
> > fetchers, which leads to some of the fetchers remain their topic
> > partitions. And it leads to the imbalanced partition distribution pattern
> > in resizing.
> >
> > To solve this issue I make a mutable map to store all partitions and its
> > fetch offset, and then add it back once out of the iteration. I make
> > another commit (
> >
> https://github.com/yufeiyan1220/kafka/commit/0a793dfca2ab9b8e8b45ba5693359960f3e306eb
> ),
> > and the new resize method passed the unit test.
> >
> > I'm not sure whether it is an issue that Kafka Community need to fix. But
> > for me, it affects the fetching efficiency when I try to deploy Kafka
> cross
> > regions with high network latency.
> >
> > I'd really appreciate to hear from you!
> >
> >
>

Reply via email to