Hi Heesung,

For 2.10, I would like to suggest fixing the issue instead of cherry-picking
the PR. The problem that https://github.com/apache/pulsar/pull/388 resolved
will happen again if `loadBalancerDistributeBundlesEvenlyEnabled` is
disabled. We should try to remove the configuration in the future because
it is difficult for users to decide whether to enable or disable it. Both
settings have problems, just different ones.
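
To make the suggested fix more concrete, below is a minimal Python sketch of the boomerang and of a pre-unload distribution check. It is a simplified model, not Pulsar's actual code. The function names `remove_most_servicing_brokers` and `would_boomerang` are hypothetical. The sketch assumes that the filter keeps only the candidate brokers that own the fewest bundles of the namespace, and that the unloading broker stays in the candidate list.

```python
from typing import Dict, List


def remove_most_servicing_brokers(candidates: List[str],
                                  ns_bundle_counts: Dict[str, int]) -> List[str]:
    """Simplified (hypothetical) model of the filter: keep only the candidate
    brokers that own the fewest bundles of the namespace (missing = 0)."""
    counts = {broker: ns_bundle_counts.get(broker, 0) for broker in candidates}
    min_count = min(counts.values())
    return [broker for broker in candidates if counts[broker] == min_count]


def would_boomerang(source: str, candidates: List[str],
                    ns_bundle_counts: Dict[str, int]) -> bool:
    """Return True if unloading one bundle of the namespace from `source`
    would leave `source` as the only broker after the filter, i.e. the
    bundle would be assigned straight back to it."""
    if ns_bundle_counts.get(source, 0) < 1:
        raise ValueError("source broker owns no bundle of this namespace")
    after_unload = dict(ns_bundle_counts)
    after_unload[source] -= 1  # the unloaded bundle no longer counts for source
    return remove_most_servicing_brokers(candidates, after_unload) == [source]


if __name__ == "__main__":
    brokers = ["broker-a", "broker-b", "broker-c"]
    # broker-a is overloaded and owns 2 bundles of the namespace.
    print(would_boomerang("broker-a", brokers,
                          {"broker-a": 2, "broker-b": 2, "broker-c": 3}))  # True
    # broker-a owns the most bundles, so the bundle can move elsewhere.
    print(would_boomerang("broker-a", brokers,
                          {"broker-a": 3, "broker-b": 1, "broker-c": 2}))  # False
```

In the first case, unloading drops broker-a to the unique minimum, so the filter would hand the bundle straight back. A check like this, run before choosing a bundle to unload, would skip such bundles. It only detects the case where the source broker would be the sole candidate. When the source ties with other brokers, the final choice still depends on load data.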

> I think we also need to consider the namespace anti-affinity-group logic
> too.

+1, it should be fixed to avoid an infinite bundle unloading loop.

Thanks,
Penghui

On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn
<heesung.s...@streamnative.io.invalid> wrote:

> Hi dev,
>
> I think we also need to consider the namespace anti-affinity-group logic
> too. These two pieces of logic seem to do similar things.
>
> https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains
>
>
> PengHui,
> We got three binding votes here. Do you think we should proceed to
> cherry-pick the PR to 2.10, then?
>
> Thanks,
> Heesung
>
> On Sun, Jul 2, 2023 at 5:22 PM PengHui Li <peng...@apache.org> wrote:
>
> > > `removeMostServicingBrokersForNamespace` was introduced by [1] to
> > > solve the problem that when all bundles in a particular namespace
> > > belong to one or a few machines, customers owning that namespace will
> > > be heavily impacted if one of those brokers goes down. Of course, this
> > > PR caused the infinite unloading issue and we need to fix it.
> >
> > Thanks for the context.
> > It looks like we can also try to fix the infinite unloading issue.
> > Currently, the broker unloads bundles without checking the distribution
> > of the bundles under a namespace, but it does check the distribution
> > when finding a new owner. Is it possible to check the bundle
> > distribution before unloading the bundles to avoid infinite unloading?
> >
> > Regards,
> > Penghui
> >
> >
> > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > Enrico
> > >
> > > On Sun, Jul 2, 2023 at 06:19 Hang Chen <chenh...@apache.org> wrote:
> > >
> > > > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > > > whether to enable or disable it.
> > > >
> > > > `removeMostServicingBrokersForNamespace` was introduced by [1] to
> > > > solve the problem that when all bundles in a particular namespace
> > > > belong to one or a few machines, customers owning that namespace will
> > > > be heavily impacted if one of those brokers goes down. Of course, this
> > > > PR caused the infinite unloading issue and we need to fix it.
> > > >
> > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > > We'd better remove the config in the next version instead of changing
> > > > the default value to `false`; otherwise, Pulsar's configuration will
> > > > keep growing.
> > > >
> > > > Thanks,
> > > > Hang
> > > >
> > > > [1] https://github.com/apache/pulsar/pull/388
> > > >
> > > > On Sat, Jul 1, 2023 at 09:38 PengHui Li <peng...@apache.org> wrote:
> > > > >
> > > > > +1 for cherry-picking to branch-2.10 since users don't have a
> > > > > workaround for this issue, and the change is well understood and
> > > > > low-risk.
> > > > >
> > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > > > > <heesung.s...@streamnative.io.invalid> wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > > I realized that the `removeMostServicingBrokersForNamespace` function
> > > > > > in the broker selection logic can cause infinite unloading.
> > > > > >
> > > > > > Suppose an overloaded broker unloads a bundle and, after that, is the
> > > > > > only broker with the minimum number of bundles (in that namespace).
> > > > > > In that case, the selection logic
> > > > > > (`removeMostServicingBrokersForNamespace`) will filter out the other
> > > > > > brokers and always reassign the bundle to the previous broker. This
> > > > > > causes infinite unloading (like a boomerang).
> > > > > >
> > > > > > To mitigate this issue, we need to cherry-pick this PR, which allows
> > > > > > this logic to be disabled via a config:
> > > > > > https://github.com/apache/pulsar/pull/16059
> > > > > >
> > > > > > And we probably want to disable this
> > > > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > > > >
> > > > > > Regards,
> > > > > > Heesung
> > > > > >
> > > >
> > >
> >
>
