Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-23 Thread Kai Wang
Hi dev,

I pushed a PR https://github.com/apache/pulsar/pull/20822 to fix the infinite 
unloading. Please help review this PR. Thanks!

Thanks,
Kai

On 2023/07/09 23:40:03 PengHui Li wrote:
> Hi Heesung,
> 
> For 2.10, I would like to suggest fixing the issue instead of cherry-picking
> the PR. The problem that https://github.com/apache/pulsar/pull/388 had
> resolved will happen again if `loadBalancerDistributeBundlesEvenlyEnabled`
> is disabled. We should try to remove the configuration in the future
> because users are difficult to decide whether to enable or disable it. Both
> of them have problems, just different issues.
> 
> > I think we also need to consider the namespace anit-affinity-group logic
> too.
> 
> +1, it should be fixed to avoid an infinite bundle unloading loop.
> 
> Thanks,
> Penghui
> 
> On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn
>  wrote:
> 
> > Hi dev,
> >
> > I think we also need to consider the namespace anit-affinity-group logic
> > too. These logics seem to do similar things.
> >
> > https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains
> >
> >
> > PengHui
> > We got three biding votes here. Do you think we should proceed to
> > cherry-pick the PR to 2.10, then?
> >
> > Thanks,
> > Heesung
> >
> >
> >
> >
> >
> > On Sun, Jul 2, 2023 at 5:22 PM PengHui Li  wrote:
> >
> > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > > solve the problem that when all bundles in a particular namespace
> > > belong to 1 or few machines, customers owning that namespace will be
> > > heavily impacted if that broker goes down. Of course, this PR caused
> > > the infinite unloading issue and we need to fix it.
> > >
> > > Thanks for the context.
> > > It looks like we can also try to fix the infinite unloading issue.
> > > Now, the broker is unloading the bundles without checking the
> > distribution
> > > of the bundles under a namespace, but it will check when finding
> > > a new owner. Is it possible to check the bundle distribution before
> > > unloading the bundles to avoid infinite unloading?
> > >
> > > Regards,
> > > Penghui
> > >
> > >
> > > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Enrico
> > > >
> > > > Il Dom 2 Lug 2023, 06:19 Hang Chen  ha scritto:
> > > >
> > > > > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > > > > whether to enable or disable it.
> > > > >
> > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > > > > solve the problem that when all bundles in a particular namespace
> > > > > belong to 1 or few machines, customers owning that namespace will be
> > > > > heavily impacted if that broker goes down. Of course, this PR caused
> > > > > the infinite unloading issue and we need to fix it.
> > > > >
> > > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > > > We'd better remove the config in the next version instead of change
> > > > > the default value to `false`, which will make Pulsar's configuration
> > > > > keep increasing.
> > > > >
> > > > > Thanks,
> > > > > Hang
> > > > >
> > > > > [1] https://github.com/apache/pulsar/pull/388
> > > > >
> > > > > PengHui Li  于2023年7月1日周六 09:38写道:
> > > > > >
> > > > > > +1 for cherry-pick to branch-2.10 since users don't have a
> > workaround
> > > > > > for this issue, and the change is well-understand, low risk.
> > > > > >
> > > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > > > > >  wrote:
> > > > > >
> > > > > > > Hi dev,
> > > > > > >
> > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in
> > > the
> > > > > broker
> > > > > > > selection logic can cause infinite unloading.
> > > > > > >
> > > > > > > Suppose an overloaded broker unloaded a bundle and only has the
> > > > minimum
> > > > > > > number of bundles(in that namespace) among brokers. In that case,
> > > the
> > > > > > > selection logic (`removeMostServicingBrokersForNamespace`) will
> > > > filter
> > > > > out
> > > > > > > other brokers and always reassign the bundle to the previous
> > > broker.
> > > > > This
> > > > > > > will cause infinite unloading(like a boomerang).
> > > > > > >
> > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable
> > > > this
> > > > > > > logic by the config.
> > > > > > > https://github.com/apache/pulsar/pull/16059
> > > > > > >
> > > > > > > And we probably want to disable this
> > > > > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Heesung
> > > > > > >
> > > > >
> > > >
> > >
> >
> 


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-09 Thread PengHui Li
Hi Heesung,

For 2.10, I would like to suggest fixing the issue instead of cherry-picking
the PR. The problem that https://github.com/apache/pulsar/pull/388 had
resolved will happen again if `loadBalancerDistributeBundlesEvenlyEnabled`
is disabled. We should try to remove the configuration in the future
because users are difficult to decide whether to enable or disable it. Both
of them have problems, just different issues.

> I think we also need to consider the namespace anit-affinity-group logic
too.

+1, it should be fixed to avoid an infinite bundle unloading loop.

Thanks,
Penghui

On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn
 wrote:

> Hi dev,
>
> I think we also need to consider the namespace anit-affinity-group logic
> too. These logics seem to do similar things.
>
> https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains
>
>
> PengHui
> We got three biding votes here. Do you think we should proceed to
> cherry-pick the PR to 2.10, then?
>
> Thanks,
> Heesung
>
>
>
>
>
> On Sun, Jul 2, 2023 at 5:22 PM PengHui Li  wrote:
>
> > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > solve the problem that when all bundles in a particular namespace
> > belong to 1 or few machines, customers owning that namespace will be
> > heavily impacted if that broker goes down. Of course, this PR caused
> > the infinite unloading issue and we need to fix it.
> >
> > Thanks for the context.
> > It looks like we can also try to fix the infinite unloading issue.
> > Now, the broker is unloading the bundles without checking the
> distribution
> > of the bundles under a namespace, but it will check when finding
> > a new owner. Is it possible to check the bundle distribution before
> > unloading the bundles to avoid infinite unloading?
> >
> > Regards,
> > Penghui
> >
> >
> > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli 
> > wrote:
> >
> > > +1
> > >
> > > Enrico
> > >
> > > Il Dom 2 Lug 2023, 06:19 Hang Chen  ha scritto:
> > >
> > > > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > > > whether to enable or disable it.
> > > >
> > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > > > solve the problem that when all bundles in a particular namespace
> > > > belong to 1 or few machines, customers owning that namespace will be
> > > > heavily impacted if that broker goes down. Of course, this PR caused
> > > > the infinite unloading issue and we need to fix it.
> > > >
> > > > > I agree with making it false for the next major version release by
> > > > default.
> > > > We'd better remove the config in the next version instead of change
> > > > the default value to `false`, which will make Pulsar's configuration
> > > > keep increasing.
> > > >
> > > > Thanks,
> > > > Hang
> > > >
> > > > [1] https://github.com/apache/pulsar/pull/388
> > > >
> > > > PengHui Li  于2023年7月1日周六 09:38写道:
> > > > >
> > > > > +1 for cherry-pick to branch-2.10 since users don't have a
> workaround
> > > > > for this issue, and the change is well-understand, low risk.
> > > > >
> > > > > I agree with making it false for the next major version release by
> > > > default.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > > > >  wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > > I realized that `removeMostServicingBrokersForNamespace` func in
> > the
> > > > broker
> > > > > > selection logic can cause infinite unloading.
> > > > > >
> > > > > > Suppose an overloaded broker unloaded a bundle and only has the
> > > minimum
> > > > > > number of bundles(in that namespace) among brokers. In that case,
> > the
> > > > > > selection logic (`removeMostServicingBrokersForNamespace`) will
> > > filter
> > > > out
> > > > > > other brokers and always reassign the bundle to the previous
> > broker.
> > > > This
> > > > > > will cause infinite unloading(like a boomerang).
> > > > > >
> > > > > > To mitigate this issue, we need to cherry-pick this PR to disable
> > > this
> > > > > > logic by the config.
> > > > > > https://github.com/apache/pulsar/pull/16059
> > > > > >
> > > > > > And we probably want to disable this
> > > > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > > > >
> > > > > > Regards,
> > > > > > Heesung
> > > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-07 Thread Heesung Sohn
Hi dev,

I think we also need to consider the namespace anit-affinity-group logic
too. These logics seem to do similar things.
https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains


PengHui
We got three biding votes here. Do you think we should proceed to
cherry-pick the PR to 2.10, then?

Thanks,
Heesung





On Sun, Jul 2, 2023 at 5:22 PM PengHui Li  wrote:

> > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> solve the problem that when all bundles in a particular namespace
> belong to 1 or few machines, customers owning that namespace will be
> heavily impacted if that broker goes down. Of course, this PR caused
> the infinite unloading issue and we need to fix it.
>
> Thanks for the context.
> It looks like we can also try to fix the infinite unloading issue.
> Now, the broker is unloading the bundles without checking the distribution
> of the bundles under a namespace, but it will check when finding
> a new owner. Is it possible to check the bundle distribution before
> unloading the bundles to avoid infinite unloading?
>
> Regards,
> Penghui
>
>
> On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli 
> wrote:
>
> > +1
> >
> > Enrico
> >
> > Il Dom 2 Lug 2023, 06:19 Hang Chen  ha scritto:
> >
> > > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > > whether to enable or disable it.
> > >
> > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > > solve the problem that when all bundles in a particular namespace
> > > belong to 1 or few machines, customers owning that namespace will be
> > > heavily impacted if that broker goes down. Of course, this PR caused
> > > the infinite unloading issue and we need to fix it.
> > >
> > > > I agree with making it false for the next major version release by
> > > default.
> > > We'd better remove the config in the next version instead of change
> > > the default value to `false`, which will make Pulsar's configuration
> > > keep increasing.
> > >
> > > Thanks,
> > > Hang
> > >
> > > [1] https://github.com/apache/pulsar/pull/388
> > >
> > > PengHui Li  于2023年7月1日周六 09:38写道:
> > > >
> > > > +1 for cherry-pick to branch-2.10 since users don't have a workaround
> > > > for this issue, and the change is well-understand, low risk.
> > > >
> > > > I agree with making it false for the next major version release by
> > > default.
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > > >  wrote:
> > > >
> > > > > Hi dev,
> > > > >
> > > > > I realized that `removeMostServicingBrokersForNamespace` func in
> the
> > > broker
> > > > > selection logic can cause infinite unloading.
> > > > >
> > > > > Suppose an overloaded broker unloaded a bundle and only has the
> > minimum
> > > > > number of bundles(in that namespace) among brokers. In that case,
> the
> > > > > selection logic (`removeMostServicingBrokersForNamespace`) will
> > filter
> > > out
> > > > > other brokers and always reassign the bundle to the previous
> broker.
> > > This
> > > > > will cause infinite unloading(like a boomerang).
> > > > >
> > > > > To mitigate this issue, we need to cherry-pick this PR to disable
> > this
> > > > > logic by the config.
> > > > > https://github.com/apache/pulsar/pull/16059
> > > > >
> > > > > And we probably want to disable this
> > > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > > >
> > > > > Regards,
> > > > > Heesung
> > > > >
> > >
> >
>


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-02 Thread PengHui Li
> `removeMostServicingBrokersForNamespace ` is introduced by [1] to
solve the problem that when all bundles in a particular namespace
belong to 1 or few machines, customers owning that namespace will be
heavily impacted if that broker goes down. Of course, this PR caused
the infinite unloading issue and we need to fix it.

Thanks for the context.
It looks like we can also try to fix the infinite unloading issue.
Now, the broker is unloading the bundles without checking the distribution
of the bundles under a namespace, but it will check when finding
a new owner. Is it possible to check the bundle distribution before
unloading the bundles to avoid infinite unloading?

Regards,
Penghui


On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli  wrote:

> +1
>
> Enrico
>
> Il Dom 2 Lug 2023, 06:19 Hang Chen  ha scritto:
>
> > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > whether to enable or disable it.
> >
> > `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> > solve the problem that when all bundles in a particular namespace
> > belong to 1 or few machines, customers owning that namespace will be
> > heavily impacted if that broker goes down. Of course, this PR caused
> > the infinite unloading issue and we need to fix it.
> >
> > > I agree with making it false for the next major version release by
> > default.
> > We'd better remove the config in the next version instead of change
> > the default value to `false`, which will make Pulsar's configuration
> > keep increasing.
> >
> > Thanks,
> > Hang
> >
> > [1] https://github.com/apache/pulsar/pull/388
> >
> > PengHui Li  于2023年7月1日周六 09:38写道:
> > >
> > > +1 for cherry-pick to branch-2.10 since users don't have a workaround
> > > for this issue, and the change is well-understand, low risk.
> > >
> > > I agree with making it false for the next major version release by
> > default.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > >  wrote:
> > >
> > > > Hi dev,
> > > >
> > > > I realized that `removeMostServicingBrokersForNamespace` func in the
> > broker
> > > > selection logic can cause infinite unloading.
> > > >
> > > > Suppose an overloaded broker unloaded a bundle and only has the
> minimum
> > > > number of bundles(in that namespace) among brokers. In that case, the
> > > > selection logic (`removeMostServicingBrokersForNamespace`) will
> filter
> > out
> > > > other brokers and always reassign the bundle to the previous broker.
> > This
> > > > will cause infinite unloading(like a boomerang).
> > > >
> > > > To mitigate this issue, we need to cherry-pick this PR to disable
> this
> > > > logic by the config.
> > > > https://github.com/apache/pulsar/pull/16059
> > > >
> > > > And we probably want to disable this
> > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > >
> > > > Regards,
> > > > Heesung
> > > >
> >
>


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-02 Thread Enrico Olivelli
+1

Enrico

Il Dom 2 Lug 2023, 06:19 Hang Chen  ha scritto:

> +1 for cherry-picking it to branch-2.10. We have a flag to control
> whether to enable or disable it.
>
> `removeMostServicingBrokersForNamespace ` is introduced by [1] to
> solve the problem that when all bundles in a particular namespace
> belong to 1 or few machines, customers owning that namespace will be
> heavily impacted if that broker goes down. Of course, this PR caused
> the infinite unloading issue and we need to fix it.
>
> > I agree with making it false for the next major version release by
> default.
> We'd better remove the config in the next version instead of change
> the default value to `false`, which will make Pulsar's configuration
> keep increasing.
>
> Thanks,
> Hang
>
> [1] https://github.com/apache/pulsar/pull/388
>
> PengHui Li  于2023年7月1日周六 09:38写道:
> >
> > +1 for cherry-pick to branch-2.10 since users don't have a workaround
> > for this issue, and the change is well-understand, low risk.
> >
> > I agree with making it false for the next major version release by
> default.
> >
> > Thanks,
> > Penghui
> >
> > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> >  wrote:
> >
> > > Hi dev,
> > >
> > > I realized that `removeMostServicingBrokersForNamespace` func in the
> broker
> > > selection logic can cause infinite unloading.
> > >
> > > Suppose an overloaded broker unloaded a bundle and only has the minimum
> > > number of bundles(in that namespace) among brokers. In that case, the
> > > selection logic (`removeMostServicingBrokersForNamespace`) will filter
> out
> > > other brokers and always reassign the bundle to the previous broker.
> This
> > > will cause infinite unloading(like a boomerang).
> > >
> > > To mitigate this issue, we need to cherry-pick this PR to disable this
> > > logic by the config.
> > > https://github.com/apache/pulsar/pull/16059
> > >
> > > And we probably want to disable this
> > > `removeMostServicingBrokersForNamespace` logic by default.
> > >
> > > Regards,
> > > Heesung
> > >
>


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-07-01 Thread Hang Chen
+1 for cherry-picking it to branch-2.10. We have a flag to control
whether to enable or disable it.

`removeMostServicingBrokersForNamespace ` is introduced by [1] to
solve the problem that when all bundles in a particular namespace
belong to 1 or few machines, customers owning that namespace will be
heavily impacted if that broker goes down. Of course, this PR caused
the infinite unloading issue and we need to fix it.

> I agree with making it false for the next major version release by default.
We'd better remove the config in the next version instead of change
the default value to `false`, which will make Pulsar's configuration
keep increasing.

Thanks,
Hang

[1] https://github.com/apache/pulsar/pull/388

PengHui Li  于2023年7月1日周六 09:38写道:
>
> +1 for cherry-pick to branch-2.10 since users don't have a workaround
> for this issue, and the change is well-understand, low risk.
>
> I agree with making it false for the next major version release by default.
>
> Thanks,
> Penghui
>
> On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
>  wrote:
>
> > Hi dev,
> >
> > I realized that `removeMostServicingBrokersForNamespace` func in the broker
> > selection logic can cause infinite unloading.
> >
> > Suppose an overloaded broker unloaded a bundle and only has the minimum
> > number of bundles(in that namespace) among brokers. In that case, the
> > selection logic (`removeMostServicingBrokersForNamespace`) will filter out
> > other brokers and always reassign the bundle to the previous broker. This
> > will cause infinite unloading(like a boomerang).
> >
> > To mitigate this issue, we need to cherry-pick this PR to disable this
> > logic by the config.
> > https://github.com/apache/pulsar/pull/16059
> >
> > And we probably want to disable this
> > `removeMostServicingBrokersForNamespace` logic by default.
> >
> > Regards,
> > Heesung
> >


Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-06-30 Thread PengHui Li
+1 for cherry-pick to branch-2.10 since users don't have a workaround
for this issue, and the change is well-understand, low risk.

I agree with making it false for the next major version release by default.

Thanks,
Penghui

On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
 wrote:

> Hi dev,
>
> I realized that `removeMostServicingBrokersForNamespace` func in the broker
> selection logic can cause infinite unloading.
>
> Suppose an overloaded broker unloaded a bundle and only has the minimum
> number of bundles(in that namespace) among brokers. In that case, the
> selection logic (`removeMostServicingBrokersForNamespace`) will filter out
> other brokers and always reassign the bundle to the previous broker. This
> will cause infinite unloading(like a boomerang).
>
> To mitigate this issue, we need to cherry-pick this PR to disable this
> logic by the config.
> https://github.com/apache/pulsar/pull/16059
>
> And we probably want to disable this
> `removeMostServicingBrokersForNamespace` logic by default.
>
> Regards,
> Heesung
>


[DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading

2023-06-30 Thread Heesung Sohn
Hi dev,

I realized that `removeMostServicingBrokersForNamespace` func in the broker
selection logic can cause infinite unloading.

Suppose an overloaded broker unloaded a bundle and only has the minimum
number of bundles(in that namespace) among brokers. In that case, the
selection logic (`removeMostServicingBrokersForNamespace`) will filter out
other brokers and always reassign the bundle to the previous broker. This
will cause infinite unloading(like a boomerang).

To mitigate this issue, we need to cherry-pick this PR to disable this
logic by the config.
https://github.com/apache/pulsar/pull/16059

And we probably want to disable this
`removeMostServicingBrokersForNamespace` logic by default.

Regards,
Heesung