Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
Hi dev, I pushed a PR https://github.com/apache/pulsar/pull/20822 to fix the infinite unloading. Please help review this PR. Thanks! Thanks, Kai On 2023/07/09 23:40:03 PengHui Li wrote: > Hi Heesung, > > For 2.10, I would like to suggest fixing the issue instead of cherry-picking > the PR. The problem that https://github.com/apache/pulsar/pull/388 had > resolved will happen again if `loadBalancerDistributeBundlesEvenlyEnabled` > is disabled. We should try to remove the configuration in the future > because users are difficult to decide whether to enable or disable it. Both > of them have problems, just different issues. > > > I think we also need to consider the namespace anit-affinity-group logic > too. > > +1, it should be fixed to avoid an infinite bundle unloading loop. > > Thanks, > Penghui > > On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn > wrote: > > > Hi dev, > > > > I think we also need to consider the namespace anit-affinity-group logic > > too. These logics seem to do similar things. > > > > https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains > > > > > > PengHui > > We got three biding votes here. Do you think we should proceed to > > cherry-pick the PR to 2.10, then? > > > > Thanks, > > Heesung > > > > > > > > > > > > On Sun, Jul 2, 2023 at 5:22 PM PengHui Li wrote: > > > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > > solve the problem that when all bundles in a particular namespace > > > belong to 1 or few machines, customers owning that namespace will be > > > heavily impacted if that broker goes down. Of course, this PR caused > > > the infinite unloading issue and we need to fix it. > > > > > > Thanks for the context. > > > It looks like we can also try to fix the infinite unloading issue. > > > Now, the broker is unloading the bundles without checking the > > distribution > > > of the bundles under a namespace, but it will check when finding > > > a new owner. Is it possible to check the bundle distribution before > > > unloading the bundles to avoid infinite unloading? > > > > > > Regards, > > > Penghui > > > > > > > > > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli > > > wrote: > > > > > > > +1 > > > > > > > > Enrico > > > > > > > > Il Dom 2 Lug 2023, 06:19 Hang Chen ha scritto: > > > > > > > > > +1 for cherry-picking it to branch-2.10. We have a flag to control > > > > > whether to enable or disable it. > > > > > > > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > > > > solve the problem that when all bundles in a particular namespace > > > > > belong to 1 or few machines, customers owning that namespace will be > > > > > heavily impacted if that broker goes down. Of course, this PR caused > > > > > the infinite unloading issue and we need to fix it. > > > > > > > > > > > I agree with making it false for the next major version release by > > > > > default. > > > > > We'd better remove the config in the next version instead of change > > > > > the default value to `false`, which will make Pulsar's configuration > > > > > keep increasing. > > > > > > > > > > Thanks, > > > > > Hang > > > > > > > > > > [1] https://github.com/apache/pulsar/pull/388 > > > > > > > > > > PengHui Li 于2023年7月1日周六 09:38写道: > > > > > > > > > > > > +1 for cherry-pick to branch-2.10 since users don't have a > > workaround > > > > > > for this issue, and the change is well-understand, low risk. > > > > > > > > > > > > I agree with making it false for the next major version release by > > > > > default. > > > > > > > > > > > > Thanks, > > > > > > Penghui > > > > > > > > > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > > > > > > wrote: > > > > > > > > > > > > > Hi dev, > > > > > > > > > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in > > > the > > > > > broker > > > > > > > selection logic can cause infinite unloading. > > > > > > > > > > > > > > Suppose an overloaded broker unloaded a bundle and only has the > > > > minimum > > > > > > > number of bundles(in that namespace) among brokers. In that case, > > > the > > > > > > > selection logic (`removeMostServicingBrokersForNamespace`) will > > > > filter > > > > > out > > > > > > > other brokers and always reassign the bundle to the previous > > > broker. > > > > > This > > > > > > > will cause infinite unloading(like a boomerang). > > > > > > > > > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable > > > > this > > > > > > > logic by the config. > > > > > > > https://github.com/apache/pulsar/pull/16059 > > > > > > > > > > > > > > And we probably want to disable this > > > > > > > `removeMostServicingBrokersForNamespace` logic by default. > > > > > > > > > > > > > > Regards, > > > > > > > Heesung > > > > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
Hi Heesung, For 2.10, I would like to suggest fixing the issue instead of cherry-picking the PR. The problem that https://github.com/apache/pulsar/pull/388 had resolved will happen again if `loadBalancerDistributeBundlesEvenlyEnabled` is disabled. We should try to remove the configuration in the future because users are difficult to decide whether to enable or disable it. Both of them have problems, just different issues. > I think we also need to consider the namespace anit-affinity-group logic too. +1, it should be fixed to avoid an infinite bundle unloading loop. Thanks, Penghui On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn wrote: > Hi dev, > > I think we also need to consider the namespace anit-affinity-group logic > too. These logics seem to do similar things. > > https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains > > > PengHui > We got three biding votes here. Do you think we should proceed to > cherry-pick the PR to 2.10, then? > > Thanks, > Heesung > > > > > > On Sun, Jul 2, 2023 at 5:22 PM PengHui Li wrote: > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > solve the problem that when all bundles in a particular namespace > > belong to 1 or few machines, customers owning that namespace will be > > heavily impacted if that broker goes down. Of course, this PR caused > > the infinite unloading issue and we need to fix it. > > > > Thanks for the context. > > It looks like we can also try to fix the infinite unloading issue. > > Now, the broker is unloading the bundles without checking the > distribution > > of the bundles under a namespace, but it will check when finding > > a new owner. Is it possible to check the bundle distribution before > > unloading the bundles to avoid infinite unloading? > > > > Regards, > > Penghui > > > > > > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli > > wrote: > > > > > +1 > > > > > > Enrico > > > > > > Il Dom 2 Lug 2023, 06:19 Hang Chen ha scritto: > > > > > > > +1 for cherry-picking it to branch-2.10. We have a flag to control > > > > whether to enable or disable it. > > > > > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > > > solve the problem that when all bundles in a particular namespace > > > > belong to 1 or few machines, customers owning that namespace will be > > > > heavily impacted if that broker goes down. Of course, this PR caused > > > > the infinite unloading issue and we need to fix it. > > > > > > > > > I agree with making it false for the next major version release by > > > > default. > > > > We'd better remove the config in the next version instead of change > > > > the default value to `false`, which will make Pulsar's configuration > > > > keep increasing. > > > > > > > > Thanks, > > > > Hang > > > > > > > > [1] https://github.com/apache/pulsar/pull/388 > > > > > > > > PengHui Li 于2023年7月1日周六 09:38写道: > > > > > > > > > > +1 for cherry-pick to branch-2.10 since users don't have a > workaround > > > > > for this issue, and the change is well-understand, low risk. > > > > > > > > > > I agree with making it false for the next major version release by > > > > default. > > > > > > > > > > Thanks, > > > > > Penghui > > > > > > > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > > > > > wrote: > > > > > > > > > > > Hi dev, > > > > > > > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in > > the > > > > broker > > > > > > selection logic can cause infinite unloading. > > > > > > > > > > > > Suppose an overloaded broker unloaded a bundle and only has the > > > minimum > > > > > > number of bundles(in that namespace) among brokers. In that case, > > the > > > > > > selection logic (`removeMostServicingBrokersForNamespace`) will > > > filter > > > > out > > > > > > other brokers and always reassign the bundle to the previous > > broker. > > > > This > > > > > > will cause infinite unloading(like a boomerang). > > > > > > > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable > > > this > > > > > > logic by the config. > > > > > > https://github.com/apache/pulsar/pull/16059 > > > > > > > > > > > > And we probably want to disable this > > > > > > `removeMostServicingBrokersForNamespace` logic by default. > > > > > > > > > > > > Regards, > > > > > > Heesung > > > > > > > > > > > > > > > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
Hi dev, I think we also need to consider the namespace anit-affinity-group logic too. These logics seem to do similar things. https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains PengHui We got three biding votes here. Do you think we should proceed to cherry-pick the PR to 2.10, then? Thanks, Heesung On Sun, Jul 2, 2023 at 5:22 PM PengHui Li wrote: > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > solve the problem that when all bundles in a particular namespace > belong to 1 or few machines, customers owning that namespace will be > heavily impacted if that broker goes down. Of course, this PR caused > the infinite unloading issue and we need to fix it. > > Thanks for the context. > It looks like we can also try to fix the infinite unloading issue. > Now, the broker is unloading the bundles without checking the distribution > of the bundles under a namespace, but it will check when finding > a new owner. Is it possible to check the bundle distribution before > unloading the bundles to avoid infinite unloading? > > Regards, > Penghui > > > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli > wrote: > > > +1 > > > > Enrico > > > > Il Dom 2 Lug 2023, 06:19 Hang Chen ha scritto: > > > > > +1 for cherry-picking it to branch-2.10. We have a flag to control > > > whether to enable or disable it. > > > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > > solve the problem that when all bundles in a particular namespace > > > belong to 1 or few machines, customers owning that namespace will be > > > heavily impacted if that broker goes down. Of course, this PR caused > > > the infinite unloading issue and we need to fix it. > > > > > > > I agree with making it false for the next major version release by > > > default. > > > We'd better remove the config in the next version instead of change > > > the default value to `false`, which will make Pulsar's configuration > > > keep increasing. > > > > > > Thanks, > > > Hang > > > > > > [1] https://github.com/apache/pulsar/pull/388 > > > > > > PengHui Li 于2023年7月1日周六 09:38写道: > > > > > > > > +1 for cherry-pick to branch-2.10 since users don't have a workaround > > > > for this issue, and the change is well-understand, low risk. > > > > > > > > I agree with making it false for the next major version release by > > > default. > > > > > > > > Thanks, > > > > Penghui > > > > > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > > > > wrote: > > > > > > > > > Hi dev, > > > > > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in > the > > > broker > > > > > selection logic can cause infinite unloading. > > > > > > > > > > Suppose an overloaded broker unloaded a bundle and only has the > > minimum > > > > > number of bundles(in that namespace) among brokers. In that case, > the > > > > > selection logic (`removeMostServicingBrokersForNamespace`) will > > filter > > > out > > > > > other brokers and always reassign the bundle to the previous > broker. > > > This > > > > > will cause infinite unloading(like a boomerang). > > > > > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable > > this > > > > > logic by the config. > > > > > https://github.com/apache/pulsar/pull/16059 > > > > > > > > > > And we probably want to disable this > > > > > `removeMostServicingBrokersForNamespace` logic by default. > > > > > > > > > > Regards, > > > > > Heesung > > > > > > > > > > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
> `removeMostServicingBrokersForNamespace ` is introduced by [1] to solve the problem that when all bundles in a particular namespace belong to 1 or few machines, customers owning that namespace will be heavily impacted if that broker goes down. Of course, this PR caused the infinite unloading issue and we need to fix it. Thanks for the context. It looks like we can also try to fix the infinite unloading issue. Now, the broker is unloading the bundles without checking the distribution of the bundles under a namespace, but it will check when finding a new owner. Is it possible to check the bundle distribution before unloading the bundles to avoid infinite unloading? Regards, Penghui On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli wrote: > +1 > > Enrico > > Il Dom 2 Lug 2023, 06:19 Hang Chen ha scritto: > > > +1 for cherry-picking it to branch-2.10. We have a flag to control > > whether to enable or disable it. > > > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > > solve the problem that when all bundles in a particular namespace > > belong to 1 or few machines, customers owning that namespace will be > > heavily impacted if that broker goes down. Of course, this PR caused > > the infinite unloading issue and we need to fix it. > > > > > I agree with making it false for the next major version release by > > default. > > We'd better remove the config in the next version instead of change > > the default value to `false`, which will make Pulsar's configuration > > keep increasing. > > > > Thanks, > > Hang > > > > [1] https://github.com/apache/pulsar/pull/388 > > > > PengHui Li 于2023年7月1日周六 09:38写道: > > > > > > +1 for cherry-pick to branch-2.10 since users don't have a workaround > > > for this issue, and the change is well-understand, low risk. > > > > > > I agree with making it false for the next major version release by > > default. > > > > > > Thanks, > > > Penghui > > > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > > > wrote: > > > > > > > Hi dev, > > > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in the > > broker > > > > selection logic can cause infinite unloading. > > > > > > > > Suppose an overloaded broker unloaded a bundle and only has the > minimum > > > > number of bundles(in that namespace) among brokers. In that case, the > > > > selection logic (`removeMostServicingBrokersForNamespace`) will > filter > > out > > > > other brokers and always reassign the bundle to the previous broker. > > This > > > > will cause infinite unloading(like a boomerang). > > > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable > this > > > > logic by the config. > > > > https://github.com/apache/pulsar/pull/16059 > > > > > > > > And we probably want to disable this > > > > `removeMostServicingBrokersForNamespace` logic by default. > > > > > > > > Regards, > > > > Heesung > > > > > > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
+1 Enrico Il Dom 2 Lug 2023, 06:19 Hang Chen ha scritto: > +1 for cherry-picking it to branch-2.10. We have a flag to control > whether to enable or disable it. > > `removeMostServicingBrokersForNamespace ` is introduced by [1] to > solve the problem that when all bundles in a particular namespace > belong to 1 or few machines, customers owning that namespace will be > heavily impacted if that broker goes down. Of course, this PR caused > the infinite unloading issue and we need to fix it. > > > I agree with making it false for the next major version release by > default. > We'd better remove the config in the next version instead of change > the default value to `false`, which will make Pulsar's configuration > keep increasing. > > Thanks, > Hang > > [1] https://github.com/apache/pulsar/pull/388 > > PengHui Li 于2023年7月1日周六 09:38写道: > > > > +1 for cherry-pick to branch-2.10 since users don't have a workaround > > for this issue, and the change is well-understand, low risk. > > > > I agree with making it false for the next major version release by > default. > > > > Thanks, > > Penghui > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > > wrote: > > > > > Hi dev, > > > > > > I realized that `removeMostServicingBrokersForNamespace` func in the > broker > > > selection logic can cause infinite unloading. > > > > > > Suppose an overloaded broker unloaded a bundle and only has the minimum > > > number of bundles(in that namespace) among brokers. In that case, the > > > selection logic (`removeMostServicingBrokersForNamespace`) will filter > out > > > other brokers and always reassign the bundle to the previous broker. > This > > > will cause infinite unloading(like a boomerang). > > > > > > To mitigate this issue, we need to cherry-pick this PR to disable this > > > logic by the config. > > > https://github.com/apache/pulsar/pull/16059 > > > > > > And we probably want to disable this > > > `removeMostServicingBrokersForNamespace` logic by default. > > > > > > Regards, > > > Heesung > > > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
+1 for cherry-picking it to branch-2.10. We have a flag to control whether to enable or disable it. `removeMostServicingBrokersForNamespace ` is introduced by [1] to solve the problem that when all bundles in a particular namespace belong to 1 or few machines, customers owning that namespace will be heavily impacted if that broker goes down. Of course, this PR caused the infinite unloading issue and we need to fix it. > I agree with making it false for the next major version release by default. We'd better remove the config in the next version instead of change the default value to `false`, which will make Pulsar's configuration keep increasing. Thanks, Hang [1] https://github.com/apache/pulsar/pull/388 PengHui Li 于2023年7月1日周六 09:38写道: > > +1 for cherry-pick to branch-2.10 since users don't have a workaround > for this issue, and the change is well-understand, low risk. > > I agree with making it false for the next major version release by default. > > Thanks, > Penghui > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn > wrote: > > > Hi dev, > > > > I realized that `removeMostServicingBrokersForNamespace` func in the broker > > selection logic can cause infinite unloading. > > > > Suppose an overloaded broker unloaded a bundle and only has the minimum > > number of bundles(in that namespace) among brokers. In that case, the > > selection logic (`removeMostServicingBrokersForNamespace`) will filter out > > other brokers and always reassign the bundle to the previous broker. This > > will cause infinite unloading(like a boomerang). > > > > To mitigate this issue, we need to cherry-pick this PR to disable this > > logic by the config. > > https://github.com/apache/pulsar/pull/16059 > > > > And we probably want to disable this > > `removeMostServicingBrokersForNamespace` logic by default. > > > > Regards, > > Heesung > >
Re: [DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
+1 for cherry-pick to branch-2.10 since users don't have a workaround for this issue, and the change is well-understand, low risk. I agree with making it false for the next major version release by default. Thanks, Penghui On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn wrote: > Hi dev, > > I realized that `removeMostServicingBrokersForNamespace` func in the broker > selection logic can cause infinite unloading. > > Suppose an overloaded broker unloaded a bundle and only has the minimum > number of bundles(in that namespace) among brokers. In that case, the > selection logic (`removeMostServicingBrokersForNamespace`) will filter out > other brokers and always reassign the bundle to the previous broker. This > will cause infinite unloading(like a boomerang). > > To mitigate this issue, we need to cherry-pick this PR to disable this > logic by the config. > https://github.com/apache/pulsar/pull/16059 > > And we probably want to disable this > `removeMostServicingBrokersForNamespace` logic by default. > > Regards, > Heesung >
[DISCUSS] Cherry-pick PR-16059 to 2.10 to prevent infinite unloading
Hi dev, I realized that `removeMostServicingBrokersForNamespace` func in the broker selection logic can cause infinite unloading. Suppose an overloaded broker unloaded a bundle and only has the minimum number of bundles(in that namespace) among brokers. In that case, the selection logic (`removeMostServicingBrokersForNamespace`) will filter out other brokers and always reassign the bundle to the previous broker. This will cause infinite unloading(like a boomerang). To mitigate this issue, we need to cherry-pick this PR to disable this logic by the config. https://github.com/apache/pulsar/pull/16059 And we probably want to disable this `removeMostServicingBrokersForNamespace` logic by default. Regards, Heesung