Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-04-11 Thread Lin Lin
As I mentioned in the implementation section of the PIP, the partition
assignment strategy will be pluggable.

However, within the same cluster it is not possible for some brokers to use
consistent hashing while other brokers use round robin; all brokers must use
the same algorithm.

On 2023/04/11 07:37:19 Xiangying Meng wrote:
> Hi Linlin,
> > This is an incompatible modification, so the entire cluster needs to be
> upgraded, not just a part of the nodes
> 
> Appreciate your contribution to the new feature in PIP-255.
>  I have a question regarding the load-balancing aspect of this feature.
> 
> You mentioned that this is an incompatible modification,
> and the entire cluster needs to be upgraded, not just a part of the nodes.
>  I was wondering why we can only have one load-balancing strategy.
> Would it be possible to abstract the logic here and make it an optional
> choice?
> This way, we could have multiple load-balancing strategies,
> such as hash-based, round-robin, etc., available for users to choose from.
> 
> I'd love to hear your thoughts on this.
> 
> Best regards,
> Xiangying
> 

Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-04-11 Thread Xiangying Meng
Hi Linlin,
> This is an incompatible modification, so the entire cluster needs to be
upgraded, not just a part of the nodes

I appreciate your contribution to the new feature in PIP-255.
I have a question regarding the load-balancing aspect of this feature.

You mentioned that this is an incompatible modification
and that the entire cluster needs to be upgraded, not just a part of the nodes.
I was wondering why we can only have one load-balancing strategy.
Would it be possible to abstract the logic here and make it an optional
choice?
This way, we could have multiple load-balancing strategies,
such as hash-based, round-robin, etc., available for users to choose from.

I'd love to hear your thoughts on this.

Best regards,
Xiangying


Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-04-11 Thread Lin Lin
> Do we expect that the bundles should have the same loads?

Yes. If the bundle loads are similar, it is easier to reach a balanced state
during subsequent load balancing.
If the bundle loads differ, or there are hot bundles, bundle splits or unloads
will be triggered, and unloading a hotspot bundle can easily turn the broker
that receives it into a new hotspot.
The example you gave is idealized. In real production environments, the load
difference between bundles is very large, and the existing load-balancing
algorithm struggles to reach balance.


> Can it be resolved by introducing the entry write/read rate to the bundle 
> stats?

The implementation of ThresholdShedder requires thresholds to be set.
Even if the read/write rate is added, thresholds still need to be set,
and they have to be tuned for each scenario, which is very inconvenient to
use.

> Can we try to force-sync the load data of the brokers before performing the 
> distribution of a large number of bundles?

After a bundle is assigned to a new broker, it needs to run for a while
before the new load is reflected, so forced reporting is not very useful.

> IMO, the proposal should clearly describe the Goal, like which problem will 
> be resolved with this proposal.

I have re-added the Goal section and the limitations of this PIP to the issue.
For me, if the community does not need my implementation,
it is fine to only make the assignment algorithm pluggable.


Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-04-10 Thread PengHui Li
Hi Lin,

> The load managed by each Bundle is not even. Even if the number of
> partitions managed by each bundle is the same, there is no guarantee that
> the sum of the loads of these partitions will be the same.

Do we expect that the bundles should have the same loads? The bundle is the
base unit of the load balancer; we can set a high watermark for the bundle,
e.g., the maximum number of topics and the maximum throughput.
But bundles can have different real loads, and if one bundle exceeds the high
watermark, that bundle will be split. Users can tune the high watermark to
distribute the loads evenly across brokers.

For example, suppose there are 4 bundles with loads 1, 3, 2, and 4, the
maximum load of a bundle is 5, and there are 2 brokers. We can assign bundle 0
and bundle 3 to broker-0, and bundle 1 and bundle 2 to broker-1.

Of course, this is the ideal situation. If bundle 0 has already been assigned
to broker-0 and bundle 1 to broker-1, then bundle 2 goes to broker-0 and
bundle 3 goes to broker-1, and the loads of the two brokers are 3 and 7.
Dynamic programming can help to find a more optimized solution, at the cost
of more bundle unloads.

So, should we design bundles to have even loads? That is difficult to achieve
in reality. And the proposal says, "Let each bundle carry the same load as
possible". Is that the right direction for the load balancer?

> Doesn't shed loads very well. The existing default policy ThresholdShedder
> has a relatively high usage threshold, and various traffic thresholds need
> to be set. Many clusters with high TPS and small message bodies may have
> high CPU but low traffic; and for many small-scale clusters, the threshold
> needs to be modified according to the actual business.

Can it be resolved by introducing the entry write/read rate to the bundle
stats?

> The removed Bundle cannot be well distributed to other Brokers. The load
> information of each Broker will be reported at regular intervals, so the
> judgment of the Leader Broker when allocating Bundles cannot be guaranteed
> to be completely correct. Secondly, if there are a large number of Bundles
> to be redistributed, the Leader may make the low-load Broker a new
> high-load node when the load information is not up-to-date.

Can we try to force-sync the load data of the brokers before performing the
distribution of a large number of bundles?

For the Goal section in the proposal: it does not seem to map to the issues
mentioned in the Motivation section.
IMO, the proposal should clearly describe the goal, i.e., which problems will
be resolved by this proposal (all three issues above, or only some of them),
what the high-level solution is to resolve them, and what the pros and cons
are compared with the existing solution, without diving into the
implementation section.

Another consideration: the default maximum number of bundles for a namespace
is 128, and I don't think it is common to need 128 partitions for a topic. If
the number of partitions is smaller than the bundle count, will the new
solution basically be equivalent to the current one?

If this is not a general solution for common scenarios, I support making the
topic-bundle assigner pluggable without introducing the implementation into
the Pulsar repo. Users can then implement their own assigner based on their
business requirements. Pulsar's general solution may not be good for all
scenarios, but it is better for scalability (bundle split) and sufficient for
the most common scenarios. We can keep improving the general solution for the
general requirements of the most common scenarios.

Regards,
Penghui



Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-21 Thread Lin Lin


> This appears to be the "round-robin topic-to-bundle mapping" option in
> the `findBundle` function. Is this the only place that needs an update? Can
> you list what change is required?

In this PIP, we only discuss the topic-to-bundle mapping.
The required changes are:
1)
During lookup, a partition is assigned to a bundle along this path:
Lookup -> NamespaceService#getBrokerServiceUrlAsync ->
NamespaceService#getBundleAsync ->
NamespaceBundles#findBundle
Consistent hashing is currently used in NamespaceBundles#findBundle to assign
partitions to bundles.
We should add a configuration item, partitionAssignerClassName, so that
different partition assignment algorithms can be configured dynamically.
The existing algorithm will remain the default
(partitionAssignerClassName=ConsistentHashingPartitionAssigner).
2)
Implement a new partition assignment class, RoundRobinPartitionAssigner.
The new partition assignment logic will live in this class (a rough sketch
follows below).
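For illustration only, a minimal sketch of the pluggable assigner is below.
The interface shape, method signature, and reflective loading are assumptions
made for this sketch; only the configuration item partitionAssignerClassName
and the two class names come from the PIP.

// Sketch only: not the PIP's final API.
interface PartitionAssigner {
    // Return the index of the bundle that should own the given partition
    // (a partitionIndex of -1 would mean a non-partitioned topic).
    int assignBundle(String partitionedTopicName, int partitionIndex, int bundleCount);
}

final class PartitionAssignerLoader {
    // partitionAssignerClassName is read from broker.conf; the default keeps
    // today's behavior (ConsistentHashingPartitionAssigner).
    static PartitionAssigner load(String partitionAssignerClassName) throws Exception {
        return (PartitionAssigner) Class.forName(partitionAssignerClassName)
                .getDeclaredConstructor()
                .newInstance();
    }
}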


> How do we enable this "round-robin topic-to-bundle mapping option" (by
> namespace policy and broker.conf)?

In broker.conf, we add a new option called `partitionAssignerClassName`.

> Can we apply this option to existing namespaces? (what's the admin
> operation to enable this option)?

The cluster must ensure that all nodes use the same algorithm.
The broker-level configuration can take effect either by restarting brokers or
through the admin API BrokersBase#updateDynamicConfiguration; a sketch of the
latter is below.
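For reference, a hedged sketch of driving that dynamic-config path from the
Java admin client. It assumes the proposed option is registered as a dynamic
broker configuration and that an admin endpoint is reachable at
localhost:8080; the value used here is only the short class name from the
PIP, not a real fully qualified class.

import org.apache.pulsar.client.admin.PulsarAdmin;

public class UpdateAssignerConfig {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Assumes partitionAssignerClassName has been registered as a
            // dynamic configuration on the brokers (as proposed in this PIP).
            admin.brokers().updateDynamicConfiguration(
                    "partitionAssignerClassName", "RoundRobinPartitionAssigner");
        }
    }
}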

> I assume the "round-robin topic-to-bundle mapping option" works with a
> single partitioned topic, because other topics might show different load
> per partition. Is this the intention? (so users need to ensure not to put other
> topics in the namespace, if this option is configured)

For single-partition topics, the starting bundle is determined by consistent
hashing, so single-partition topics will still be spread across different
bundles as much as possible.
For a high-load single-partition topic, the current algorithm cannot solve the
problem, and this PIP cannot solve it either.
If it is just a low-load single-partition topic, its impact on the whole
bundle is very small.
In real scenarios, however, high-load businesses spread their load across
multiple partitions.

> Some brokers might have more bundles than other brokers. Do we have
> different logic for bundle balancing across brokers? or do we rely on the
> existing assign/unload/split logic to balance bundles among brokers?

This PIP does not touch the mapping between bundles and brokers; the existing
algorithm works well together with this PIP.
However, we will also contribute our bundle-to-broker mapping algorithm in a
subsequent PIP.
For example, bundles under the same namespace could be assigned to brokers in
a round-robin manner.




RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-21 Thread Lin Lin
The namespace-level bundle unload can be performed in
NamespaceService#splitAndOwnBundleOnceAndRetry.
A new check will be added there: after splitting a bundle, it determines
whether to unload at the namespace level. A rough sketch of that check is
below.
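Everything in this sketch except the method name
splitAndOwnBundleOnceAndRetry mentioned above is an illustrative assumption,
not Pulsar's actual NamespaceService code.

import java.util.List;

abstract class NamespaceLevelUnloadSketch {
    abstract boolean roundRobinAssignerEnabled();
    abstract List<String> bundleRangesOf(String namespace);
    abstract void unloadBundle(String namespace, String bundleRange);

    // Would be invoked after splitAndOwnBundleOnceAndRetry completes a split.
    void maybeUnloadWholeNamespace(String namespace) {
        // The round-robin mapping depends on the total bundle count, so once a
        // split changes that count, every partition in the namespace must rebind.
        if (!roundRobinAssignerEnabled()) {
            return; // with consistent hashing, only the split bundle's topics are remapped
        }
        for (String bundleRange : bundleRangesOf(namespace)) {
            unloadBundle(namespace, bundleRange); // new owners re-map their partitions
        }
    }
}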




Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-21 Thread Heesung Sohn
Hi, I see. I have follow-up questions below.

- This appears to be the "round-robin topic-to-bundle mapping" option in
the `findBundle` function. Is this the only place that needs an update? Can
you list what change is required?

- How do we enable this "round-robin topic-to-bundle mapping option" (by
namespace policy and broker.conf)?

- Can we apply this option to existing namespaces? (what's the admin
operation to enable this option)?

- I assume the "round-robin topic-to-bundle mapping option" works with a
single partitioned topic, because other topics might show different load
per partition. Is this the intention? (so users need to ensure not to put other
topics in the namespace, if this option is configured)

- Some brokers might have more bundles than other brokers. Do we have
different logic for bundle balancing across brokers? or do we rely on the
existing assign/unload/split logic to balance bundles among brokers?

Thanks,
Heesung





RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-21 Thread Lin Lin



Thanks for joining this discussion

> 1. where is the partition to bundle mapping stored?

We don't need to store the mapping relationship; it can be computed
dynamically. First, partition-0 finds the starting bundle directly through
consistent hashing. Subsequent partitions are then assigned to the following
bundles in a round-robin manner, as sketched below.
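A minimal, self-contained sketch of that computation. The class and helper
names are assumptions, and the hash below is only a stand-in for Pulsar's
consistent hash over bundle boundary ranges.

final class RoundRobinBundleIndex {
    // Stand-in for the consistent hash that places partition-0 of a topic.
    static int startingBundle(String partitionedTopicName, int bundleCount) {
        return Math.floorMod(partitionedTopicName.hashCode(), bundleCount);
    }

    // partition-0 lands on the starting bundle; partition-N lands N steps after it.
    static int bundleFor(String partitionedTopicName, int partitionIndex, int bundleCount) {
        int start = startingBundle(partitionedTopicName, bundleCount);
        return (start + Math.max(partitionIndex, 0)) % bundleCount;
    }

    public static void main(String[] args) {
        // Example: 16 partitions over 4 bundles -> every bundle gets exactly 4 partitions.
        for (int p = 0; p < 16; p++) {
            System.out.println("partition-" + p + " -> bundle "
                    + bundleFor("persistent://tenant/ns/topic", p, 4));
        }
    }
}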

> 2. when upgrade origin logic to the new round robin logic. how the current 
> code distinguish partition assigned by origin logic and the new created topic 
> assign by round robin logic.

This is an incompatible change, so the entire cluster needs to be upgraded,
not just some of the nodes.

> 2. can you explain how the re-assignment works (when bundle number change). 
> which component will trigger and do the work ?

When a bundle split occurs, a bundle unload at the namespace level is
triggered. The binding between all partitions in the namespace and the bundles
is then re-determined, following the steps stated in the issue:
1) partition-0 finds the starting bundle through consistent hashing
2) subsequent partitions are assigned to the following bundles by round robin


> 3. If bundle-split is not expected. how many bundle should user set. and do 
> we need disable bundle split we the round robin logic applied.

This approach does not restrict the use of bundle split, but a split triggers
the rebinding of all partitions under the entire namespace, which takes a
certain amount of time.


RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-19 Thread lifepuzzlefun
I'm interested in the implementation details.


1. Where is the partition-to-bundle mapping stored? When the original logic is
upgraded to the new round-robin logic, how does the current code distinguish
partitions assigned by the original logic from newly created topics assigned
by the round-robin logic?


2. Can you explain how the re-assignment works (when the bundle count
changes)? Which component will trigger and perform the work?


3. If bundle split is not expected, how many bundles should a user set? And do
we need to disable bundle split when the round-robin logic is applied?





RE: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-19 Thread lifepuzzlefun
Hi, Pulsar community. I'm interested in this topic and want to join the
discussion :-)


I tested creating one partitioned topic with 1024 partitions in a namespace
with 256 bundles.
The distribution for this topic is posted below.


The key is the number of partitions assigned to a bundle.
The value is how many bundles received that number of partitions.


topic `persistent://test/test/test_1` 
{1=10, 2=30, 3=76, 4=46, 5=54, 6=24, 7=4, 8=8, 9=4}


topic `persistent://test/test/test_2`
{1=2, 2=32, 3=68, 4=60, 5=50, 6=44}


topic `persistent://pulsar/pulsar/test_1` (also 256 bundles and 1024 partitions)
{1=12, 2=27, 3=65, 4=44, 5=67, 6=35, 7=6}
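For anyone who wants to reproduce this kind of histogram, a hedged sketch
using the Java admin client is below. It assumes an admin endpoint at
localhost:8080 and an already-created 1024-partition topic, and it simply
counts how many partitions of the topic land in each bundle range.

import java.util.Map;
import java.util.TreeMap;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class BundleDistribution {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://test/test/test_1"; // 1024 partitions, 256-bundle namespace
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Count partitions per bundle range.
            Map<String, Integer> partitionsPerBundle = new TreeMap<>();
            for (int i = 0; i < 1024; i++) {
                String bundleRange = admin.lookups().getBundleRange(topic + "-partition-" + i);
                partitionsPerBundle.merge(bundleRange, 1, Integer::sum);
            }
            // Histogram like the one above: partitions-per-bundle -> number of bundles.
            Map<Integer, Integer> histogram = new TreeMap<>();
            partitionsPerBundle.values().forEach(c -> histogram.merge(c, 1, Integer::sum));
            System.out.println(histogram);
        }
    }
}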



Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-17 Thread Lin Lin
Thanks for your review.


> Could you clarify the limitation of the current logic?

The current logic cannot guarantee that the traffic of each bundle is the
same; balance must be reached through splits.
However, a topic's load is not constant, and business traffic changes over
time, so the load of a bundle keeps changing.

If we rely on split + unload to reach balance, the number of bundles will
eventually hit the upper limit.

To avoid frequent splits and unloads, the current logic has many thresholds
that allow brokers to tolerate load imbalance, which is one of the reasons the
load gap between different nodes of a Pulsar cluster is large.


> For this issue, the community introduced a new assignment strategy, 
> LeastResourceUsageWithWeight, which better randomizes assignments.

Yes, but LeastResourceUsageWithWeight still cannot completely solve the
current problem; it only alleviates it.
We also optimized on top of that implementation, but we will discuss that
optimization in a follow-up PIP; it is not covered by the current one.



> If each partition has the same load, then having the same number of topics
> per bundle should lead to the load balance.
> Then, I wonder how the current way, "hashing", does not achieve the goal
> here.

We assume that the loads of different partitions of the same topic are the
same, but the loads of partitions of different topics are different.
Bundles are shared by all topics in the entire namespace.
Even if each bundle has the same number of partitions, those partitions may
come from different topics, resulting in different loads for each bundle.
If we instead split bundles according to load, the load of each topic varies
over time, and it is impossible to keep the load of every bundle the same.
With the round-robin strategy, the number of partitions from the same topic
on each bundle is roughly equal, so the load of each bundle is also roughly
equal. For example, with 4 bundles and two 8-partition topics, round robin
places exactly 2 partitions of each topic on every bundle, whereas hashing
may put 4 partitions of one topic onto a single bundle.


> happens if the leader restarts? how do we guarantee this mapping
> persistence?

1) First of all, we need to find the starting bundle. partition-0 finds a
bundle through consistent hashing, so as long as the number of bundles stays
the same, the starting bundle is the same every time, and partitions 1, 2, 3,
4, ... then get the same assignments every time.
2) If the number of bundles changes, i.e. a split is triggered, the bundles of
the entire namespace are forcibly unloaded and all partitions are reassigned.


> It is unclear how RoundRobinPartitionAssigner will work with the existing 
> code.

The specific implementation has been refined; please check the latest PIP
issue.





Re: [DISCUSS] PIP-255: Assign topic partitions to bundle by round robin

2023-03-16 Thread Heesung Sohn
Hi,

Thank you for sharing this.
In general, I think this can be another good option for Pulsar load
assignment logic.
However, I have some comments below.


> The load managed by each Bundle is not even. Even if the number of
> partitions managed by each bundle is the same, there is no guarantee that
> the sum of the loads of these partitions will be the same.

Each bundle can split and be unloaded to other brokers. Also, the current
hashing logic should distribute approximately the same number of
partitioned topics to each bundle.

Could you clarify the limitation of the current logic?


> Doesn't shed loads very well. The existing default policy ThresholdShedder
> has a relatively high usage threshold, and various traffic thresholds need
> to be set. Many clusters with high TPS and small message bodies may have
> high CPU but low traffic; and for many small-scale clusters, the threshold
> needs to be modified according to the actual business.

Yes, fine-tuning is expected for ThresholdShedder. From what I have
observed, loadBalancerBundleUnloadMinThroughputThreshold must be adjusted
based on the cluster's avg throughput.

Also, there is a config, lowerBoundarySheddingEnabled, recently introduced
to unload more aggressively to lower-loaded brokers.


> The removed Bundle cannot be well distributed to other Brokers. The load
> information of each Broker will be reported at regular intervals, so the
> judgment of the Leader Broker when allocating Bundles cannot be guaranteed
> to be completely correct. Secondly, if there are a large number of Bundles
> to be redistributed, the Leader may make the low-load Broker a new
> high-load node when the load information is not up-to-date.

For this issue, the community introduced a new assignment strategy,
LeastResourceUsageWithWeight, which better randomizes assignments.


> Implementation
> The client sends a message to a multi-partition Topic, which uses polling
> by default.
> Therefore, we believe that the load of partitions of the same topic is
> balanced.
> We assign partitions of the same topic to bundle by round-robin.
> In this way, the difference in the number of partitions carried by the
> bundle will not exceed 1.
> Since we consider the load of each partition of the same topic to be
> balanced, the load carried by each bundle is also balanced.

If each partition has the same load, then having the same number of topics
per bundle should lead to the load balance.

Then, I wonder how the current way, "hashing", does not achieve the goal
here.



> Operation steps:
> 1. Partition 0 finds a starting bundle through the consistent hash
> algorithm, assuming it is bundle0, we start from this bundle
> 2. By round-robin, assign partition 1 to the next bundle1, assign
> partition 2 to the next bundle2, and so on

Do we store this partition-to-bundle mapping information? (If we do, what
happens if the leader restarts? How do we guarantee this mapping
persistence?)

How do we find the assigned bundle for a partitioned topic?

Currently, each (partitioned) topic is statically assigned to bundles by
"findBundle" in the following code, so that any broker can know which bundle
a (partitioned) topic is assigned to. Can you clarify the behavior change
here?

public NamespaceBundle findBundle(TopicName topicName) {
    checkArgument(this.nsname.equals(topicName.getNamespaceObject()));
    // Hash the full topic name and pick the bundle whose range contains the hash.
    long hashCode = factory.getLongHashCode(topicName.toString());
    NamespaceBundle bundle = getBundle(hashCode);
    if (topicName.getDomain().equals(TopicDomain.non_persistent)) {
        bundle.setHasNonPersistentTopic(true);
    }
    return bundle;
}

protected NamespaceBundle getBundle(long hash) {
    // Binary-search the sorted bundle boundaries for the range containing the hash.
    int idx = Arrays.binarySearch(partitions, hash);
    int lowerIdx = idx < 0 ? -(idx + 2) : idx;
    return bundles.get(lowerIdx);
}



> API Changes
> 1. Add a configuration item partitionAssignerClassName, so that
> different partition assignment algorithms can be dynamically configured.
> 2. The existing algorithm will be used as the default
> partitionAssignerClassName=ConsistentHashingPartitionAssigner
> 3. Implement a new partition assignment class
> RoundRobinPartitionAssigner

Can't we add this assignment logic to a class that implements
ModularLoadManagerStrategy and BrokerSelectionStrategy (for the PIP-192
Load Balancer Extension)?

It is unclear how RoundRobinPartitionAssigner will work with the existing
code.

Also, note that BrokerSelectionStrategy can run on each broker (not only
the leader broker).




Thanks,

Heesung

On Tue, Mar 14, 2023 at 5:58 AM linlin  wrote:

> Hi all,
> I created a proposal to
> assign topic partitions to bundles by round robin:
> https://github.com/apache/pulsar/issues/19806
>
> It is already running in our production environment,
> and it has a good performance.
>
> Thanks!
>