> Do we expect that the bundles should have the same loads?

Yes. If the bundle loads are similar, it is easier to reach a balanced state 
during subsequent load balancing.
If the bundle loads differ a lot, or there are hot bundles, bundle split or 
unload will be triggered.
Unloading a hotspot bundle can easily turn the broker that receives it into a 
new hotspot.
The example you gave is ideal. In real production environments, the load 
difference between bundles is very large, 
and it is hard for the existing load balancing algorithm to reach balance.


> Can it be resolved by introducing the entry write/read rate to the bundle 
> stats?

The implementation of ThresholdShedder requires thresholds to be set. 
Even if the read/write rate is added to the bundle stats, the thresholds still 
need to be set, and they have to be tuned for each scenario, which is very 
inconvenient to use.
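For example, with ThresholdShedder all of the following broker.conf settings 
interact and typically need per-cluster tuning (the names are from recent 
Pulsar versions; the values here are only illustrative, not recommendations):

    loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
    loadBalancerBrokerThresholdShedderPercentage=10
    loadBalancerHistoryResourcePercentage=0.9
    loadBalancerCPUResourceWeight=1.0
    loadBalancerMemoryResourceWeight=1.0
    loadBalancerDirectMemoryResourceWeight=1.0
    loadBalancerBandwithInResourceWeight=1.0
    loadBalancerBandwithOutResourceWeight=1.0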

> Can we try to force-sync the load data of the brokers before performing the 
> distribution of a large number of bundles?

After a bundle is assigned to a new broker, it needs to run for a period of 
time before the new load is reflected in that broker's report, 
so force-syncing the load data is not very useful.

> IMO, the proposal should clearly describe the Goal, like which problem will 
> be resolved with this proposal.

I have re-added the Goal and the limitations of this PIP to the issue.
For me, if the community doesn't need my implementation,
it is fine to only make the assignment algorithm pluggable.

On 2023/04/10 12:23:03 PengHui Li wrote:
> Hi Lin,
> 
> > The load managed by each Bundle is not even. Even if the number of
> > partitions managed by each bundle is the same, there is no guarantee that
> > the sum of the loads of these partitions will be the same.
> 
> Do we expect that the bundles should have the same loads? The bundle is the
> base unit of the load balancer; we can set the high watermark of the bundle,
> e.g., the maximum topics and throughput. But the bundle can have different
> real loads, and if one bundle runs out of the high watermark, the bundle
> will be split. Users can tune the high watermark to distribute the loads
> evenly across brokers.
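For reference, the bundle high watermark mentioned above is controlled by
broker.conf settings along these lines (names are from recent Pulsar versions;
the values are only illustrative):

    loadBalancerNamespaceBundleMaxTopics=1000
    loadBalancerNamespaceBundleMaxSessions=1000
    loadBalancerNamespaceBundleMaxMsgRate=30000
    loadBalancerNamespaceBundleMaxBandwidthMbytes=100
    loadBalancerAutoBundleSplitEnabled=true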
> 
> For example, there are 4 bundles with loads 1, 3, 2, 4, the maximum load of
> a bundle is 5 and 2 brokers.
> We can assign bundle 0 and bundle 3 to broker-0 and bundle 1 and bundle 2
> to broker-1.
> 
> Of course, this is the ideal situation. If bundle 0 has been assigned to
> broker-0 and bundle 1 has been
> assigned to broker-1. Now, bundle 2 will go to broker-0, and bundle 3 will
> go to broker-1. The loads for each
> broker are 3 and 7. Dynamic programming can help to find an optimized
> solution with more bundle unloads.
> 
> So, should we design the bundle to have even loads? It is difficult to
> achieve in reality. And the proposal
> said, "Let each bundle carry the same load as possible". Is it the correct
> direction for the load balancer?
> 
> > Doesn't shed loads very well. The existing default policy ThresholdShedder
> > has a relatively high usage threshold, and various traffic thresholds need
> > to be set. Many clusters with high TPS and small message bodies may have
> > high CPU but low traffic; and for many small-scale clusters, the threshold
> > needs to be modified according to the actual business.
> 
> Can it be resolved by introducing the entry write/read rate to the bundle
> stats?
> 
> > The removed Bundle cannot be well distributed to other Brokers. The load
> > information of each Broker will be reported at regular intervals, so the
> > judgment of the Leader Broker when allocating Bundles cannot be guaranteed
> > to be completely correct. Secondly, if there are a large number of Bundles
> > to be redistributed, the Leader may make the low-load Broker a new
> > high-load node when the load information is not up-to-date.
> 
> Can we try to force-sync the load data of the brokers before performing the
> distribution of a large number of
> bundles?
> 
> For the Goal section in the proposal, it looks like it doesn't map to the
> issues mentioned in the Motivation section.
> IMO, the proposal should clearly describe the Goal, like which problem will
> be resolved with this proposal: all of the above 3 issues or part of them.
> And what is the high-level solution to resolve the issue, and what are the
> pros and cons compared with the existing solution, without diving into the
> implementation section.
> 
> Another consideration is that the default maximum number of bundles for a
> namespace is 128. I don't think it is common to need 128 partitions for a
> topic. If the number of partitions < the bundle count, will the new solution
> basically be equivalent to the current way?
> 
> If this is not a general solution for common scenarios, I support making
> the topic-bundle assigner pluggable without introducing the implementation
> to the Pulsar repo. Users can implement their own assigner based on the
> business requirement. Pulsar's general solution may not be good for all
> scenarios, but it is better for scalability (bundle split) and enough for
> most common scenarios. We can keep improving the general solution for the
> general requirement for the most common scenarios.
> 
> Regards,
> Penghui
> 
> 
> On Wed, Mar 22, 2023 at 9:52 AM Lin Lin <lin...@apache.org> wrote:
> 
> >
> > > This appears to be the "round-robin topic-to-bundle mapping" option in
> > > the `findBundle` function. Is this the only place that needs an update?
> > > Can you list what change is required?
> >
> > In this PIP, we only discuss topic-to-bundle mapping
> > Change is required:
> > 1)
> > During lookup, partitions are assigned to bundles:
> > Lookup -> NamespaceService#getBrokerServiceUrlAsync ->
> > NamespaceService#getBundleAsync ->
> > NamespaceBundles#findBundle
> > Consistent hashing is currently used to assign partitions to bundles in
> > NamespaceBundles#findBundle.
> > We should add a configuration item partitionAssignerClassName, so that
> > different partition assignment algorithms can be dynamically configured.
> > The existing algorithm will be used as the default
> > (partitionAssignerClassName=ConsistentHashingPartitionAssigner)
> > 2)
> > Implement a new partition assignment class RoundRobinPartitionAssigner.
> > The new partition assignment logic will be implemented in this class
> > (see the rough sketch below).
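To make the proposed plug-in point more concrete, here is a rough sketch; the
interface shape, method names, and helper calls are assumptions for
illustration only, not the final API:

    import java.util.List;
    import org.apache.pulsar.common.naming.NamespaceBundle;
    import org.apache.pulsar.common.naming.NamespaceBundles;
    import org.apache.pulsar.common.naming.TopicName;

    // Hypothetical plug-in interface, loaded from partitionAssignerClassName.
    public interface PartitionAssigner {
        NamespaceBundle findBundle(TopicName topic, NamespaceBundles bundles);
    }

    // Sketch of a round-robin assigner: the starting bundle is derived from
    // the base topic name, and partition i is placed i steps after it, so the
    // partitions of one topic are spread evenly across the bundles.
    class RoundRobinPartitionAssigner implements PartitionAssigner {
        @Override
        public NamespaceBundle findBundle(TopicName topic, NamespaceBundles bundles) {
            List<NamespaceBundle> all = bundles.getBundles();
            int partition = topic.getPartitionIndex();
            if (partition < 0) {
                // Non-partitioned topic: keep the existing consistent-hash mapping.
                return bundles.findBundle(topic);
            }
            // Mask keeps the hash non-negative before taking the modulo.
            int start = (topic.getPartitionedTopicName().hashCode() & Integer.MAX_VALUE) % all.size();
            return all.get((start + partition) % all.size());
        }
    }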
> >
> >
> > > How do we enable this "round-robin topic-to-bundle mapping option" (by
> > > namespace policy and broker.conf)?
> >
> > In broker.conf, a new option called `partitionAssignerClassName`
> >
> > > Can we apply this option to existing namespaces? (what's the admin
> > > operation to enable this option)?
> >
> > The cluster must ensure that all brokers use the same algorithm.
> > The broker-level configuration can take effect by restarting the brokers
> > or through the admin API (BrokersBase#updateDynamicConfiguration), as
> > sketched below.
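For example, assuming the new option is registered as a dynamic configuration,
it could be updated with the existing pulsar-admin command (the class name
below is only a placeholder):

    pulsar-admin brokers update-dynamic-config \
        --config partitionAssignerClassName \
        --value com.example.RoundRobinPartitionAssigner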
> >
> > > I assume the "round-robin topic-to-bundle mapping option" works with a
> > > single partitioned topic, because other topics might show different load
> > > per partition. Is this the intention? (so users need to ensure not to
> > > put other topics in the namespace, if this option is configured)
> >
> > For single-partition topics, the starting bundle is determined using a
> > consistent hash, so single-partition topics will be spread across
> > different bundles as much as possible.
> > For high-load single-partition topics, the current algorithm cannot solve
> > this problem, and this PIP cannot solve it either.
> > If it is just a low-load single-partition topic, the impact on the entire
> > bundle is very small.
> > However, in real scenarios, high-load businesses share the load through
> > multiple partitions.
> >
> > > Some brokers might have more bundles than other brokers. Do we have
> > > different logic for bundle balancing across brokers? or do we rely on the
> > > existing assign/unload/split logic to balance bundles among brokers?
> >
> > This PIP does not cover the mapping between bundles and brokers; the
> > existing algorithm works well with this PIP.
> > However, we will also contribute our mapping algorithm in a subsequent
> > PIP. For example: bundles under the same namespace can be assigned to
> > brokers in a round-robin manner.
> >
> >
> >
> 
