Hi Lin,

> The load managed by each Bundle is not even. Even if the number of
> partitions managed by each bundle is the same, there is no guarantee that
> the sum of the loads of these partitions will be the same.

Do we expect the bundles to have the same load? The bundle is the base
unit of the load balancer, and we can set a high watermark for each
bundle, e.g., the maximum number of topics and maximum throughput. But
bundles can have different real loads, and if one bundle exceeds the high
watermark, it will be split. Users can tune the high watermark to
distribute the loads evenly across brokers.

For example, suppose there are 4 bundles with loads 1, 3, 2, 4, the
maximum load of a bundle is 5, and there are 2 brokers. We can assign
bundle 0 and bundle 3 to broker-0, and bundle 1 and bundle 2 to broker-1.

Of course, this is the ideal situation. Suppose bundle 0 has already been
assigned to broker-0 and bundle 1 has already been assigned to broker-1.
Now bundle 2 goes to broker-0 and bundle 3 goes to broker-1, so the loads
of the two brokers are 3 and 7. Dynamic programming can help find a more
balanced assignment, but at the cost of more bundle unloads.
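
To make the arithmetic concrete, here is a toy sketch (plain Java, not
Pulsar code; all names are made up for illustration) of the "place each
remaining bundle on the currently least-loaded broker" behaviour described
above:

import java.util.Arrays;

public class BundleAssignmentExample {
    public static void main(String[] args) {
        double[] bundleLoads = {1, 3, 2, 4};   // loads of bundle 0..3
        double[] brokerLoads = new double[2];  // broker-0 and broker-1

        // As in the example: bundle 0 is already on broker-0 and
        // bundle 1 is already on broker-1.
        brokerLoads[0] = bundleLoads[0];
        brokerLoads[1] = bundleLoads[1];

        // Place each remaining bundle on the least-loaded broker.
        for (int i = 2; i < bundleLoads.length; i++) {
            int target = brokerLoads[0] <= brokerLoads[1] ? 0 : 1;
            brokerLoads[target] += bundleLoads[i];
        }

        // Prints [7.0, 3.0]: the same 3-vs-7 skew as above (which broker
        // ends up with 7 only depends on tie-breaking), instead of the
        // ideal 5-and-5 split {1, 4} / {3, 2}, which can only be found by
        // looking at all bundle loads together (a bin-packing problem).
        System.out.println(Arrays.toString(brokerLoads));
    }
}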

So, should we design bundles to have even loads? That is difficult to
achieve in reality. And the proposal says, "Let each bundle carry the same
load as possible". Is that the correct direction for the load balancer?

> Doesn't shed loads very well. The existing default policy ThresholdShedder
> has a relatively high usage threshold, and various traffic thresholds need
> to be set. Many clusters with high TPS and small message bodies may have
> high CPU but low traffic; and for many small-scale clusters, the threshold
> needs to be modified according to the actual business.

Can it be resolved by introducing the entry write/read rate to the bundle
stats?

> The removed Bundle cannot be well distributed to other Brokers. The load
> information of each Broker will be reported at regular intervals, so the
> judgment of the Leader Broker when allocating Bundles cannot be guaranteed
> to be completely correct. Secondly, if there are a large number of Bundles
> to be redistributed, the Leader may make the low-load Broker a new
> high-load node when the load information is not up-to-date.

Could we force-sync the load data of the brokers before redistributing a
large number of bundles?

Regarding the Goal section of the proposal: it doesn't seem to map to the
issues mentioned in the Motivation section. IMO, the proposal should
clearly describe the goal, i.e., which of the problems this proposal will
resolve (all 3 of the issues above, or only some of them), what the
high-level solution is, and what its pros and cons are compared with the
existing solution, without diving into the implementation section.

Another consideration is that the default max number of bundles for a
namespace is 128. I don't think it is common to need 128 partitions for a
topic. If the number of partitions is smaller than the number of bundles,
will the new solution basically be equivalent to the current one?

If this is not a general solution for common scenarios, I support making
the topic-bundle assigner pluggable without introducing the implementation
into the Pulsar repo. Users can then implement their own assigner based on
their business requirements. Pulsar's general solution may not be good for
all scenarios, but it is better for scalability (bundle split) and enough
for the most common ones. We can keep improving the general solution for
the general requirements of the most common scenarios.
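
To illustrate what "pluggable" could look like, here is a rough sketch.
This is only my assumption, not the API proposed in the PIP: the interface
name, the method signature, and the way the round-robin variant picks a
bundle are all made up for illustration, on top of Pulsar's existing
TopicName / NamespaceBundle / NamespaceBundles classes.

import java.util.List;

import org.apache.pulsar.common.naming.NamespaceBundle;
import org.apache.pulsar.common.naming.NamespaceBundles;
import org.apache.pulsar.common.naming.TopicName;

// Hypothetical plug-in point; not the actual interface from the PIP.
public interface PartitionAssigner {
    NamespaceBundle findBundle(TopicName topic, NamespaceBundles bundles);
}

// Default behaviour: keep today's consistent-hash lookup.
class ConsistentHashingPartitionAssigner implements PartitionAssigner {
    @Override
    public NamespaceBundle findBundle(TopicName topic, NamespaceBundles bundles) {
        return bundles.findBundle(topic);
    }
}

// Round-robin behaviour: hash the base topic name once to pick a starting
// bundle, then step forward by the partition index so the partitions of
// one partitioned topic spread evenly across the namespace's bundles.
class RoundRobinPartitionAssigner implements PartitionAssigner {
    @Override
    public NamespaceBundle findBundle(TopicName topic, NamespaceBundles bundles) {
        List<NamespaceBundle> all = bundles.getBundles();
        int start = topic.getPartitionedTopicName().hashCode() & Integer.MAX_VALUE;
        int partition = Math.max(topic.getPartitionIndex(), 0);
        return all.get((start + partition) % all.size());
    }
}

The partitionAssignerClassName switch mentioned in the quoted reply would
then only decide which implementation gets loaded, so users could ship
their own assigner on the classpath without any change to the Pulsar repo.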

Regards,
Penghui


On Wed, Mar 22, 2023 at 9:52 AM Lin Lin <lin...@apache.org> wrote:

>
> > This appears to be the "round-robin topic-to-bundle mapping" option in
> > the `findBundle` function. Is this the only place that needs an update?
> > Can you list what change is required?
>
> In this PIP, we only discuss the topic-to-bundle mapping.
> The required changes are:
> 1)
> During a lookup, a partition is assigned to a bundle:
> Lookup -> NamespaceService#getBrokerServiceUrlAsync ->
> NamespaceService#getBundleAsync ->
> NamespaceBundles#findBundle
> Consistent hashing is currently used to assign partitions to bundles in
> NamespaceBundles#findBundle.
> We should add a configuration item partitionAssignerClassName, so that
> different partition assignment algorithms can be dynamically configured.
> The existing algorithm will be used as the default
> (partitionAssignerClassName=ConsistentHashingPartitionAssigner).
> 2)
> Implement a new partition assignment class RoundRobinPartitionAssigner.
> The new partition assignment will be implemented in this class.
>
>
> > How do we enable this "round-robin topic-to-bundle mapping option" (by
> > namespace policy and broker.conf)?
>
> In broker.conf, a new option called `partitionAssignerClassName` will be
> added.
>
> > Can we apply this option to existing namespaces? (what's the admin
> > operation to enable this option)?
>
> The cluster must ensure that all nodes use the same algorithm.
> Broker-level configuration can be made effective by restarting the broker
> or via the admin API BrokersBase#updateDynamicConfiguration.
>
> > I assume the "round-robin topic-to-bundle mapping option" works with a
> > single partitioned topic, because other topics might show different load
> > per partition. Is this the intention? (so users need to ensure not to
> > put other topics in the namespace, if this option is configured)
>
> For single-partition topics, the starting bundle is determined using a
> consistent hash, so single-partition topics will be spread out across
> different bundles as much as possible.
> For high-load single-partition topics, the current algorithms cannot
> solve this problem, and this PIP cannot solve it either.
> If it is just a low-load single-partition topic, the impact on the entire
> bundle is very small.
> However, in real scenarios, high-load businesses will share the load
> across multiple partitions.
>
> > Some brokers might have more bundles than other brokers. Do we have
> > different logic for bundle balancing across brokers? or do we rely on the
> > existing assign/unload/split logic to balance bundles among brokers?
>
> This PIP does not involve the mapping between bundles and brokers; the
> existing algorithm works well with this PIP.
> However, we will also contribute our mapping algorithm in a subsequent
> PIP.
> For example, bundles under the same namespace can be assigned to brokers
> in a round-robin manner.
>
>
>
