Hi,

Thank you for sharing this.
In general, I think this can be another good option for Pulsar load
assignment logic.
However, I have some comments below.


> The load managed by each Bundle is not even.
> Even if the number of partitions managed by each bundle is the same,
> there is no guarantee that the sum of the loads of these partitions will be
> the same.



Each bundle can be split and unloaded to other brokers. Also, the current
hashing logic should distribute approximately the same number of topic
partitions to each bundle.
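
To illustrate what I mean, here is a toy sketch (not Pulsar code; Pulsar
hashes the full topic name with a 64-bit hash, so hashCode() below is only a
stand-in):

import java.util.HashMap;
import java.util.Map;

public class HashSpreadSketch {
    public static void main(String[] args) {
        int numBundles = 16;
        int numPartitions = 200;
        Map<Integer, Integer> partitionsPerBundle = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            // Each partition is its own topic name and is hashed independently.
            String partitionName = "persistent://tenant/ns/topic-partition-" + p;
            int bundle = Math.floorMod(partitionName.hashCode(), numBundles);
            partitionsPerBundle.merge(bundle, 1, Integer::sum);
        }
        // On average each bundle gets numPartitions / numBundles partitions,
        // though individual bundles can deviate somewhat.
        partitionsPerBundle.forEach((b, n) ->
                System.out.println("bundle-" + b + " -> " + n + " partitions"));
    }
}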

Could you clarify the limitation of the current logic?


> Doesn't shed loads very well. The existing default policy ThresholdShedder
> has a relatively high usage threshold,
> and various traffic thresholds need to be set. Many clusters with high TPS
> and small message bodies may have high CPU but low traffic;
> And for many small-scale clusters, the threshold needs to be modified
> according to the actual business.


Yes, fine-tuning is expected for ThresholdShedder. From what I have
observed, loadBalancerBundleUnloadMinThroughputThreshold must be adjusted
based on the cluster's average throughput.

Also, there is a config, lowerBoundarySheddingEnabled, recently introduced
to unload bundles more aggressively toward lower-loaded brokers.
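
For context, tuning typically looks something like the broker.conf snippet
below (illustrative values only; defaults and units should be checked
against the broker version in use):

loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
# Minimum bundle throughput before a bundle is considered for unloading;
# often needs lowering for high-TPS, small-message clusters.
loadBalancerBundleUnloadMinThroughputThreshold=2
# Recently introduced knob to also move load toward lower-loaded brokers.
lowerBoundarySheddingEnabled=true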


> The removed Bundle cannot be well distributed to other Brokers.
> The load information of each Broker will be reported at regular intervals,
> so the judgment of the Leader Broker when allocating Bundles cannot be
> guaranteed to be completely correct.
> Secondly, if there are a large number of Bundles to be redistributed,
> the Leader may make the low-load Broker a new high-load node when the load
> information is not up-to-date.


For this issue, the community introduced a new assignment strategy,
LeastResourceUsageWithWeight, which better randomizes assignments.


> Implementation
> The client sends a message to a multi-partition Topic, which uses polling
> by default.
> Therefore, we believe that the load of partitions of the same topic is
> balanced.
> We assign partitions of the same topic to bundle by round-robin.
> In this way, the difference in the number of partitions carried by the
> bundle will not exceed 1.
> Since we consider the load of each partition of the same topic to be
> balanced, the load carried by each bundle is also balanced.



If each partition has the same load, then having the same number of
partitions per bundle should lead to load balance.

Then, I wonder how the current approach, hashing, fails to achieve that
goal.



> Operation steps:
>
>    1. Partition 0 finds a starting bundle through the consistent hash
>    algorithm, assuming it is bundle0, we start from this bundle
>    2. By round-robin, assign partition 1 to the next bundle1, assign
>    partition 2 to the next bundle2, and so on
>
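
Just restating my reading of these two steps as a toy sketch (the class and
method names below are mine, not from the proposal):

import java.util.List;

public class RoundRobinAssignSketch {

    // Step 1: partition 0 picks a starting bundle via the existing
    // consistent-hash lookup (stubbed here with hashCode()).
    // Step 2: partition i is placed i bundles after the start, wrapping
    // around, so per-bundle counts for one topic differ by at most 1.
    static String assignBundle(String topicBase, int partitionIndex, List<String> bundles) {
        int start = Math.floorMod(topicBase.hashCode(), bundles.size());
        return bundles.get((start + partitionIndex) % bundles.size());
    }
}
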
Do we store this partition-to-bundle mapping information? (If we do, what
happens if the leader restarts? How do we guarantee the persistence of this
mapping?)

How do we find the assigned bundle from a partitioned topic?

Currently, each (partitioned) topic is statically assigned to a bundle by "
findBundle" in the following code, so any broker can know which bundle a
(partitioned) topic is assigned to. Can you clarify the behavior change
here?

public NamespaceBundle findBundle(TopicName topicName) {
    checkArgument(this.nsname.equals(topicName.getNamespaceObject()));
    // The bundle is derived purely from the hash of the topic name,
    // so the mapping is deterministic on every broker.
    long hashCode = factory.getLongHashCode(topicName.toString());
    NamespaceBundle bundle = getBundle(hashCode);
    if (topicName.getDomain().equals(TopicDomain.non_persistent)) {
        bundle.setHasNonPersistentTopic(true);
    }
    return bundle;
}

protected NamespaceBundle getBundle(long hash) {
    // Binary-search the bundle boundaries for the range containing the hash.
    int idx = Arrays.binarySearch(partitions, hash);
    int lowerIdx = idx < 0 ? -(idx + 2) : idx;
    return bundles.get(lowerIdx);
}
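
For example, any broker can resolve the owning bundle locally and
deterministically, roughly like this (assuming access to the namespace's
NamespaceBundles instance, here called "bundles"):

TopicName topic = TopicName.get("persistent://tenant/ns/my-topic-partition-5");
NamespaceBundle bundle = bundles.findBundle(topic);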



> API Changes
>
>    1. Add a configuration item partitionAssignerClassName, so that
>    different partition assignment algorithms can be dynamically configured.
>    2. The existing algorithm will be used as the default
>    partitionAssignerClassName=ConsistentHashingPartitionAssigner
>    3. Implement a new partition assignment class
>    RoundRobinPartitionAssigner
>
Can't we add this assignment logic to a class that implements
ModularLoadManagerStrategy and BrokerSelectionStrategy (for the PIP-192
Load Balancer Extension)?

It is unclear how RoundRobinPartitionAssigner will work with the existing
code.

Also, note that BrokerSelectionStrategy can run on each broker (not only
the leader broker).
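
To make the suggestion concrete, what I have in mind is roughly a skeleton
like the one below (only a sketch of where the logic could live; the class
name is mine, and the package paths / selectBroker signature should be
double-checked against the target branch):

import java.util.Optional;
import java.util.Set;

import org.apache.pulsar.broker.ServiceConfiguration;
import org.apache.pulsar.broker.loadbalance.LoadData;
import org.apache.pulsar.broker.loadbalance.ModularLoadManagerStrategy;
import org.apache.pulsar.policies.data.loadbalancer.BundleData;

public class RoundRobinBrokerSelectionStrategy implements ModularLoadManagerStrategy {

    @Override
    public Optional<String> selectBroker(Set<String> candidates,
                                         BundleData bundleToAssign,
                                         LoadData loadData,
                                         ServiceConfiguration conf) {
        // The proposed round-robin placement logic would live here instead of
        // behind a separate partitionAssignerClassName hook.
        return candidates.stream().findFirst();
    }
}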




Thanks,

Heesung

On Tue, Mar 14, 2023 at 5:58 AM linlin <lin...@apache.org> wrote:

> Hi all,
> I created a proposal to
> assign topic partitions to bundles by round robin:
> https://github.com/apache/pulsar/issues/19806
>
> It is already running in our production environment,
> and it has a good performance.
>
> Thanks!
>
