Hi Pulsar Community,

I have opened an issue [1]: https://github.com/apache/pulsar/issues/12812

Any suggestions will be appreciated.

## Motivation

As described in [2], we are running a Pulsar cluster with about a million
topics, and 20% of the brokers could break down at the same time.
In [2], I proposed adding a rate limiter to protect zk from the resulting
surge of requests.
Penghui Li and Hang Chen suggested an alternative way to solve this issue:
the zk multi API, which can improve performance by batching reads or
writes.

We have run a perf test on zk multi; see [1] for the results.

## Goal

Optimize ZooKeeper client performance when loading a large number of topics.

## API Changes

Three new configs in broker.conf:
- **enableAutoBatchZookeeperOps**: turns the feature on. It is optional, as
it may increase metadata latency when the number of topics is small.

- **autoBatchZookeeperOpsMaxNum** and
**autoBatchZookeeperOpsMaxDelayMills**: similar to the auto-batching
parameters of the Pulsar producer. They limit the maximum number of ops in
one batch and the maximum time to wait for a batch to fill.

## Implementation

The basic idea is to add two queues (one for read ops and one for write
ops) to PulsarZooKeeperClient. All zk ops will be added to the matching
queue first, and a background thread will batch these requests and send
them to the zk server in one "multi op".
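
To make the queue-and-flush idea concrete, here is a minimal sketch of one
such queue. It is not the actual PulsarZooKeeperClient code: the class
`AutoBatcher`, its methods, and the `multiOp` function (a stand-in for the
real zk multi call) are all hypothetical names chosen for illustration. The
two constructor parameters play the role of autoBatchZookeeperOpsMaxNum and
autoBatchZookeeperOpsMaxDelayMills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of an auto-batching queue; not the real client code. */
class AutoBatcher<T, R> implements AutoCloseable {
    /** One queued op plus the future that will carry its result. */
    private static final class Pending<T, R> {
        final T op;
        final CompletableFuture<R> future = new CompletableFuture<>();
        Pending(T op) { this.op = op; }
    }

    private final BlockingQueue<Pending<T, R>> queue = new LinkedBlockingQueue<>();
    private final int maxNum;           // cf. autoBatchZookeeperOpsMaxNum
    private final long maxDelayMillis;  // cf. autoBatchZookeeperOpsMaxDelayMills
    private final Function<List<T>, List<R>> multiOp; // stand-in for zk "multi"
    private final Thread worker;
    private volatile boolean running = true;

    AutoBatcher(int maxNum, long maxDelayMillis, Function<List<T>, List<R>> multiOp) {
        this.maxNum = maxNum;
        this.maxDelayMillis = maxDelayMillis;
        this.multiOp = multiOp;
        this.worker = new Thread(this::runLoop, "auto-batcher");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Enqueue an op; the future completes once its batch has been executed. */
    CompletableFuture<R> submit(T op) {
        Pending<T, R> p = new Pending<>(op);
        queue.add(p);
        return p.future;
    }

    private void runLoop() {
        while (running) {
            try {
                // Wait for the first op; the timeout lets close() be noticed.
                Pending<T, R> first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                List<Pending<T, R>> batch = new ArrayList<>();
                batch.add(first);
                long deadline = System.nanoTime()
                        + TimeUnit.MILLISECONDS.toNanos(maxDelayMillis);
                // Collect until the batch is full or the delay budget is spent.
                while (batch.size() < maxNum) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) break;
                    Pending<T, R> next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                    if (next == null) break;
                    batch.add(next);
                }
                flush(batch);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void flush(List<Pending<T, R>> batch) {
        List<T> ops = new ArrayList<>();
        for (Pending<T, R> p : batch) ops.add(p.op);
        try {
            // One round trip for the whole batch; results are matched by index.
            List<R> results = multiOp.apply(ops);
            for (int i = 0; i < batch.size(); i++) {
                batch.get(i).future.complete(results.get(i));
            }
        } catch (RuntimeException e) {
            for (Pending<T, R> p : batch) p.future.completeExceptionally(e);
        }
    }

    @Override
    public void close() {
        running = false;
        worker.interrupt();
    }
}
```

Under this sketch, the client would hold two instances, one for reads and
one for writes, so that the two kinds of ops are never mixed in the same
batch. A small maxDelayMillis bounds the extra latency each op can pay
while waiting for its batch to fill, which is the trade-off that makes the
feature optional.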


## Rejected Alternatives
The rate limiter proposed in [2] is on hold for now, until we see the
results of this performance optimization.


[1] https://github.com/apache/pulsar/issues/12812
[2] https://github.com/apache/pulsar/issues/12651


---
Thanks,
Haiting Jiang
