Hi Pulsar Community,

I have opened an issue [1]: https://github.com/apache/pulsar/issues/12812
Any suggestions will be appreciated.

## Motivation

As described in [2], we are running a Pulsar cluster with about a million topics, and 20% of the brokers could break down at the same time. In [2], I proposed adding a rate limiter to protect ZooKeeper from surging requests. Thanks to Penghui Li and Hang Chen, who suggested an alternative way to solve this issue: the zk multi API, which can improve performance by batching reads or writes. We have done a perf test on zk multi; see the results in [1].

## Goal

Optimize ZooKeeper client performance when loading a large number of topics.

## API Changes

Three new configs in broker.conf:

- **enableAutoBatchZookeeperOps**: this feature is optional, as it may increase metadata latency with a small number of topics.
- **autoBatchZookeeperOpsMaxNum** and **autoBatchZookeeperOpsMaxDelayMills**: just like the auto batching parameters in the Pulsar producer, these limit the max number of ops in one batch and the max delay time to wait for a batch.

## Implementation

The basic idea is to add two queues (one for read ops and one for write ops) in PulsarZooKeeperClient. All zk ops will be added to a queue first, and a background thread will batch these requests and send them to the zk server in one "multi op".

## Rejected Alternatives

Holding [2] for now, to see the result of this performance optimization.

[1] https://github.com/apache/pulsar/issues/12812
[2] https://github.com/apache/pulsar/issues/12651

---
Thanks,
Haiting Jiang
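
To make the Implementation section above more concrete, here is a rough sketch of the queue plus background thread idea. It is only an illustration under assumptions, not the actual patch: the class name `AutoBatcher` and the `sender` callback are hypothetical, the callback stands in for the single zk "multi op" call, and only one queue is shown, whereas the proposal uses one for reads and one for writes. The two constructor arguments play the roles of autoBatchZookeeperOpsMaxNum and autoBatchZookeeperOpsMaxDelayMills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Pulsar): groups queued ops into batches. */
class AutoBatcher<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxNum;              // max ops per batch
    private final long maxDelayMillis;     // max time to wait for a batch to fill
    private final Consumer<List<T>> sender; // stand-in for one zk "multi op" call
    private final Thread worker;

    AutoBatcher(int maxNum, long maxDelayMillis, Consumer<List<T>> sender) {
        this.maxNum = maxNum;
        this.maxDelayMillis = maxDelayMillis;
        this.sender = sender;
        this.worker = new Thread(this::run, "auto-batcher");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Callers enqueue an op instead of sending it directly. */
    void submit(T op) {
        queue.add(op);
    }

    private void run() {
        try {
            while (true) {
                List<T> batch = new ArrayList<>(maxNum);
                // Block until at least one op is available.
                batch.add(queue.take());
                long deadline = System.nanoTime()
                        + TimeUnit.MILLISECONDS.toNanos(maxDelayMillis);
                // Keep filling until the batch is full or the delay expires.
                while (batch.size() < maxNum) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        break;
                    }
                    T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                    if (next == null) {
                        break;
                    }
                    batch.add(next);
                }
                sender.accept(batch);
            }
        } catch (InterruptedException e) {
            // Shutdown requested; ops still in the queue are dropped in this sketch.
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }
}
```

A real implementation would also need to complete each caller's callback or future from the per-op results of the multi call, and to decide what happens to queued ops on shutdown; both are left out here.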