PIP: https://github.com/apache/pulsar/issues/18510
Problem Statement

When a topic is partitioned and one of its partitions is unavailable for producing, the Pulsar client currently keeps trying to produce messages to that partition, even in cases where it does not need to. In those cases the client could simply pick another partition and produce there.

Partition Unavailable

A partition can become unavailable for many reasons. The most prominent one is that the partition is moving from one broker to another: until every actor agrees on which broker owns the partition, the partition is unavailable for producing. The actors are the producers, the old broker, and the new broker.

Client Behavior

This is the typical produce code:

    producer.sendAsync(payLoad.getBytes(StandardCharsets.UTF_8));

When sendAsync is called, the message is enqueued in a queue (the pending message queue) and a future is returned. The future completes only after the message has been picked from the queue and sent to the broker asynchronously, and the ack has been received, again asynchronously. The maximum size of the pending message queue is controlled by the producer config maxPendingMessages. When the pending message queue is full, the application starts getting publish failures. The pending message queue therefore provides a cushion against unavailable partitions, but only up to that limit.

When another partition can be picked

1. When the message is not keyed, that is, its ordering does not depend on a key.
2. When the routing mode is round-robin, that is, the message can be produced to any of the partitions.

So if a partition is unavailable, try to pick another partition for producing, using the same round-robin algorithm.
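
To make the proposed selection rule concrete, here is a minimal sketch of a round-robin picker that skips unavailable partitions. It is an illustration under stated assumptions, not the Pulsar client's implementation: the class name RoundRobinFailoverRouter, the method choosePartition, and the isAvailable predicate are all hypothetical, and how the client would actually learn that a partition is unavailable is left open. The sketch applies only to non-keyed messages in round-robin routing mode.

```java
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical helper for illustration only; not part of the Pulsar client API.
final class RoundRobinFailoverRouter {
    private final int numPartitions;
    // Shared round-robin cursor, advanced once per send attempt.
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinFailoverRouter(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /**
     * Picks a partition for a NON-keyed message. Keyed messages must not be
     * routed this way, because moving them would break per-key ordering.
     *
     * @param isAvailable assumed availability check supplied by the caller
     * @return the chosen partition, or empty if no partition is available
     */
    OptionalInt choosePartition(IntPredicate isAvailable) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int start = Math.floorMod(counter.getAndIncrement(), numPartitions);
        // Probe at most numPartitions candidates, starting at the round-robin slot.
        for (int i = 0; i < numPartitions; i++) {
            int candidate = (start + i) % numPartitions;
            if (isAvailable.test(candidate)) {
                return OptionalInt.of(candidate);
            }
        }
        return OptionalInt.empty();
    }
}
```

With four partitions and partition 1 unavailable, five consecutive calls return 0, 2, 2, 3, 0. Note that in this sketch the neighbor of an unavailable partition absorbs its share of traffic; a real implementation might prefer to spread that load more evenly.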