Re: [I] [Bug] Consumption unevenness arises in the consumption performance pressure test. [pulsar]

via GitHub Mon, 08 Dec 2025 23:48:32 -0800


lhotari commented on issue #25046:
URL: https://github.com/apache/pulsar/issues/25046#issuecomment-3630829888

> Currently, my goal is to identify the root cause of partition imbalance in
> version 3.0.10.

@g0715158 In the OSS project, we don't maintain specific patch versions such as
3.0.10. The 3.0.x line continues to be maintained, but the latest released
version is 3.0.15.
For this case, please attempt to reproduce with 4.1.2, as I suggested before.
If you cannot reproduce it there, that in itself gives you a lot more
information for identifying the root cause in version 3.0.10.

> Approximately 1000 consumers per partition experience performance
> degradation after consuming for about 9 minutes, and partition imbalance can be
> observed from the console. I would like to ask if such a situation has occurred
> in any existing issues?

Yes, I've seen that happen. The imbalance is common, but the case where
consuming stops completely might be a different issue, such as #24926.

However, in the stats that you shared, there are many partitions with an out
rate of 0 whose backlog is also 0.
Is this test a scenario where producers are actively producing and consumers
are following? Or is it a "catch-up scenario" where consumers drain an existing
backlog?

In your test scenario, you didn't mention anything about the client side.
How many separate client instances and/or client connections do you have? How
well is the client side tuned? See, for example:
https://pulsar.apache.org/docs/next/client-libraries-java-setup/#java-client-performance

When you create a large number of Java client instances in a single JVM, it's
necessary to share resources between them. There's an example in branch-4.0 in
this test:
https://github.com/apache/pulsar/blob/branch-4.0/pulsar-broker/src/test/java/org/apache/pulsar/client/api/PatternConsumerBackPressureMultipleConsumersTest.java#L238-L280
For 4.1+, there's PIP-234 `PulsarClientSharedResources`:
https://github.com/apache/pulsar/blob/270120ce6e33e5a084397ca31186f1bb87835e48/pulsar-broker/src/test/java/org/apache/pulsar/client/api/PatternConsumerBackPressureMultipleConsumersTest.java#L103-L123

If you are actively producing to partitions in a test case, a common source of
problems is the producing side: it may not produce evenly across partitions.
One way to solve this is to produce individually to specific partitions
(*-partition-0, *-partition-1, ...) in the load generator and to have a
sufficient number of separate nodes producing the messages so that the
producing clients aren't the bottleneck.
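As a rough illustration of addressing partitions individually, the sketch below derives the per-partition topic names that a load generator could hand to separate, non-partitioned producers. It relies on the `-partition-<index>` suffix mentioned above; the class and method names (`PartitionTopics`, `partitionTopicNames`) are hypothetical and not part of the Pulsar client.

```java
// Hypothetical helper for a load generator; not part of the Pulsar client API.
final class PartitionTopics {
    private PartitionTopics() {
    }

    // Returns "<topic>-partition-0" ... "<topic>-partition-(numPartitions - 1)".
    static String[] partitionTopicNames(String partitionedTopic, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        String[] names = new String[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            names[i] = partitionedTopic + "-partition-" + i;
        }
        return names;
    }
}
```

Each returned name could then be passed to `ProducerBuilder.topic(...)`, so that every producer writes to exactly one partition and the load generator itself decides how messages are spread.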
On the producer side, using a multi-topic (partitioned) producer will also
cause more variance across partitions due to the default router:
https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/RoundRobinPartitionMessageRouterImpl.java
The setting can be controlled with:
https://github.com/apache/pulsar/blob/cc5e479d63103f81e3af833e8b06227d1a6563e1/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ProducerBuilder.java#L462-L474
The defaults are time-based for both routing and batching. For testing
purposes, it could be better to use count-based routing if a partitioned
producer is used, and to configure batching with a long
`batchingMaxPublishDelay` and use `batchingMaxMessages` to achieve similarly
sized batches each time.
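
To make the count-based idea concrete, here is a minimal sketch of a selector that stays on one partition for a fixed number of messages and then moves to the next. The class `CountBasedPartitionSelector` is hypothetical and is not something Pulsar ships. It is kept free of client dependencies so that it runs on its own; in a real producer the same logic could be called from a custom `MessageRouter` (using `TopicMetadata.numPartitions()`) registered with `MessageRoutingMode.CustomPartition`.

```java
// Hypothetical count-based partition selector; not part of the Pulsar client API.
// It stays on one partition for a fixed number of messages, then moves to the next.
final class CountBasedPartitionSelector {
    private final int messagesPerPartition;
    private final java.util.concurrent.atomic.AtomicLong counter =
            new java.util.concurrent.atomic.AtomicLong();

    CountBasedPartitionSelector(int messagesPerPartition) {
        if (messagesPerPartition <= 0) {
            throw new IllegalArgumentException("messagesPerPartition must be positive");
        }
        this.messagesPerPartition = messagesPerPartition;
    }

    // Thread-safe: every call consumes one slot of the global message counter.
    int choosePartition(int numPartitions) {
        long n = counter.getAndIncrement();
        return (int) ((n / messagesPerPartition) % numPartitions);
    }
}
```

Assuming `messagesPerPartition` is set to the same value as `batchingMaxMessages`, and `batchingMaxPublishDelay` is long enough that batches are flushed by count rather than by time, each partition should receive batches of about the same size.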

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
