YanshuoH opened a new issue, #25028:
URL: https://github.com/apache/pulsar/issues/25028

   ### Search before reporting
   
   - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [x] I understand that [unsupported 
versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions)
 don't get bug fixes. I will attempt to reproduce the issue on a supported 
version of Pulsar client and Pulsar broker.
   
   
   ### User environment
   
   - broker version: 4.0.8
   - broker os: Linux pulsar-broker-1a-0 6.12.40-64.114.amzn2023.aarch64 #1 SMP 
Tue Aug 26 05:25:54 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
   - java: openjdk version "17.0.12" 2024-07-16
   - client: golang
   - client version: 0.17.0
   - client os: same as broker
   - client java version: NaN
   
   ### Issue Description
   
   One of our scenarios is checking a user's payment after a variable delay, 
anywhere from 10s to 1h.
   My observation is that when `individuallyDeletedMessages` becomes quite 
large (100,000+, with `managedLedgerMaxUnackedRangesToPersist` also set to 
100,000), message dispatch becomes erratic: it is very slow, and most 
messages are never dispatched.
   Checking the internal stats, I see the following:
   ```
         "numberOfEntriesSinceFirstNotAckedMessage": 751170,
         "totalNonContiguousDeletedMessagesRange": 105911,
   ```
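
To make this check repeatable, the two counters can be pulled out of the internal-stats JSON programmatically. The following is a minimal sketch, not part of the original report: it assumes the stats JSON has a top-level `cursors` object keyed by subscription name, and the names `cursorStats`, `internalStats`, and `overLimit` are hypothetical.

```go
package main

import (
	"encoding/json"
	"sort"
)

// cursorStats keeps only the two counters quoted in this report;
// every other field in the JSON is ignored.
type cursorStats struct {
	EntriesSinceFirstNotAcked int64 `json:"numberOfEntriesSinceFirstNotAckedMessage"`
	NonContiguousRanges       int64 `json:"totalNonContiguousDeletedMessagesRange"`
}

// internalStats assumes cursors are keyed by subscription name.
type internalStats struct {
	Cursors map[string]cursorStats `json:"cursors"`
}

// overLimit returns the sorted names of subscriptions whose
// non-contiguous ack-range count exceeds maxRanges.
func overLimit(raw []byte, maxRanges int64) ([]string, error) {
	var st internalStats
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var names []string
	for name, c := range st.Cursors {
		if c.NonContiguousRanges > maxRanges {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}
```

Calling `overLimit(raw, 100000)` on the stats above would flag this subscription, since 105911 is above the configured `managedLedgerMaxUnackedRangesToPersist` value of 100,000.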
   
   There are no other error messages on either the client or the server side.
   
   I see there's a similar issue, https://github.com/apache/pulsar/issues/23200, 
yet we're using the Shared subscription type.
   
   ### Error messages
   
   ```text
   The only suspicious messages I found show the client
   reconnecting to the broker:
   
   INFO[0960] Connecting to broker                          remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
   INFO[0960] TCP connection established                    local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
   INFO[0960] Connection is ready                           local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
   
   
   The broker also performed load shedding.
   
   Since it is very costly to turn on DEBUG-level logging, I didn't have 
the chance to capture debug-level messages.
   ```
   
   ### Reproducing the issue
   
   I've written two programs that reproduce the issue:
   a producer that sends messages with a variable delivery delay (from 10s to 1h), 
and a consumer that receives them.
   
   Wait for the messages to accumulate to the expected number; the consumer 
then hangs with very few messages received.
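
The reproducer itself is not attached here. As a rough illustration of why this workload inflates `individuallyDeletedMessages`, the sketch below simulates it with the Go standard library only, without the Pulsar client. It assumes messages are published at a fixed interval, each with a uniformly random delay between 10s and 1h, and are acknowledged as soon as the delay elapses; `randomDelay` and `ackRanges` are hypothetical names, and the model ignores redelivery and batching.

```go
package main

import (
	"math/rand"
	"time"
)

const (
	minDelay = 10 * time.Second
	maxDelay = time.Hour
)

// randomDelay picks a delivery delay uniformly in [minDelay, maxDelay].
func randomDelay(r *rand.Rand) time.Duration {
	return minDelay + time.Duration(r.Int63n(int64(maxDelay-minDelay)+1))
}

// ackRanges publishes n messages, one per interval, treats a message as
// acknowledged once its delay has elapsed at time now, and returns the
// number of separate acknowledged ranges that lie after the first
// unacknowledged message.
func ackRanges(n int, interval, now time.Duration, seed int64) int {
	r := rand.New(rand.NewSource(seed))
	acked := make([]bool, n)
	for i := range acked {
		publishedAt := time.Duration(i) * interval
		acked[i] = publishedAt+randomDelay(r) <= now
	}

	// The contiguous acknowledged prefix needs no range bookkeeping.
	first := 0
	for first < n && acked[first] {
		first++
	}

	// Every acknowledged run that follows a gap counts as one range.
	ranges := 0
	for i := first + 1; i < n; i++ {
		if acked[i] && !acked[i-1] {
			ranges++
		}
	}
	return ranges
}
```

For example, `ackRanges(1000, time.Second, 30*time.Minute, 1)` models 1,000 messages published one second apart and inspected after 30 minutes. Because short-delay messages are acknowledged ahead of long-delay ones published earlier, the acknowledged set is fragmented into many ranges rather than one contiguous prefix.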
   
   ### Additional information
   
   It might relate to the `managedLedgerMaxUnackedRangesToPersist` setting, 
but for our usage pattern it is not possible to keep increasing this setting 
indefinitely, because the number of messages would keep growing.
   I've also noticed that when `individuallyDeletedMessages` is quite large, 
every consumer reconnect to the broker causes a CPU usage spike on both the 
broker and ZooKeeper. I assume this is because Pulsar is trying to 
compute which messages should actually be dispatched.
   Is there a way to optimize or tune this? Or 
is this not the correct way of using Pulsar?
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

