szkoludasebastian commented on issue #23908:
URL: https://github.com/apache/pulsar/issues/23908#issuecomment-2710803382
> Thank you for reporting this bug. Seems like there are multiple things to check to isolate the problem. Possible Theories:
>
> * consumers failed to reconnect (can we check the ownership of the topics by `pulsar-admin topics lookup {topic-partition-name}`? Also, do we see any errors on the consumer side?)
> * brokers failed to dispatch (can we take a heapdump from the owner broker above, by `PID=$(ps aux | grep java | grep -v grep | awk '{print $2}'); jmap -dump:live,format=b,file=/tmp/${HOSTNAME}.hprof $PID ; tar -zcvf /tmp/${HOSTNAME}.tar.gz /tmp/${HOSTNAME}.hprof`?) Also, can you share the broker logs during this time? We can turn on the ExtensibleLoadBalancer debug logs by `pulsar-admin brokers update-dynamic-config --config loadBalancerDebugModeEnabled --value true`
> * I am speculating `topic.isTransferring` flag might have a bug and might be blocking the message dispatcher logic after the topic transfer. We can confirm this by heapdump. Also, can you try to unload the namespace and see if that can mitigate the issue, by `pulsar-admin namespaces unload {tenant/namespace}`
>
> Meanwhile, I am trying to reproduce this issue on my end.

I reproduced the issue once more to collect everything you asked for. Here are the details:
* Topic ownership: `pulsar://integ-pulsar-broker-3.integ-pulsar-broker.str-integ.svc.cluster.local:6650` (no errors on the consumer side)
* The full heap dump is 709 MB (130 MB when archived), so I'm not able to share it here
* Here are broker-3 logs:
[broker-3-logs.zip](https://github.com/user-attachments/files/19164564/broker-3-logs.zip)
* Here are stats-internal:
```json
{
  "entriesAddedCounter" : 2590,
  "numberOfEntries" : 3304,
  "totalSize" : 34704530,
  "currentLedgerEntries" : 1197,
  "currentLedgerSize" : 12173322,
  "lastLedgerCreatedTimestamp" : "2025-03-10T14:20:11.296Z",
  "lastLedgerCreationFailureTimestamp" : "2025-03-10T14:19:58.081Z",
  "waitingCursorsCount" : 0,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "5454987:1196",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 5437330,
    "entries" : 714,
    "size" : 6684721,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 5438003,
    "entries" : 1030,
    "size" : 12127502,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 5446096,
    "entries" : 284,
    "size" : 2982059,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 5452432,
    "entries" : 79,
    "size" : 736926,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 5454987,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "microbatcher" : {
      "markDeletePosition" : "5437330:708",
      "readPosition" : "5454987:1197",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 2582,
      "cursorLedger" : 5446387,
      "cursorLedgerLastEntry" : 115,
      "individuallyDeletedMessages" : "[(5438003:-1..5438003:1029],(5446096:-1..5446096:283],(5452432:-1..5452432:78],(5454987:-1..5454987:1193]]",
      "lastLedgerSwitchTimestamp" : "2025-03-10T14:17:45.368Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 2596,
      "totalNonContiguousDeletedMessagesRange" : 4,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
  "schemaLedgers" : [ ],
  "compactedLedger" : {
    "ledgerId" : -1,
    "entries" : -1,
    "size" : -1,
    "offloaded" : false,
    "underReplicated" : false
  }
}
```
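
To make the stats above easier to read, here is a minimal Python sketch that lists which entries the `microbatcher` cursor has not yet acknowledged. It is not part of Pulsar: the helper names `parse_position` and `unacked_entries` are hypothetical, and the values are copied by hand from the JSON above. It assumes that ledger IDs increase over time, that each `individuallyDeletedMessages` range stays within one ledger and is open on the left and closed on the right, and that the current ledger's entry count should be taken from `lastConfirmedEntry` (the `ledgers` list reports `0` for it).

```python
import re

# Values copied by hand from the stats-internal output above.
stats = {
    "entriesAddedCounter": 2590,
    "lastConfirmedEntry": "5454987:1196",
    "ledgers": {5437330: 714, 5438003: 1030, 5446096: 284, 5452432: 79, 5454987: 0},
    "cursor": {
        "markDeletePosition": "5437330:708",
        "readPosition": "5454987:1197",
        "messagesConsumedCounter": 2582,
        "individuallyDeletedMessages": (
            "[(5438003:-1..5438003:1029],(5446096:-1..5446096:283],"
            "(5452432:-1..5452432:78],(5454987:-1..5454987:1193]]"
        ),
    },
}


def parse_position(text):
    """Turn 'ledgerId:entryId' into a comparable (ledger_id, entry_id) tuple."""
    ledger_id, entry_id = text.split(":")
    return int(ledger_id), int(entry_id)


def unacked_entries(stats):
    """Return the sorted (ledger_id, entry_id) pairs not yet acknowledged."""
    cursor = stats["cursor"]
    last_ledger, last_entry = parse_position(stats["lastConfirmedEntry"])
    md_ledger, md_entry = parse_position(cursor["markDeletePosition"])

    # Assumption: the open ledger is listed with 0 entries, so derive its
    # real entry count from lastConfirmedEntry instead.
    counts = dict(stats["ledgers"])
    counts[last_ledger] = last_entry + 1

    # Every entry after the mark-delete position is a candidate.
    pending = set()
    for ledger_id, count in counts.items():
        if ledger_id < md_ledger:  # assumption: ledger IDs increase over time
            continue
        first = md_entry + 1 if ledger_id == md_ledger else 0
        pending.update((ledger_id, e) for e in range(first, count))

    # Drop individually acknowledged ranges, written as (from..to].
    pattern = r"\((\d+):(-?\d+)\.\.(\d+):(-?\d+)\]"
    for l1, lo, l2, hi in re.findall(pattern, cursor["individuallyDeletedMessages"]):
        if l1 != l2:
            raise ValueError("range spans ledgers; not handled by this sketch")
        for e in range(int(lo) + 1, int(hi) + 1):
            pending.discard((int(l1), e))
    return sorted(pending)


pending = unacked_entries(stats)
# True when the cursor has already read past the last confirmed entry.
read_past_end = (
    parse_position(stats["cursor"]["readPosition"])
    > parse_position(stats["lastConfirmedEntry"])
)
print(len(pending), pending[0], pending[-1], read_past_end)
```

Under those assumptions the sketch reports 8 unacknowledged entries: `5437330:709` through `5437330:713` and `5454987:1194` through `5454987:1196`. That matches `entriesAddedCounter - messagesConsumedCounter` (2590 - 2582). Since `readPosition` is already past `lastConfirmedEntry` and `waitingReadOp` is `true`, this reading would suggest the broker has read those entries and is waiting for new ones, rather than being stuck reading from the ledger.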