Thanks for bringing up a real problem and driving the work to solve it.
I'd suggest analyzing three alternative designs before deciding on a solution.

Alternative 1: Look into a design that achieves the same outcome of allowing the subscription cursor to advance, but without making copies of the messages. Instead, create another subscription to track the slow or hot keys. Essentially, the design could be very similar to diverting to the overflow managed ledger, but there would be no need to duplicate the data, and we would avoid a situation where different failure modes cause unnecessary complications.

Alternative 2: Simply optimize the replay queue solution together with improving the scalability of individualDeletedMessages so that it scales to 1,000,000 ack holes and beyond. This would be the simplest solution and would cover most use cases. There are multiple benefits to keeping the solution simple; for example, backlog management doesn't change. Together with the PIP-430 broker cache (available since 4.1.0), the replay queue solution already avoids most unnecessary BookKeeper (BK) reads when the broker cache is sufficiently tuned for high-scale use cases. The PIP-430 broker cache could be improved further to achieve high cache hit rates if that turns out to be a problem.

Alternative 3: The client-side code could simply route messages to a separate topic on its own when it detects a hot key, and acknowledge the original message (a rough client-side sketch is included at the end of this message).

Regarding Alternative 2, I believe that individualDeletedMessages can already scale to 1,000,000 ack holes and beyond when the broker is properly configured. It could be tested with this type of configuration (a small sketch for generating ack holes is also included at the end of this message):

managedLedgerMaxUnackedRangesToPersist=1000000
managedLedgerMaxBatchDeletedIndexToPersist=1000000
managedLedgerPersistIndividualAckAsLongArray=true
managedCursorInfoCompressionType=LZ4
managedLedgerInfoCompressionType=LZ4

(The last config is unrelated, but it makes sense to also switch to using compression.)

I hope you can also analyze these alternatives before we proceed with a decision on solving the hot (or slow) key problem.

Thank you for focusing on solving this problem!

-Lari

On 2026/05/07 05:18:35 xiangying meng wrote:
> Hi all,
>
> I'd like to propose PIP-474: Key_Shared Hot Key Overflow Mechanism.
>
> Key_Shared is Pulsar's only built-in solution for parallel consumption
> with per-key ordering. But it has a critical production issue: a
> single stuck consumer can starve ALL other keys across ALL partitions
> within minutes, due to the containsStickyKeyHash ordering check
> flooding the Replay queue.
>
> This becomes especially urgent as AI inference workloads adopt MQ as
> their transport layer -- slow consumption (seconds per request) plus
> strict per-key ordering is exactly what Key_Shared is designed for,
> yet the hot-key starvation bug makes it unusable in production.
>
> PIP-474 proposes diverting hot-key messages to an independent Overflow
> ManagedLedger, unblocking Normal Read and mark-delete advancement
> while preserving at-least-once delivery and per-key ordering. Zero
> overhead when no hot keys are present.
>
> PIP: https://github.com/apache/pulsar/pull/25706
>
> Feedback welcome.
>
> Thanks,
> Xiangying Meng
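P.S. Here is a rough, untested sketch of what Alternative 3 could look like with the Java client. The topic names, the naive frequency-based hot-key detector, and the threshold values are all hypothetical placeholders; a production version would need a better detector and would have to consider how per-key ordering is handled across the two topics.

import java.util.HashMap;
import java.util.Map;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class HotKeyDivertingConsumer {

    // Hypothetical tuning knobs: a key seen more often than the
    // threshold within one counting window is treated as "hot".
    private static final int HOT_KEY_THRESHOLD = 1000;
    private static final int WINDOW_SIZE = 100_000;

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/input")   // placeholder topic
                .subscriptionName("key-shared-sub")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        // Separate topic where hot-key traffic is diverted; a dedicated
        // consumer (not shown) would drain it at its own pace.
        Producer<byte[]> overflowProducer = client.newProducer()
                .topic("persistent://public/default/input-overflow") // placeholder topic
                .create();

        Map<String, Integer> keyCounts = new HashMap<>();
        long seen = 0;

        while (true) {
            Message<byte[]> msg = consumer.receive();
            String key = msg.hasKey() ? msg.getKey() : "";

            // Naive per-window frequency count to detect hot keys.
            int count = keyCounts.merge(key, 1, Integer::sum);
            if (++seen % WINDOW_SIZE == 0) {
                keyCounts.clear(); // reset the counting window
            }

            if (count > HOT_KEY_THRESHOLD) {
                // Divert: republish to the overflow topic with the same
                // key, then ack the original so the cursor can advance.
                overflowProducer.newMessage()
                        .key(key)
                        .value(msg.getValue())
                        .send();
                consumer.acknowledge(msg);
            } else {
                process(msg); // normal path
                consumer.acknowledge(msg);
            }
        }
    }

    private static void process(Message<byte[]> msg) {
        // application-specific handling
    }
}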

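And a minimal sketch for exercising the Alternative 2 configuration above: it acknowledges only every other message, so each skipped message leaves a gap that the cursor has to track in individualDeletedMessages. The topic name is a placeholder, and the topic is assumed to be pre-filled with enough messages (for example with pulsar-perf produce).

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class AckHoleGenerator {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/ack-hole-test") // placeholder topic
                .subscriptionName("holes")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Acknowledge only every other message: each acked message sits
        // between two unacked ones, so the cursor ends up tracking about
        // 1,000,000 individually deleted ranges, which is exactly what
        // the managedLedger* settings above must be able to persist.
        for (long i = 0; i < 2_000_000; i++) {
            Message<byte[]> msg = consumer.receive();
            if (i % 2 == 0) {
                consumer.acknowledge(msg);
            }
        }

        // At this point, restarting the broker would show whether the
        // ranges survive persistence or unacked messages get redelivered.
        client.close();
    }
}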