GitHub user lhotari edited a comment on the discussion: Key_Shared consumer scale behavior with a partitioned topic
I'm currently working on a new design for Key_Shared AUTO_SPLIT mode which would cover the requirement of individual hash tracking. The goal is that a message will only get blocked when it is required for preserving ordering. #23231 improves the situation as well as the PIP-282 changes, but the goal isn't met. The remaining gap are the current "recently joined consumers" rules (which were significantly improved in PIP-282). For STICKY mode, I have created https://github.com/apache/pulsar/pull/23275. Currently the "recently joined customer" rules are applied for STICKY mode which is causing unnecessary blocking. For AUTO_SPLIT mode, there's an additional requirement that I'd like to cover, which is to have a way for the clients to detect when a specific consumer will no longer receive more messages for a particular message key after the assignments change in AUTO_SPLIT mode. There's a related issue https://github.com/apache/pulsar/issues/6555 about it, but it doesn't currently capture the actual requirement. The use case is about being able to keep some client-side state for a particular message key and discard the state when the message key no longer will appear in the stream for a particular consumer. When the message key appears the first time, the client might build some entity and add it to a cache. When the message key is no longer assigned to the consumer, the client would discard it from the cache. No other consumer should be receiving a newer message with the same key until the client has acknowledged the last message with the message key. To handle the ordering constraints and cover the requirement of notifying the clients about message key assignment, I'm working on a design that would cover this together with covering the goal of addressing the currently unoptimal "recently joined consumers" rules which could block message delivery when it's not necessary. The exact details are still unknown, but I'm slowly making progress towards some concepts that would help work on the initial design. One of the concepts that I'm thinking of is a "hash assignment generation," which is the assignment of consumers to the hashes at a particular point in time. When new consumers join and leave, a new generation would be formed. When the dispatcher sends messages to consumers, it could attach the hash of the message key and the hash assignment generation number to each message. In addition to this, each consumer would receive its own hash assignment information and the generation number of that assignment in a special marker message that is injected by the dispatcher into the messages that are delivered to the consumer. A client library could use this information, for example, to flush client-side caches when it no longer continues to receive future messages for a particular key. The tracking of the key would be at the level of the hash for that key. When the consumer receives the hash information together with the message, there's no longer a need to recalculate the hash on the client side. This would allow also changing the hash algorithm on the broker side without a dependency in clients to know which hash algorithm was used. I'll follow up with discussions on the dev mailing list which hopefully lead to a Pulsar Improvement Proposal (PIP) that could be addressed in the Pulsar 4.0 timeline. GitHub link: https://github.com/apache/pulsar/discussions/22912#discussioncomment-10599466 ---- This is an automatically sent email for commits@pulsar.apache.org. To unsubscribe, please send an email to: commits-unsubscr...@pulsar.apache.org