tsturzl commented on issue #12224: URL: https://github.com/apache/pulsar/issues/12224#issuecomment-1608379774
@Jason918 I believe the author of this ticket was describing Pulsar through their understanding of Kafka. I've been looking for something similar in Pulsar and have not been able to find it, nor a workaround. Pulsar key shared subscriptions work similarly to Kafka consumer groups: whereas Kafka balances the assignment of partitions to consumers, a Pulsar key shared subscription divides the hash key space among all consumers attached to that subscription. The key difference is that Pulsar is not assigning partitions to consumers, but rather assigning each of them a key space for which they'll receive messages. In this context "rebalancing" might be the wrong word, but ultimately what is being talked about is letting consumers know when their hash ranges have changed.

This is common in distributed data processing workflows where the data you see coming in tells you to consume other topics and combine the latest from each source. So I might have a key shared subscription to feed work out to a pool of consumers, and based on the data they see they might consume other subscriptions to retrieve dependent data. If another consumer joins, it will take over part of another consumer's key space, so that other consumer needs to know to stop consuming those other topics: they are no longer needed, because the key shared subscription will no longer send it the keys that required them. In Kafka you handle this by notifying the consumer that a rebalance occurred, which lets the client evaluate what it should stop consuming. `ConsumerEventListener` might be a good place to add this feature, but as it stands that listener does not provide functionality comparable to Kafka's `ConsumerRebalanceListener`.
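To make the request concrete, here is a rough, self-contained sketch of the kind of notification being asked for. Everything in it is hypothetical: `KeyRangeListener`, `Worker`, and `Coordinator` are illustration-only names and do not exist in the Pulsar client, the 65536-slot hash space is an assumption, and the "split the largest range in half" policy is a toy stand-in for whatever the broker actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are part of the Pulsar client API.
class KeyRangeDemo {

    /** Inclusive hash range [start, end]. */
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
        boolean contains(int hash) { return hash >= start && hash <= end; }
        int size() { return end - start + 1; }
        @Override public String toString() { return "[" + start + ", " + end + "]"; }
    }

    /** Hypothetical callback, similar in spirit to Kafka's ConsumerRebalanceListener. */
    interface KeyRangeListener {
        void onRangeChanged(Range oldRange, Range newRange);
    }

    /** A consumer that caches per-key state and drops what it no longer owns. */
    static final class Worker implements KeyRangeListener {
        final String name;
        Range range;
        final Map<Integer, String> cache = new HashMap<>(); // keyed by key hash

        Worker(String name) { this.name = name; }

        @Override
        public void onRangeChanged(Range oldRange, Range newRange) {
            range = newRange;
            // Invalidate cached state (or stop dependent consumers) for lost hashes.
            cache.keySet().removeIf(hash -> !newRange.contains(hash));
        }
    }

    /** Toy coordinator: a joining worker takes the upper half of the largest range. */
    static final class Coordinator {
        static final int HASH_SPACE = 65536; // assumed size of the hash key space
        final List<Worker> workers = new ArrayList<>();

        void join(Worker joining) {
            if (workers.isEmpty()) {
                joining.onRangeChanged(null, new Range(0, HASH_SPACE - 1));
            } else {
                Worker largest = workers.get(0);
                for (Worker w : workers) {
                    if (w.range.size() > largest.range.size()) largest = w;
                }
                Range old = largest.range;
                int mid = old.start + old.size() / 2;
                // The existing worker is told that its range shrank.
                largest.onRangeChanged(old, new Range(old.start, mid - 1));
                joining.onRangeChanged(null, new Range(mid, old.end));
            }
            workers.add(joining);
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        Worker a = new Worker("a");
        Worker b = new Worker("b");
        coordinator.join(a);
        a.cache.put(100, "state for a key hashing to 100");
        a.cache.put(40000, "state for a key hashing to 40000");
        coordinator.join(b); // a loses the upper half and evicts hash 40000
        System.out.println(a.name + " owns " + a.range + ", cached hashes " + a.cache.keySet());
        System.out.println(b.name + " owns " + b.range);
    }
}
```

The point of the sketch is the callback on the existing worker: without it, `a` has no way to learn that hash 40000 now belongs to someone else, so its cached state and any dependent subscriptions for that key would linger.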
Pulsar may not do partition reassignment, but it also offers no comparable way to handle changes in a key shared consumer's key space. That makes it difficult to implement combine-latest processing strategies effectively, and even simple things like invalidating local caches built from data previously seen on that key shared subscription.
