[ https://issues.apache.org/jira/browse/KAFKA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax reopened KAFKA-8812:
------------------------------------

No need to close the ticket, as long as the KIP was not declined :)

> Rebalance Producers - yes, I mean it ;-)
> ----------------------------------------
>
>                 Key: KAFKA-8812
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8812
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3.0
>            Reporter: Werner Daehn
>            Priority: Major
>
> Please bear with me. At first this idea may sound stupid, but it has its
> merits.
>
> How do you build a distributed producer at the moment? You use Kafka Connect,
> which in turn requires a cluster that decides which instance produces which
> partitions.
> On the consumer side it is different. There, Kafka itself handles the data
> distribution. If you have 10 Kafka partitions and 10 consumers, each will get
> data from one partition. With 5 consumers, each will get data from two
> partitions. And if only a single consumer is active, it gets all the data.
> All of this is managed by Kafka; all you have to do is start as many
> consumers as you want.
>
> I'd like to suggest something similar for producers. A producer would tell
> Kafka that its source has 10 partitions. The Kafka server then responds with
> the list of partitions this instance is responsible for. If it is the only
> producer, the response would be all 10 partitions. If a second instance
> starts up, the first instance would be told to produce data for partitions
> 1-5 and the new one for partitions 6-10. If a producer fails to send an
> alive packet, a rebalance happens: the remaining active producer is told to
> take over more of the load, and the dead producer gets an error the next
> time it tries to send data.
> On restart, the producer rebalance must, of course, also supply the point
> from which to resume producing data. Ideally this would be a user-generated
> pointer rather than the topic offset, e.g. the database system change
> number, a database transaction ID, or something similar.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
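The partition split described in the ticket (10 source partitions going entirely to one producer, or 1-5 and 6-10 across two) can be sketched as a small assignment function. This is a minimal, hypothetical sketch: `assign_source_partitions` and `with_resume_points` are illustrative names, not part of the Kafka client API (the ticket is only a proposal). It assumes a range strategy similar to the one Kafka uses for consumer groups, and it leaves out the coordinator, the alive-packet handling and all networking. Resume points are modeled as opaque, user-supplied strings such as a database system change number.

```python
from typing import Dict, List, Optional


def assign_source_partitions(num_partitions: int,
                             producer_ids: List[str]) -> Dict[str, List[int]]:
    """Range-assign source partitions 1..num_partitions to live producers.

    Hypothetical helper, not a Kafka API. Producer IDs are sorted so every
    participant would compute the same result; when the split is uneven,
    the first producers in sorted order get one extra partition each.
    """
    members = sorted(producer_ids)
    if not members:
        return {}
    per_member, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 1  # the ticket numbers source partitions from 1
    for index, member in enumerate(members):
        count = per_member + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


def with_resume_points(assignment: Dict[str, List[int]],
                       checkpoints: Dict[int, str]
                       ) -> Dict[str, Dict[int, Optional[str]]]:
    """Attach the last user-defined source position (e.g. an SCN) to each
    assigned partition; None means no checkpoint exists, so start fresh."""
    return {
        member: {p: checkpoints.get(p) for p in partitions}
        for member, partitions in assignment.items()
    }


if __name__ == "__main__":
    # Two producers share 10 source partitions.
    print(assign_source_partitions(10, ["producer-b", "producer-a"]))
    # {'producer-a': [1, 2, 3, 4, 5], 'producer-b': [6, 7, 8, 9, 10]}

    # producer-b stops sending alive packets: rebalance over the survivors.
    print(assign_source_partitions(10, ["producer-a"]))
```

Rebalancing here is simply recomputing the assignment over the current set of live producers; in the proposal, the broker would do this and hand each producer its partitions together with the resume points.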