Hi Jay, Thanks for comments. Please see inline responses.
Jiangjie (Becket) Qin On 1/21/15, 1:33 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote: >Hey guys, > >A couple questions/comments: > >1. The callback and user-controlled commit offset functionality is already >in the new consumer which we are working on in parallel. If we accelerated >that work it might help concentrate efforts. I admit this might take >slightly longer in calendar time but could still probably get done this >quarter. Have you guys considered that approach? Yes, I totally agree that ideally we should put efforts on new consumer. The main reason for still working on the old consumer is that we expect it would still be used in LinkedIn for quite a while before the new consumer could be fully rolled out. And we recently suffering a lot from mirror maker data loss issue. So our current plan is making necessary changes to make current mirror maker stable in production. Then we can test and rollout new consumer gradually without getting burnt. > >2. I think partitioning on the hash of the topic partition is not a very >good idea because that will make the case of going from a cluster with >fewer partitions to one with more partitions not work. I think an >intuitive >way to do this would be the following: >a. Default behavior: Just do what the producer does. I.e. if you specify a >key use it for partitioning, if not just partition in a round-robin >fashion. >b. Add a --preserve-partition option that will explicitly inherent the >partition from the source irrespective of whether there is a key or which >partition that key would hash to. Sorry that I did not explain this clear enough. The hash of topic partition is only used when decide which mirror maker data channel queue the consumer thread should put message into. It only tries to make sure the messages from the same partition is sent by the same producer thread to guarantee the sending order. This is not at all related to which partition in target cluster the messages end up. That is still decided by producer. > >3. You don't actually give the ConsumerRebalanceListener interface. What >is >that going to look like? Good point! I should have put it in the wiki. I just added it. > >4. What is MirrorMakerRecord? I think ideally the >MirrorMakerMessageHandler >interface would take a ConsumerRecord as input and return a >ProducerRecord, >right? That would allow you to transform the key, value, partition, or >destination topic... MirrorMakerRecord is introduced in KAFKA-1650, which is exactly the same as ConsumerRecord in KAFKA-1760. private[kafka] class MirrorMakerRecord (val sourceTopic: String, val sourcePartition: Int, val sourceOffset: Long, val key: Array[Byte], val value: Array[Byte]) { def size = value.length + {if (key == null) 0 else key.length} } However, because source partition and offset is needed in producer thread for consumer offsets bookkeeping, the record returned by MirrorMakerMessageHandler needs to contain those information. Therefore ProducerRecord does not work here. We could probably let message handler take ConsumerRecord for both input and output. > >5. Have you guys thought about what the implementation will look like in >terms of threading architecture etc with the new consumer? That will be >soon so even if we aren't starting with that let's make sure we can get >rid >of a lot of the current mirror maker accidental complexity in terms of >threads and queues when we move to that. I haven¹t thought about it throughly. The quick idea is after migration to the new consumer, it is probably better to use a single consumer thread. If multithread is needed, decoupling consumption and processing might be used. MirrorMaker definitely needs to be changed after new consumer get checked in. I¹ll document the changes and can submit follow up patches after the new consumer is available. > >-Jay > >On Tue, Jan 20, 2015 at 4:31 PM, Jiangjie Qin <j...@linkedin.com.invalid> >wrote: > >> Hi Kafka Devs, >> >> We are working on Kafka Mirror Maker enhancement. A KIP is posted to >> document and discuss on the followings: >> 1. KAFKA-1650: No Data loss mirror maker change >> 2. KAFKA-1839: To allow partition aware mirror. >> 3. KAFKA-1840: To allow message filtering/format conversion >> Feedbacks are welcome. Please let us know if you have any questions or >> concerns. >> >> Thanks. >> >> Jiangjie (Becket) Qin >>