QQ: If we ever use a different technique for data channel selection than
for producer partitioning, won't that break ordering? How can we ensure
these stay in sync?

With respect to the new consumer--I really do want to encourage people to
think through how MM will work with the new consumer. I mean this isn't
very far off, maybe a few months if we hustle? I could imagine us getting
this MM fix done maybe sooner, maybe in a month? So I guess this buys us an
extra month before we rip it out and throw it away? Maybe two? This bug has
been there for a while, though, right? Is it worth it? Probably it is, but
it still kind of sucks to have the duplicate effort.

So anyhow let's definitely think about how things will work with the new
consumer. I think we can probably just have N threads, where each thread
has its own consumer and producer and is internally single-threaded. Any
reason this wouldn't work?
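
Something like this is what I have in mind -- just a rough sketch assuming
the new consumer's poll/commit API roughly as currently proposed; the class
and config names are only illustrative, not from any actual patch:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// One mirror thread owns exactly one consumer and one producer and is
// internally single-threaded, so ordering within a source partition is
// preserved without any intermediate queues.
class MirrorThread(consumerProps: Properties,
                   producerProps: Properties,
                   topics: Seq[String]) extends Runnable {
  private val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](consumerProps)
  private val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerProps)

  override def run(): Unit = {
    consumer.subscribe(topics.asJava)
    while (true) {
      val records = consumer.poll(100)
      for (record <- records.asScala) {
        producer.send(new ProducerRecord(record.topic, record.key, record.value))
      }
      // Flush before committing so offsets are only committed for
      // messages the producer has actually sent (no data loss).
      producer.flush()
      consumer.commitSync()
    }
  }
}

Mirror maker would then just start N of these, one per desired degree of
parallelism, with no shared queues at all.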

-Jay


On Wed, Jan 21, 2015 at 5:29 PM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Hi Jay,
>
> Thanks for comments. Please see inline responses.
>
> Jiangjie (Becket) Qin
>
> On 1/21/15, 1:33 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>
> >Hey guys,
> >
> >A couple questions/comments:
> >
> >1. The callback and user-controlled commit offset functionality is already
> >in the new consumer which we are working on in parallel. If we accelerated
> >that work it might help concentrate efforts. I admit this might take
> >slightly longer in calendar time but could still probably get done this
> >quarter. Have you guys considered that approach?
> Yes, I totally agree that ideally we should put our effort into the new
> consumer. The main reason for still working on the old consumer is that we
> expect it will still be used at LinkedIn for quite a while before the new
> consumer can be fully rolled out, and we have recently been suffering a lot
> from the mirror maker data loss issue. So our current plan is to make the
> necessary changes to make the current mirror maker stable in production.
> Then we can test and roll out the new consumer gradually without getting
> burnt.
> >
> >2. I think partitioning on the hash of the topic partition is not a very
> >good idea because that will make the case of going from a cluster with
> >fewer partitions to one with more partitions not work. I think an
> >intuitive
> >way to do this would be the following:
> >a. Default behavior: Just do what the producer does. I.e. if you specify a
> >key use it for partitioning, if not just partition in a round-robin
> >fashion.
> >b. Add a --preserve-partition option that will explicitly inherit the
> >partition from the source irrespective of whether there is a key or which
> >partition that key would hash to.
> Sorry that I did not explain this clearly enough. The hash of the topic
> partition is only used to decide which mirror maker data channel queue the
> consumer thread should put a message into. It only makes sure that
> messages from the same source partition are sent by the same producer
> thread, to guarantee the sending order. It has nothing to do with which
> partition in the target cluster the messages end up in; that is still
> decided by the producer.
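> To illustrate, here is a rough sketch (not the actual patch; names are
> simplified, and MirrorMakerRecord is the record class shown below under 4):
>
> import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}
>
> // Each producer thread i drains only queue i. Routing by the hash of the
> // source (topic, partition) keeps all messages of one source partition in
> // a single queue, and therefore in a single producer thread, so their
> // order is preserved.
> class DataChannel(numQueues: Int, queueCapacity: Int) {
>   private val queues: Array[BlockingQueue[MirrorMakerRecord]] =
>     Array.fill(numQueues)(new ArrayBlockingQueue[MirrorMakerRecord](queueCapacity))
>
>   // Called by consumer threads: same source partition -> same queue.
>   def put(record: MirrorMakerRecord): Unit = {
>     val queueId =
>       ((record.sourceTopic, record.sourcePartition).hashCode & Int.MaxValue) % numQueues
>     queues(queueId).put(record)
>   }
>
>   // Called by producer thread queueId.
>   def take(queueId: Int): MirrorMakerRecord = queues(queueId).take()
> }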
> >
> >3. You don't actually give the ConsumerRebalanceListener interface. What
> >is
> >that going to look like?
> Good point! I should have put it in the wiki. I just added it.
> >
> >4. What is MirrorMakerRecord? I think ideally the
> >MirrorMakerMessageHandler
> >interface would take a ConsumerRecord as input and return a
> >ProducerRecord,
> >right? That would allow you to transform the key, value, partition, or
> >destination topic...
> MirrorMakerRecord is introduced in KAFKA-1650, which is exactly the same
> as ConsumerRecord in KAFKA-1760:
> private[kafka] class MirrorMakerRecord (val sourceTopic: String,
>   val sourcePartition: Int,
>   val sourceOffset: Long,
>   val key: Array[Byte],
>   val value: Array[Byte]) {
>   def size = value.length + {if (key == null) 0 else key.length}
> }
>
> However, because the source partition and offset are needed in the
> producer thread for consumer offset bookkeeping, the record returned by
> MirrorMakerMessageHandler needs to contain that information, so
> ProducerRecord does not work here. We could probably let the message
> handler take ConsumerRecord for both input and output.
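> As a rough sketch (not final, and the names here are just for
> illustration), the handler could take and return the MirrorMakerRecord
> shown above, so the producer thread keeps the source partition and offset
> it needs for offset bookkeeping:
>
> trait MirrorMakerMessageHandler {
>   // May filter (return Seq.empty), transform, or fan out a record.
>   def handle(record: MirrorMakerRecord): Seq[MirrorMakerRecord]
> }
>
> // Default handler: mirror the record unchanged.
> object DefaultMirrorMakerMessageHandler extends MirrorMakerMessageHandler {
>   override def handle(record: MirrorMakerRecord): Seq[MirrorMakerRecord] =
>     Seq(record)
> }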
> >
> >5. Have you guys thought about what the implementation will look like in
> >terms of threading architecture etc with the new consumer? That will be
> >soon so even if we aren't starting with that let's make sure we can get
> >rid
> >of a lot of the current mirror maker accidental complexity in terms of
> >threads and queues when we move to that.
> I haven't thought about it thoroughly. The quick idea is that after
> migrating to the new consumer, it is probably better to use a single
> consumer thread. If multiple threads are needed, we could decouple
> consumption and processing. MirrorMaker definitely needs to be changed
> after the new consumer gets checked in. I'll document the changes and can
> submit follow-up patches after the new consumer is available.
> >
> >-Jay
> >
> >On Tue, Jan 20, 2015 at 4:31 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >wrote:
> >
> >> Hi Kafka Devs,
> >>
> >> We are working on Kafka Mirror Maker enhancement. A KIP is posted to
> >> document and discuss on the followings:
> >> 1. KAFKA-1650: No Data loss mirror maker change
> >> 2. KAFKA-1839: To allow partition aware mirror.
> >> 3. KAFKA-1840: To allow message filtering/format conversion
> >> Feedback is welcome. Please let us know if you have any questions or
> >> concerns.
> >>
> >> Thanks.
> >>
> >> Jiangjie (Becket) Qin
> >>
>
>
