I updated the KIP page based on the discussion we had.

Should I start another vote, or can we consider this mail thread to have
already included a vote?

Jiangjie (Becket) Qin

On 2/11/15, 5:15 PM, "Neha Narkhede" <n...@confluent.io> wrote:

>Thanks for the explanation, Joel! Would love to see the results of the
>throughput experiment and I'm a +1 on everything else, including the
>rebalance callback and record handler.
>
>-Neha
>
>On Wed, Feb 11, 2015 at 1:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Cool, I agree with all that.
>>
>> I agree about the need for a rebalancing callback.
>>
>> Totally agree about record handler.
>>
>> It would be great to see if a prototype of this is workable.
>>
>> Thanks guys!
>>
>> -Jay
>>
>> On Wed, Feb 11, 2015 at 12:36 PM, Joel Koshy <jjkosh...@gmail.com>
>>wrote:
>>
>> > Hey Jay,
>> >
>> > Guozhang, Becket and I got together to discuss this and we think:
>> >
>> > - It seems that your proposal based on the new consumer and flush call
>> >   should work.
>> > - We would likely need to call the poll with a timeout that matches
>> >   the offset commit interval in order to deal with low volume
>> >   mirroring pipelines.
>> > - We will still need a rebalance callback to reduce duplicates - the
>> >   rebalance callback would need to flush and commit offsets (see the
>> >   sketch after this list).
>> > - The only remaining question is if the overall throughput is
>> >   sufficient. I think someone at LinkedIn (I don't remember who) did
>> >   some experiments with data channel size == 1 and ran into issues.
>> >   That was not thoroughly investigated though.
>> > - The addition of flush may actually make this solution viable for the
>> >   current mirror-maker (with the old consumer). We can prototype that
>> >   offline and if it works out well we can redo KAFKA-1650 (i.e.,
>> >   refactor the current mirror maker). The flush call and the new
>> >   consumer didn't exist at the time we did KAFKA-1650 so this did not
>> >   occur to us.
>> > - We think the RecordHandler is still a useful small addition for the
>> >   use-cases mentioned earlier in this thread.
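>> >
>> > As a rough illustration of that callback (a hypothetical sketch only -
>> > the ConsumerRebalanceListener name and the callback signatures are
>> > assumed here, not a settled API):
>> >
>> >     consumer.subscribe(topics, new ConsumerRebalanceListener {
>> >       // Flush outstanding sends and commit offsets *before* the
>> >       // partitions move to another owner, so the new owner resumes
>> >       // from a position matching what was actually produced.
>> >       def onPartitionsRevoked(partitions: Seq[TopicPartition]) {
>> >         producer.flush()
>> >         consumer.commit()
>> >       }
>> >       def onPartitionsAssigned(partitions: Seq[TopicPartition]) {}
>> >     })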
>> >
>> > Thanks,
>> >
>> > Joel
>> >
>> > On Wed, Feb 11, 2015 at 09:05:39AM -0800, Jay Kreps wrote:
>> > > Guozhang, I agree with 1-3, I do think what I was proposing was
>> > > simpler but perhaps there are gaps in that?
>> > >
>> > > Hey Joel--Here was a sketch of what I was proposing. I do think this
>> > > gets rid of manual offset tracking, especially doing so across
>> > > threads with dedicated commit threads, which I think is pretty
>> > > complex.
>> > >
>> > > while(true) {
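>> > >     // fetch whatever is available, blocking while there is nothing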
>> > >     val recs = consumer.poll(Long.MaxValue);
>> > >     for (rec <- recs)
>> > >         producer.send(rec, logErrorCallback)
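>> > >     // periodically: flush all outstanding sends, then (and only
>> > >     // then) commit consumer offsets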
>> > >     if(System.currentTimeMillis - lastCommit > commitInterval) {
>> > >         producer.flush()
>> > >         consumer.commit()
>> > >         lastCommit = System.currentTimeMillis
>> > >     }
>> > > }
>> > >
>> > > (See the previous email for details). I think the question is: is
>> > > there any reason--performance, correctness, etc--that this won't
>> > > work? Basically I think you guys have thought about this more so I
>> > > may be missing something. If so let's flag it while we still have
>> > > leeway on the consumer.
>> > >
>> > > If we think that will work, well I do think it is conceptually a lot
>> > > simpler than the current code, though I suppose one could disagree
>> > > on that.
>> > > -Jay
>> > >
>> > > On Wed, Feb 11, 2015 at 5:53 AM, Joel Koshy <jjkosh...@gmail.com>
>> wrote:
>> > >
>> > > > Hi Jay,
>> > > >
>> > > > > The data channels are actually a big part of the complexity of
>> > > > > the zero data loss design, though, right? Because then you need
>> > > > > some reverse channel to flow the acks back to the consumer based
>> > > > > on where you are versus just acking what you have read and
>> > > > > written (as in the code snippet I put up).
>> > > >
>> > > > I'm not sure if we are on the same page. Even if the data channel
>> > > > was not there, the current handling for zero data loss would remain
>> > > > very similar - you would need to maintain lists of unacked source
>> > > > offsets. I'm wondering if the KIP needs more detail on how it is
>> > > > currently implemented; or are you suggesting a different approach
>> > > > (in which case I have not fully understood)? I'm not sure what you
>> > > > mean by flowing acks back to the consumer - the MM commits offsets
>> > > > after the producer ack has been received. There is some additional
>> > > > complexity introduced in reducing duplicates on a rebalance - this
>> > > > is actually optional (since duplicates are currently a given). The
>> > > > reason that was done anyway is that with auto-commit turned off,
>> > > > duplicates are almost guaranteed on a rebalance.
>> > > >
>> > > > > I think the point that Neha and I were trying to make was that
>> > > > > the motivation to embed stuff into MM kind of is related to how
>> > > > > complex a simple "consume and produce" with good throughput will
>> > > > > be. If it is simple to write such a thing in a few lines, the
>> > > > > pain of embedding a bunch of stuff won't be worth it; if it has
>> > > > > to be as complex as the current mm then of course we will need
>> > > > > all kinds of plug-ins because no one will be able to write such a
>> > > > > thing. I don't have a huge concern with a simple plug-in but I
>> > > > > think if it turns into something more complex with filtering and
>> > > > > aggregation or whatever we really need to stop and think a bit
>> > > > > about the design.
>> > > >
>> > > > I agree - I don't think there is a use-case for any complex
>> > > > plug-in. It is pretty much what Becket has described currently for
>> > > > the message handler - i.e., take an incoming record and return a
>> > > > list of outgoing records (which could be empty if you filter).
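>> > > >
>> > > > As a sketch of that shape (hypothetical only - MirrorMakerRecord
>> > > > is the class from KAFKA-1650, and the trait name follows the
>> > > > MirrorMakerMessageHandler mentioned later in this thread):
>> > > >
>> > > >     trait MirrorMakerMessageHandler {
>> > > >       // One incoming record, zero or more outgoing records: an
>> > > >       // empty list filters the record out, and a modified record
>> > > >       // reformats it.
>> > > >       def handle(record: MirrorMakerRecord): java.util.List[MirrorMakerRecord]
>> > > >     }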
>> > > >
>> > > > So here is my take on the MM:
>> > > > - Bare bones: simple consumer - producer pairs (0.7 style). This
>> > > >   is ideal, but does not support no data loss.
>> > > > - Above plus support no data loss. This actually adds quite a bit
>> > > >   of complexity.
>> > > > - Above plus the message handler. This is a trivial addition I
>> > > >   think that makes the MM usable in a few other mirroring-like
>> > > >   applications.
>> > > >
>> > > > Joel
>> > > >
>> > > > > On Tue, Feb 10, 2015 at 12:31 PM, Joel Koshy
>><jjkosh...@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Feb 10, 2015 at 12:13:46PM -0800, Neha Narkhede wrote:
>> > > > > > > I think all of us agree that we want to design MirrorMaker
>> > > > > > > for 0 data loss. With the absence of the data channel, 0
>> > > > > > > data loss will be much simpler to implement.
>> > > > > >
>> > > > > > The data channel is irrelevant to the implementation of zero
>> > > > > > data loss. The complexity in the implementation of no data
>> > > > > > loss that you are seeing in mirror-maker affects all
>> > > > > > consume-then-produce patterns whether or not there is a data
>> > > > > > channel.  You still need to maintain a list of unacked
>> > > > > > offsets. What I meant earlier is that we can brainstorm
>> > > > > > completely different approaches to supporting no data loss,
>> > > > > > but the current implementation is the only solution we are
>> > > > > > aware of.
>> > > > > >
>> > > > > > >
>> > > > > > > My arguments for adding a message handler are that:
>> > > > > > > > 1. It is more efficient to do something in common for all
>> > > > > > > > the clients in the pipeline than letting each client do
>> > > > > > > > the same thing many times. And there are concrete use
>> > > > > > > > cases for the message handler already.
>> > > > > > > >
>> > > > > > >
>> > > > > > > What are the concrete use cases?
>> > > > > >
>> > > > > > I think Becket already described a couple of use cases earlier
>> > > > > > in the thread.
>> > > > > >
>> > > > > > <quote>
>> > > > > >
>> > > > > > 1. Format conversion. We have a use case where clients of the
>> > > > > > source cluster use an internal schema and clients of the
>> > > > > > target cluster use a different public schema.
>> > > > > > 2. Message filtering: For the messages published to the source
>> > > > > > cluster, there are some messages private to source cluster
>> > > > > > clients that should not be exposed to target cluster clients.
>> > > > > > It would be difficult to publish those messages into different
>> > > > > > partitions because they need to be ordered.
>> > > > > > I agree that we can always filter/convert messages after they
>> > > > > > are copied to the target cluster, but that costs network
>> > > > > > bandwidth unnecessarily, especially if that is a cross colo
>> > > > > > mirror. With the handler, we can co-locate the mirror maker
>> > > > > > with the source cluster and save that cost. Also, imagine
>> > > > > > there are many downstream consumers consuming from the target
>> > > > > > cluster; filtering/reformatting the messages before the
>> > > > > > messages reach the target cluster is much more efficient than
>> > > > > > having each of the consumers do this individually on their
>> > > > > > own.
>> > > > > >
>> > > > > > </quote>
>> > > > > >
>> > > > > > >
>> > > > > > > Also the KIP still refers to the data channel in a few
>> > > > > > > places (Motivation and "On consumer rebalance" sections).
>> > > > > > > Can you update the wiki so it is easier to review the new
>> > > > > > > design, especially the data loss part.
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, Feb 10, 2015 at 10:36 AM, Joel Koshy <
>> > jjkosh...@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > I think the message handler adds little to no complexity
>> > > > > > > > to the mirror maker. Jay/Neha, the MM became scary due to
>> > > > > > > > the rearchitecture we did for 0.8 due to performance
>> > > > > > > > issues compared with 0.7 - we should remove the data
>> > > > > > > > channel if it can match the current throughput. I agree it
>> > > > > > > > is worth prototyping and testing that so the MM
>> > > > > > > > architecture is simplified.
>> > > > > > > >
>> > > > > > > > The MM became a little scarier in KAFKA-1650 in order to
>> > > > > > > > support no data loss. I think the implementation for no
>> > > > > > > > data loss will remain about the same even in the new model
>> > > > > > > > (even without the data channel) - we can probably
>> > > > > > > > brainstorm more if there is a better/simpler way to do it
>> > > > > > > > (maybe there is in the absence of the data channel) but at
>> > > > > > > > the time it was the best we (i.e., Becket, myself, Jun and
>> > > > > > > > Guozhang who participated on the review) could come up
>> > > > > > > > with.
>> > > > > > > >
>> > > > > > > > So I'm definitely +1 on whatever it takes to support no
>> > > > > > > > data loss. I think most people would want that out of the
>> > > > > > > > box.
>> > > > > > > >
>> > > > > > > > As for the message handler, as Becket wrote and I agree
>> > > > > > > > with, it is really a trivial addition that would benefit
>> > > > > > > > (perhaps not most, but at least some). So I'm personally
>> > > > > > > > +1 on that as well. That said, I'm also okay with it not
>> > > > > > > > being there. I think the MM is fairly stand-alone and
>> > > > > > > > simple enough that it is entirely reasonable and
>> > > > > > > > absolutely feasible for companies to fork/re-implement the
>> > > > > > > > mirror maker for their own needs.
>> > > > > > > >
>> > > > > > > > So in summary, I'm +1 on the KIP.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Joel
>> > > > > > > >
>> > > > > > > > On Mon, Feb 09, 2015 at 09:19:57PM +0000, Jiangjie Qin
>>wrote:
>> > > > > > > > > I just updated the KIP page and incorporated Jay and
>> > > > > > > > > Neha’s suggestions. As a brief summary of where we are:
>> > > > > > > > >
>> > > > > > > > > Consensus reached:
>> > > > > > > > > Have N independent mirror maker threads, each of which
>> > > > > > > > > has its own consumer but shares a producer. The mirror
>> > > > > > > > > maker threads will be responsible for decompression,
>> > > > > > > > > compression and offset commit. No data channel or
>> > > > > > > > > separate offset commit thread is needed. The consumer
>> > > > > > > > > rebalance callback will be used to avoid duplicates on
>> > > > > > > > > rebalance.
>> > > > > > > > >
>> > > > > > > > > Still under discussion:
>> > > > > > > > > Whether a message handler is needed.
>> > > > > > > > >
>> > > > > > > > > My arguments for adding a message handler are that:
>> > > > > > > > > 1. It is more efficient to do something in common for
>> > > > > > > > > all the clients in the pipeline than letting each client
>> > > > > > > > > do the same thing many times. And there are concrete use
>> > > > > > > > > cases for the message handler already.
>> > > > > > > > > 2. It is not a big complicated add-on to mirror maker.
>> > > > > > > > > 3. Without a message handler, customers who need it
>> > > > > > > > > would have to re-implement all the logic of mirror maker
>> > > > > > > > > by themselves just in order to add this handling in the
>> > > > > > > > > pipeline.
>> > > > > > > > >
>> > > > > > > > > Any thoughts?
>> > > > > > > > >
>> > > > > > > > > Thanks.
>> > > > > > > > >
>> > > > > > > > > ―Jiangjie (Becket) Qin
>> > > > > > > > >
>> > > > > > > > > On 2/8/15, 6:35 PM, "Jiangjie Qin" <j...@linkedin.com>
>> > wrote:
>> > > > > > > > >
>> > > > > > > > > >Hi Jay, thanks a lot for the comments.
>> > > > > > > > > >I think this solution is better. We probably don’t need
>> > > > > > > > > >the data channel anymore. It can be replaced with a list
>> > > > > > > > > >of producers if we need more sender threads.
>> > > > > > > > > >I’ll update the KIP page.
>> > > > > > > > > >
>> > > > > > > > > >The reasoning about the message handler is mainly for
>> > > > > > > > > >efficiency purposes. I’m thinking that if something can
>> > > > > > > > > >be done in the pipeline for all the clients, such as
>> > > > > > > > > >filtering/reformatting, it is probably better to do it
>> > > > > > > > > >in the pipeline than asking 100 clients to do the same
>> > > > > > > > > >thing 100 times.
>> > > > > > > > > >
>> > > > > > > > > >―Jiangjie (Becket) Qin
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >On 2/8/15, 4:59 PM, "Jay Kreps" <jay.kr...@gmail.com>
>> > wrote:
>> > > > > > > > > >
>> > > > > > > > > >>Yeah, I second Neha's comments. The current mm code has
>> > > > > > > > > >>taken something pretty simple and made it pretty scary
>> > > > > > > > > >>with callbacks and wait/notify stuff. Do we believe
>> > > > > > > > > >>this works? I can't tell by looking at it, which is
>> > > > > > > > > >>kind of bad for something important like this. I don't
>> > > > > > > > > >>mean this as criticism, I know the history: we added
>> > > > > > > > > >>in-memory queues to help with other performance
>> > > > > > > > > >>problems without thinking about correctness, then we
>> > > > > > > > > >>added stuff to make the in-memory queues not lose data,
>> > > > > > > > > >>and so on.
>> > > > > > > > > >>
>> > > > > > > > > >>Can we instead do the opposite exercise and start with
>> > > > > > > > > >>the basics of what mm should do and think about what
>> > > > > > > > > >>deficiencies prevent this approach from working? Then
>> > > > > > > > > >>let's make sure the currently in-flight work will
>> > > > > > > > > >>remove these deficiencies. After all mm is kind of the
>> > > > > > > > > >>prototypical kafka use case, so if we can't make our
>> > > > > > > > > >>clients do this probably no one else can.
>> > > > > > > > > >>
>> > > > > > > > > >>I think mm should just be N independent threads, each
>> > > > > > > > > >>of which has its own consumer but shares a producer,
>> > > > > > > > > >>and each of which looks like this:
>> > > > > > > > > >>
>> > > > > > > > > >>while(true) {
>> > > > > > > > > >>    val recs = consumer.poll(Long.MaxValue);
>> > > > > > > > > >>    for (rec <- recs)
>> > > > > > > > > >>        producer.send(rec, logErrorCallback)
>> > > > > > > > > >>    if(System.currentTimeMillis - lastCommit >
>> > commitInterval)
>> > > > {
>> > > > > > > > > >>        producer.flush()
>> > > > > > > > > >>        consumer.commit()
>> > > > > > > > > >>        lastCommit = System.currentTimeMillis
>> > > > > > > > > >>    }
>> > > > > > > > > >>}
>> > > > > > > > > >>
>> > > > > > > > > >>This will depend on setting the retry count in the
>> > > > > > > > > >>producer to something high with a largish backoff so
>> > > > > > > > > >>that a failed send attempt doesn't drop data.
>> > > > > > > > > >>
>> > > > > > > > > >>We will need to use the callback to force a flush and
>> > > > > > > > > >>offset commit on rebalance.
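>> > > > > > > > > >>
>> > > > > > > > > >>For instance, something like this (illustrative values
>> > > > > > > > > >>only, assuming the new producer's config names):
>> > > > > > > > > >>
>> > > > > > > > > >>    # effectively never drop a record on a failed attempt
>> > > > > > > > > >>    retries=2147483647
>> > > > > > > > > >>    retry.backoff.ms=1000
>> > > > > > > > > >>    # one in-flight request per connection preserves order
>> > > > > > > > > >>    max.in.flight.requests.per.connection=1
>> > > > > > > > > >>    # wait for all in-sync replicas to ack each send
>> > > > > > > > > >>    acks=all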
>> > > > > > > > > >>
>> > > > > > > > > >>This approach may have a few more TCP connections due
>> > > > > > > > > >>to using multiple consumers, but I think it is a lot
>> > > > > > > > > >>easier to reason about, and the total number of mm
>> > > > > > > > > >>instances is always going to be small.
>> > > > > > > > > >>
>> > > > > > > > > >>Let's talk about where this simple approach falls
>> > > > > > > > > >>short; I think that will help us understand your
>> > > > > > > > > >>motivations for additional elements.
>> > > > > > > > > >>
>> > > > > > > > > >>Another advantage of this is that it is so simple I
>> > > > > > > > > >>don't think we really even need to bother making mm
>> > > > > > > > > >>extensible, because writing your own code that does
>> > > > > > > > > >>custom processing or transformation is just ten lines
>> > > > > > > > > >>and no plug-in system is going to make it simpler.
>> > > > > > > > > >>
>> > > > > > > > > >>-Jay
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >>On Sun, Feb 8, 2015 at 2:40 PM, Neha Narkhede <
>> > > > n...@confluent.io>
>> > > > > > > > wrote:
>> > > > > > > > > >>
>> > > > > > > > > >>> Few comments -
>> > > > > > > > > >>>
>> > > > > > > > > >>> 1. Why do we need the message handler? Do you have
>> > > > > > > > > >>> concrete use cases in mind? If not, we should
>> > > > > > > > > >>> consider adding it in the future when/if we do have
>> > > > > > > > > >>> use cases for it. The purpose of the mirror maker is
>> > > > > > > > > >>> a simple tool for setting up Kafka cluster replicas.
>> > > > > > > > > >>> I don't see why we need to include a message handler
>> > > > > > > > > >>> for doing stream transformations or filtering. You
>> > > > > > > > > >>> can always write a simple process for doing that once
>> > > > > > > > > >>> the data is copied as is in the target cluster.
>> > > > > > > > > >>> 2. Why keep both designs? We should prefer the
>> > > > > > > > > >>> simpler design unless it is not feasible due to the
>> > > > > > > > > >>> performance issue that we previously had. Did you get
>> > > > > > > > > >>> a chance to run some tests to see if that is really
>> > > > > > > > > >>> still a problem or not? It will be easier to think
>> > > > > > > > > >>> about the design and also make the KIP complete if we
>> > > > > > > > > >>> make a call on the design first.
>> > > > > > > > > >>> 3. Can you explain the need for keeping a list of
>> > > > > > > > > >>> unacked offsets per partition? Consider adding a
>> > > > > > > > > >>> section on retries and how you plan to handle the
>> > > > > > > > > >>> case when the producer runs out of all retries.
>> > > > > > > > > >>>
>> > > > > > > > > >>> Thanks,
>> > > > > > > > > >>> Neha
>> > > > > > > > > >>>
>> > > > > > > > > >>> On Sun, Feb 8, 2015 at 2:06 PM, Jiangjie Qin
>> > > > > > > > > >>><j...@linkedin.com.invalid>
>> > > > > > > > > >>> wrote:
>> > > > > > > > > >>>
>> > > > > > > > > >>> > Hi Neha,
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > Yes, I’ve updated the KIP so the entire KIP is
>> > > > > > > > > >>> > based on the new consumer now. I’ve put both
>> > > > > > > > > >>> > designs, with and without the data channel, in the
>> > > > > > > > > >>> > KIP, as I still feel we might need the data channel
>> > > > > > > > > >>> > to provide more flexibility, especially after the
>> > > > > > > > > >>> > message handler is introduced. I’ve put my thinking
>> > > > > > > > > >>> > on the pros and cons of the two designs in the KIP
>> > > > > > > > > >>> > as well. It’ll be great if you can give a review
>> > > > > > > > > >>> > and comment.
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > Thanks.
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > Jiangjie (Becket) Qin
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > On 2/6/15, 7:30 PM, "Neha Narkhede" <
>> n...@confluent.io
>> > >
>> > > > wrote:
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > >Hey Becket,
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >What are the next steps on this KIP? As per your
>> > > > > > > > > >>> > >comment earlier on the thread -
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >I do agree it makes more sense
>> > > > > > > > > >>> > >> to avoid duplicate effort and plan based on new
>> > > > > > > > > >>> > >> consumer. I’ll modify the KIP.
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >Did you get a chance to think about the simplified
>> > > > > > > > > >>> > >design that we proposed earlier? Do you plan to
>> > > > > > > > > >>> > >update the KIP with that proposal?
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >Thanks,
>> > > > > > > > > >>> > >Neha
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >On Wed, Feb 4, 2015 at 12:12 PM, Jiangjie Qin
>> > > > > > > > > >>><j...@linkedin.com.invalid
>> > > > > > > > > >>> >
>> > > > > > > > > >>> > >wrote:
>> > > > > > > > > >>> > >
>> > > > > > > > > >>> > >> In mirror maker we do not do de-serialization on
>> > > > > > > > > >>> > >> the messages. Mirror maker uses the source
>> > > > > > > > > >>> > >> TopicPartition hash to choose a producer to send
>> > > > > > > > > >>> > >> messages from the same source partition. The
>> > > > > > > > > >>> > >> partition those messages end up in is decided by
>> > > > > > > > > >>> > >> the Partitioner class in KafkaProducer (assuming
>> > > > > > > > > >>> > >> you are using the new producer), which uses the
>> > > > > > > > > >>> > >> hash code of the byte[].
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> If deserialization is needed, it has to be done
>> > > > > > > > > >>> > >> in the message handler.
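>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> Roughly, as an illustrative sketch of that
>> > > > > > > > > >>> > >> selection (names assumed, not actual code):
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >>     // same source partition -> same producer, so send order is kept
>> > > > > > > > > >>> > >>     val producer = producers((sourcePartition.hashCode & 0x7fffffff) % producers.size)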
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> Thanks.
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> On 2/4/15, 11:33 AM, "Bhavesh Mistry" <
>> > > > > > > > mistry.p.bhav...@gmail.com>
>> > > > > > > > > >>> > >>wrote:
>> > > > > > > > > >>> > >>
>> > > > > > > > > >>> > >> >Hi Jiangjie,
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >Thanks for entertaining my questions so far.
>> > > > > > > > > >>> > >> >One last question I have is about the
>> > > > > > > > > >>> > >> >serialization of the message key.  If the key
>> > > > > > > > > >>> > >> >de-serialization (Class) is not present at the
>> > > > > > > > > >>> > >> >MM instance, then does it use the raw byte
>> > > > > > > > > >>> > >> >hashcode to determine the partition?  How are
>> > > > > > > > > >>> > >> >you going to address the situation where the
>> > > > > > > > > >>> > >> >key needs to be de-serialized and the actual
>> > > > > > > > > >>> > >> >hashcode needs to be computed?
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >Thanks,
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >Bhavesh
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >On Fri, Jan 30, 2015 at 1:41 PM, Jiangjie Qin
>> > > > > > > > > >>> > >><j...@linkedin.com.invalid>
>> > > > > > > > > >>> > >> >wrote:
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >> Hi Bhavesh,
>> > > > > > > > > >>> > >> >>
>> > > > > > > > > >>> > >> >> Please see inline comments.
>> > > > > > > > > >>> > >> >>
>> > > > > > > > > >>> > >> >> Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >> >>
>> > > > > > > > > >>> > >> >> On 1/29/15, 7:00 PM, "Bhavesh Mistry"
>> > > > > > > > > >>><mistry.p.bhav...@gmail.com>
>> > > > > > > > > >>> > >> >>wrote:
>> > > > > > > > > >>> > >> >>
>> > > > > > > > > >>> > >> >> >Hi Jiangjie,
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >Thanks for the input.
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >a) Will the MM producer ack be attached to
>> > > > > > > > > >>> > >> >> >the Producer instance or be per topic?  The
>> > > > > > > > > >>> > >> >> >use case is that one instance of MM needs to
>> > > > > > > > > >>> > >> >> >handle both strong acks and also acks=0 for
>> > > > > > > > > >>> > >> >> >some topics.  Or would it be better to set
>> > > > > > > > > >>> > >> >> >up another instance of MM?
>> > > > > > > > > >>> > >> >> The acks setting is a producer-level setting
>> > > > > > > > > >>> > >> >> instead of a topic-level setting.  In this
>> > > > > > > > > >>> > >> >> case you probably need to set up another
>> > > > > > > > > >>> > >> >> instance.
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >b) Regarding TCP connections, why does the
>> > > > > > > > > >>> > >> >> >number of producer instances attach to the
>> > > > > > > > > >>> > >> >> >number of TCP connections?  Is it possible
>> > > > > > > > > >>> > >> >> >to use a broker-connection TCP pool, where a
>> > > > > > > > > >>> > >> >> >producer just checks out a TCP connection to
>> > > > > > > > > >>> > >> >> >a broker, so the number of producer
>> > > > > > > > > >>> > >> >> >instances does not correlate to broker
>> > > > > > > > > >>> > >> >> >connections?  Is this possible?
>> > > > > > > > > >>> > >> >> In the new producer, each producer maintains
>> > > > > > > > > >>> > >> >> a connection to each broker within the
>> > > > > > > > > >>> > >> >> producer instance. Making producer instances
>> > > > > > > > > >>> > >> >> share the TCP connections is a very big
>> > > > > > > > > >>> > >> >> change to the current design, so I suppose we
>> > > > > > > > > >>> > >> >> won’t be able to do that.
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >Thanks,
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >Bhavesh
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >On Thu, Jan 29, 2015 at 11:50 AM, Jiangjie
>>Qin
>> > > > > > > > > >>> > >> >><j...@linkedin.com.invalid
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >wrote:
>> > > > > > > > > >>> > >> >> >
>> > > > > > > > > >>> > >> >> >> Hi Bhavesh,
>> > > > > > > > > >>> > >> >> >>
>> > > > > > > > > >>> > >> >> >> I think it is the right discussion to have
>> > > > > > > > > >>> > >> >> >> when we are talking about the new design
>> > > > > > > > > >>> > >> >> >> for MM.
>> > > > > > > > > >>> > >> >> >> Please see the inline comments.
>> > > > > > > > > >>> > >> >> >>
>> > > > > > > > > >>> > >> >> >> Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >> >> >>
>> > > > > > > > > >>> > >> >> >> On 1/28/15, 10:48 PM, "Bhavesh Mistry"
>> > > > > > > > > >>> > >><mistry.p.bhav...@gmail.com>
>> > > > > > > > > >>> > >> >> >>wrote:
>> > > > > > > > > >>> > >> >> >>
>> > > > > > > > > >>> > >> >> >> >Hi Jiangjie,
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >I just wanted to let you know about our
>> > > > > > > > > >>> > >> >> >> >use case and stress the point that the
>> > > > > > > > > >>> > >> >> >> >local data center broker cluster has
>> > > > > > > > > >>> > >> >> >> >fewer partitions than the destination
>> > > > > > > > > >>> > >> >> >> >offline broker cluster, because we do a
>> > > > > > > > > >>> > >> >> >> >batch pull from CAMUS in order to drain
>> > > > > > > > > >>> > >> >> >> >data faster than the injection rate (from
>> > > > > > > > > >>> > >> >> >> >four DCs for the same topic).
>> > > > > > > > > >>> > >> >> >> Keeping the same partition number in the
>> > > > > > > > > >>> > >> >> >> source and target clusters will be an
>> > > > > > > > > >>> > >> >> >> option but will not be enforced by
>> > > > > > > > > >>> > >> >> >> default.
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >We are facing the following issues
>> > > > > > > > > >>> > >> >> >> >(probably due to configuration):
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >1)      We occasionally lose data because
>> > > > > > > > > >>> > >> >> >> >the message batch size is too large (2MB)
>> > > > > > > > > >>> > >> >> >> >on the target side (we are using the old
>> > > > > > > > > >>> > >> >> >> >producer but I think the new producer
>> > > > > > > > > >>> > >> >> >> >will solve this problem to some extent).
>> > > > > > > > > >>> > >> >> >> We do see this issue in LinkedIn as well.
>> > > > > > > > > >>> > >> >> >> The new producer might also have this
>> > > > > > > > > >>> > >> >> >> issue. There are some proposals for
>> > > > > > > > > >>> > >> >> >> solutions, but no real work has started
>> > > > > > > > > >>> > >> >> >> yet. For now, as a workaround, setting a
>> > > > > > > > > >>> > >> >> >> more aggressive batch size on the producer
>> > > > > > > > > >>> > >> >> >> side should work.
>> > > > > > > > > >>> > >> >> >> >2)      Since only one instance of MM is
>> > > > > > > > > >>> > >> >> >> >set up, we are not able to set up acks
>> > > > > > > > > >>> > >> >> >> >per topic; instead the ack setting is
>> > > > > > > > > >>> > >> >> >> >attached to the producer instance.
>> > > > > > > > > >>> > >> >> >> I don’t quite get the question here.
>> > > > > > > > > >>> > >> >> >> >3)      How are you going to address the
>> > > > > > > > > >>> > >> >> >> >two-phase commit problem if acks is set
>> > > > > > > > > >>> > >> >> >> >to strongest, but auto commit is on for
>> > > > > > > > > >>> > >> >> >> >the consumer (meaning the producer does
>> > > > > > > > > >>> > >> >> >> >not get the ack, but the consumer
>> > > > > > > > > >>> > >> >> >> >auto-committed the offset of that
>> > > > > > > > > >>> > >> >> >> >message)?  Is there a transactional
>> > > > > > > > > >>> > >> >> >> >(Kafka transactions are in progress)
>> > > > > > > > > >>> > >> >> >> >based ack and offset commit?
>> > > > > > > > > >>> > >> >> >> Auto offset commit should be turned off in
>> > > > > > > > > >>> > >> >> >> this case. The offset will only be
>> > > > > > > > > >>> > >> >> >> committed once by the offset commit
>> > > > > > > > > >>> > >> >> >> thread. So there is no two-phase commit.
>> > > > > > > > > >>> > >> >> >> >4)      How are you planning to avoid
>> > > > > > > > > >>> > >> >> >> >duplicated messages?  (Is the broker
>> > > > > > > > > >>> > >> >> >> >going to have a moving window of
>> > > > > > > > > >>> > >> >> >> >collected messages and de-dupe?)
>> > > > > > > > > >>> > >> >> >> >Possibly we get this from retry set to
>> > > > > > > > > >>> > >> >> >> >5…?
>> > > > > > > > > >>> > >> >> >> We are not trying to completely avoid
>> > > > > > > > > >>> > >> >> >> duplicates. The duplicates will still be
>> > > > > > > > > >>> > >> >> >> there if:
>> > > > > > > > > >>> > >> >> >> 1. Producer retries on failure.
>> > > > > > > > > >>> > >> >> >> 2. Mirror maker is hard killed.
>> > > > > > > > > >>> > >> >> >> Currently, dedup is expected to be done by
>> > > > > > > > > >>> > >> >> >> the user if necessary.
>> > > > > > > > > >>> > >> >> >> >5)      Last, is there any warning or
>> > > > > > > > > >>> > >> >> >> >insight you can provide from the MM
>> > > > > > > > > >>> > >> >> >> >component when the data injection rate
>> > > > > > > > > >>> > >> >> >> >into destination partitions is NOT evenly
>> > > > > > > > > >>> > >> >> >> >distributed, regardless of keyed or
>> > > > > > > > > >>> > >> >> >> >non-keyed messages?  (Hence there is a
>> > > > > > > > > >>> > >> >> >> >ripple effect: data arrives late, or data
>> > > > > > > > > >>> > >> >> >> >arrives out of order in terms of
>> > > > > > > > > >>> > >> >> >> >timestamp and early some of the time, and
>> > > > > > > > > >>> > >> >> >> >CAMUS creates a huge number of files on
>> > > > > > > > > >>> > >> >> >> >HDFS due to the uneven injection rate.
>> > > > > > > > > >>> > >> >> >> >The Camus job is configured to run every
>> > > > > > > > > >>> > >> >> >> >3 minutes.)
>> > > > > > > > > >>> > >> >> >> I think uneven data distribution is
>> > > > > > > > > >>> > >> >> >> typically caused by server side unbalance,
>> > > > > > > > > >>> > >> >> >> instead of something mirror maker could
>> > > > > > > > > >>> > >> >> >> control. In the new mirror maker, however,
>> > > > > > > > > >>> > >> >> >> there is a customizable message handler
>> > > > > > > > > >>> > >> >> >> that might be able to help a little bit.
>> > > > > > > > > >>> > >> >> >> In the message handler, you can explicitly
>> > > > > > > > > >>> > >> >> >> set the partition that you want to produce
>> > > > > > > > > >>> > >> >> >> the message to. So if you know the uneven
>> > > > > > > > > >>> > >> >> >> data distribution in the target cluster,
>> > > > > > > > > >>> > >> >> >> you may offset it here. But that probably
>> > > > > > > > > >>> > >> >> >> only works for non-keyed messages.
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >I am not sure if this is the right
>> > > > > > > > > >>> > >> >> >> >discussion forum to bring these to
>> > > > > > > > > >>> > >> >> >> >your/Kafka dev team's attention.  This
>> > > > > > > > > >>> > >> >> >> >might be off track.
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >Thanks,
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >Bhavesh
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >On Wed, Jan 28, 2015 at 11:07 AM, 
>>Jiangjie
>> > Qin
>> > > > > > > > > >>> > >> >> >><j...@linkedin.com.invalid
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >wrote:
>> > > > > > > > > >>> > >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> I’ve updated the KIP page. Feedback is
>> > > > > > > > > >>> > >> >> >> >> welcome.
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> Regarding the simple mirror maker
>> > > > > > > > > >>> > >> >> >> >> design, I thought it over and have some
>> > > > > > > > > >>> > >> >> >> >> worries. There are two things that
>> > > > > > > > > >>> > >> >> >> >> might be worth thinking about:
>> > > > > > > > > >>> > >> >> >> >> 1. One of the enhancements to mirror
>> > > > > > > > > >>> > >> >> >> >> maker is adding a message handler to do
>> > > > > > > > > >>> > >> >> >> >> things like reformatting. I think we
>> > > > > > > > > >>> > >> >> >> >> might potentially want to have more
>> > > > > > > > > >>> > >> >> >> >> threads processing the messages than
>> > > > > > > > > >>> > >> >> >> >> the number of consumers. If we follow
>> > > > > > > > > >>> > >> >> >> >> the simple mirror maker solution, we
>> > > > > > > > > >>> > >> >> >> >> lose this flexibility.
>> > > > > > > > > >>> > >> >> >> >> 2. This might not matter too much, but
>> > > > > > > > > >>> > >> >> >> >> creating more consumers means a bigger
>> > > > > > > > > >>> > >> >> >> >> footprint of TCP connections / memory.
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> Any thoughts on this?
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> Thanks.
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> On 1/26/15, 10:35 AM, "Jiangjie Qin" <
>> > > > > > > > j...@linkedin.com>
>> > > > > > > > > >>> > wrote:
>> > > > > > > > > >>> > >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> >Hi Jay and Neha,
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >Thanks a lot for the reply and
>> > > > > > > > > >>> > >> >> >> >> >explanation. I do agree it makes more
>> > > > > > > > > >>> > >> >> >> >> >sense to avoid duplicate effort and
>> > > > > > > > > >>> > >> >> >> >> >plan based on the new consumer. I’ll
>> > > > > > > > > >>> > >> >> >> >> >modify the KIP.
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >To Jay’s question on message ordering
>> > > > > > > > > >>> > >> >> >> >> >- the data channel selection makes
>> > > > > > > > > >>> > >> >> >> >> >sure that the messages from the same
>> > > > > > > > > >>> > >> >> >> >> >source partition will be sent by the
>> > > > > > > > > >>> > >> >> >> >> >same producer. So the order of the
>> > > > > > > > > >>> > >> >> >> >> >messages is guaranteed with proper
>> > > > > > > > > >>> > >> >> >> >> >producer settings
>> > > > > > > > > >>> > >> >> >> >> >(MaxInFlightRequests=1,
>> > > > > > > > > >>> > >> >> >> >> >retries=Integer.MaxValue, etc.).
>> > > > > > > > > >>> > >> >> >> >> >For keyed messages, because they come
>> > > > > > > > > >>> > >> >> >> >> >from the same source partition and
>> > > > > > > > > >>> > >> >> >> >> >will end up in the same target
>> > > > > > > > > >>> > >> >> >> >> >partition, as long as they are sent by
>> > > > > > > > > >>> > >> >> >> >> >the same producer, the order is
>> > > > > > > > > >>> > >> >> >> >> >guaranteed.
>> > > > > > > > > >>> > >> >> >> >> >For non-keyed messages, the messages
>> > > > > > > > > >>> > >> >> >> >> >coming from the same source partition
>> > > > > > > > > >>> > >> >> >> >> >might go to different target
>> > > > > > > > > >>> > >> >> >> >> >partitions. The order is only
>> > > > > > > > > >>> > >> >> >> >> >guaranteed within each partition.
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >Anyway, I’ll modify the KIP and the
>> > > > > > > > > >>> > >> >> >> >> >data channel will go away.
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >Thanks.
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >On 1/25/15, 4:34 PM, "Neha Narkhede" 
>><
>> > > > > > > > n...@confluent.io>
>> > > > > > > > > >>> > >>wrote:
>> > > > > > > > > >>> > >> >> >> >> >
>> > > > > > > > > >>> > >> >> >> >> >>I think there is some value in
>> > > > > > > > > >>> > >> >> >> >> >>investigating if we can go back to
>> > > > > > > > > >>> > >> >> >> >> >>the simple mirror maker design, as
>> > > > > > > > > >>> > >> >> >> >> >>Jay points out. Here you have N
>> > > > > > > > > >>> > >> >> >> >> >>threads; each has a consumer and a
>> > > > > > > > > >>> > >> >> >> >> >>producer.
>> > > > > > > > > >>> > >> >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> >>The reason why we had to move away
>> > > > > > > > > >>> > >> >> >> >> >>from that was a combination of the
>> > > > > > > > > >>> > >> >> >> >> >>difference in throughput between the
>> > > > > > > > > >>> > >> >> >> >> >>consumer and the old producer and the
>> > > > > > > > > >>> > >> >> >> >> >>deficiency of the consumer
>> > > > > > > > > >>> > >> >> >> >> >>rebalancing that limits the total
>> > > > > > > > > >>> > >> >> >> >> >>number of mirror maker threads. So
>> > > > > > > > > >>> > >> >> >> >> >>the only option available was to
>> > > > > > > > > >>> > >> >> >> >> >>increase the throughput of the
>> > > > > > > > > >>> > >> >> >> >> >>limited # of mirror maker threads
>> > > > > > > > > >>> > >> >> >> >> >>that could be deployed. That queuing
>> > > > > > > > > >>> > >> >> >> >> >>design may not make sense now, given
>> > > > > > > > > >>> > >> >> >> >> >>that the new producer's throughput is
>> > > > > > > > > >>> > >> >> >> >> >>almost similar to the consumer's AND
>> > > > > > > > > >>> > >> >> >> >> >>that the new round-robin based
>> > > > > > > > > >>> > >> >> >> >> >>consumer rebalancing can allow a very
>> > > > > > > > > >>> > >> >> >> >> >>high number of mirror maker instances
>> > > > > > > > > >>> > >> >> >> >> >>to exist.
>> > > > > > > > > >>> > >> >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> >>This is the end state that the mirror
>> > > > > > > > > >>> > >> >> >> >> >>maker should be in once the new
>> > > > > > > > > >>> > >> >> >> >> >>consumer is complete, so it wouldn't
>> > > > > > > > > >>> > >> >> >> >> >>hurt to see if we can just move to
>> > > > > > > > > >>> > >> >> >> >> >>that right now.
>> > > > > > > > > >>> > >> >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> >>On Fri, Jan 23, 2015 at 8:40 PM, Jay
>> > Kreps
>> > > > > > > > > >>> > >><jay.kr...@gmail.com
>> > > > > > > > > >>> > >> >
>> > > > > > > > > >>> > >> >> >> >>wrote:
>> > > > > > > > > >>> > >> >> >> >> >>
>> > > > > > > > > >>> > >> >> >> >> >>> QQ: If we ever use a different
>> > > > > > > > > >>> > >> >> >> >> >>> technique for the data channel
>> > > > > > > > > >>> > >> >> >> >> >>> selection than for the producer
>> > > > > > > > > >>> > >> >> >> >> >>> partitioning won't that break
>> > > > > > > > > >>> > >> >> >> >> >>> ordering? How can we ensure these
>> > > > > > > > > >>> > >> >> >> >> >>> things stay in sync?
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>> With respect to the new consumer--I
>> > > > > > > > > >>> > >> >> >> >> >>> really do want to encourage people
>> > > > > > > > > >>> > >> >> >> >> >>> to think through how MM will work
>> > > > > > > > > >>> > >> >> >> >> >>> with the new consumer. I mean this
>> > > > > > > > > >>> > >> >> >> >> >>> isn't very far off, maybe a few
>> > > > > > > > > >>> > >> >> >> >> >>> months if we hustle? I could
>> > > > > > > > > >>> > >> >> >> >> >>> imagine us getting this mm fix done
>> > > > > > > > > >>> > >> >> >> >> >>> maybe sooner, maybe in a month? So
>> > > > > > > > > >>> > >> >> >> >> >>> I guess this buys us an extra month
>> > > > > > > > > >>> > >> >> >> >> >>> before we rip it out and throw it
>> > > > > > > > > >>> > >> >> >> >> >>> away? Maybe two? This bug has been
>> > > > > > > > > >>> > >> >> >> >> >>> there for a while, though, right?
>> > > > > > > > > >>> > >> >> >> >> >>> Is it worth it? Probably it is, but
>> > > > > > > > > >>> > >> >> >> >> >>> it still kind of sucks to have the
>> > > > > > > > > >>> > >> >> >> >> >>> duplicate effort.
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>> So anyhow let's definitely think
>> > > > > > > > > >>> > >> >> >> >> >>> about how things will work with the
>> > > > > > > > > >>> > >> >> >> >> >>> new consumer. I think we can
>> > > > > > > > > >>> > >> >> >> >> >>> probably just have N threads; each
>> > > > > > > > > >>> > >> >> >> >> >>> thread has a producer and consumer
>> > > > > > > > > >>> > >> >> >> >> >>> and is internally single threaded.
>> > > > > > > > > >>> > >> >> >> >> >>> Any reason this wouldn't work?
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>> -Jay
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>> On Wed, Jan 21, 2015 at 5:29 PM,
>> > Jiangjie
>> > > > Qin
>> > > > > > > > > >>> > >> >> >> >> >>><j...@linkedin.com.invalid>
>> > > > > > > > > >>> > >> >> >> >> >>> wrote:
>> > > > > > > > > >>> > >> >> >> >> >>>
>> > > > > > > > > >>> > >> >> >> >> >>> > Hi Jay,
>> > > > > > > > > >>> > >> >> >> >> >>> >
>> > > > > > > > > >>> > >> >> >> >> >>> > Thanks for comments. Please see
>> > inline
>> > > > > > responses.
>> > > > > > > > > >>> > >> >> >> >> >>> >
>> > > > > > > > > >>> > >> >> >> >> >>> > Jiangjie (Becket) Qin
>> > > > > > > > > >>> > >> >> >> >> >>> >
>> > > > > > > > > >>> > >> >> >> >> >>> > On 1/21/15, 1:33 PM, "Jay Kreps"
>> > > > > > > > > >>><jay.kr...@gmail.com>
>> > > > > > > > > >>> > >> >>wrote:
>> > > > > > > > > >>> > >> >> >> >> >>> >
>> > > > > > > > > >>> > >> >> >> >> >>> > >Hey guys,
>> > > > > > > > > >>> > >> >> >> >> >>> > >
>> > > > > > > > > >>> > >> >> >> >> >>> > >A couple questions/comments:
>> > > > > > > > > >>> > >> >> >> >> >>> > >
>> > > > > > > > > >>> > >> >> >> >> >>> > >1. The callback and
>> user-controlled
>> > > > commit
>> > > > > > > > offset
>> > > > > > > > > >>> > >> >> >>functionality
>> > > > > > > > > >>> > >> >> >> >>is
>> > > > > > > > > >>> > >> >> >> >> >>> already
>> > > > > > > > > >>> > >> >> >> >> >>> > >in the new consumer which we 
>>are
>> > > > working on
>> > > > > > in
>> > > > > > > > > >>> parallel.
>> > > > > > > > > >>> > >> >>If we
>> > > > > > > > > >>> > >> >> >> >> >>> accelerated
>> > > > > > > > > >>> > >> >> >> >> >>> > >that work it might help
>> concentrate
>> > > > > > efforts. I
>> > > > > > > > > >>>admit
>> > > > > > > > > >>> > >>this
>> > > > > > > > > >>> > >> >> >>might
>> > > > > > > > > >>> > >> >> >> >>take
>> > > > > > > > > >>> > >> >> >> >> >>> > >slightly longer in calendar 
>>time
>> but
>> > > > could
>> > > > > > still
>> > > > > > > > > >>> > >>probably
>> > > > > > > > > >>> > >> >>get
>> > > > > > > > > >>> > >> >> >> >>done
>> > > > > > > > > >>> > >> >> >> >> >>>this
>> > > > > > > > > >>> > >> >> >> >> >>> > >quarter. Have you guys 
>>considered
>> > that
>> > > > > > approach?
>> > > > > > > > > >>> > >> >> >> >> >>> > Yes, I totally agree that 
>>ideally
>> we
>> > > > should
>> > > > > > put
>> > > > > > > > > >>>efforts
>> > > > > > > > > >>> > >>on
>> > > > > > > > > >>> > >> >>new
>> > > > > > > > > >>> > >> >> >> >> >>>consumer.
>> > > > > > > > > >>> > >> >> >> >> >>> > The main reason for still 
>>working
>> on
>> > the
>> > > > old
>> > > > > > > > > >>>consumer
>> > > > > > > > > >>> is
>> > > > > > > > > >>> > >> >>that
>> > > > > > > > > >>> > >> >> >>we
>> > > > > > > > > >>> > >> >> >> >> >>>expect
>> > > > > > > > > >>> > >> >> >> >> >>> it
>> > > > > > > > > >>> > >> >> >> >> >>> > would still be used in LinkedIn 
>>for
>> > > > quite a
>> > > > > > while
>> > > > > > > > > >>> before
>> > > > > > > > > >>> > >>the
>> > > > > > > > > >>> > >> >> >>new
>> > > > > > > > > >>> > >> >> >> >> >>>consumer
>> > > > > > > > > >>> > >> >> >> >> >>> > could be fully rolled out. And 
>>we
>> > > > recently
>> > > > > > > > > >>>suffering a
>> > > > > > > > > >>> > >>lot
>> > > > > > > > > >>> > >> >>from
>> > > > > > > > > >>> > >> >> >> >> >>>mirror
>> > > > > > > > > >>> > >> >> >> >> >>> > maker data loss issue. So our
>> current
>> > > > plan is
>> > > > > > > > > >>>making
>> > > > > > > > > >>> > >> >>necessary
>> > > > > > > > > >>> > >> >> >> >> >>>changes to
>> > > > > > > > > >>> > >> >> >> >> >>> > make current mirror maker 
>>stable in
>> > > > > > production.
>> > > > > > > > > >>>Then we
>> > > > > > > > > >>> > >>can
>> > > > > > > > > >>> > >> >> >>test
>> > > > > > > > > >>> > >> >> >> >>and
>> > > > > > > > > >>> > >> >> >> >> >>> > rollout new consumer gradually
>> > without
>> > > > > > getting
>> > > > > > > > > >>>burnt.
> >
> >2. I think partitioning on the hash of the topic partition is not a very
> >good idea because that will make the case of going from a cluster with
> >fewer partitions to one with more partitions not work. I think an
> >intuitive way to do this would be the following:
> >a. Default behavior: just do what the producer does, i.e. if you specify
> >a key, use it for partitioning; if not, just partition in a round-robin
> >fashion.
> >b. Add a --preserve-partition option that will explicitly inherit the
> >partition from the source, irrespective of whether there is a key or
> >which partition that key would hash to.
> Sorry that I did not explain this clearly enough. The hash of the topic
> partition is only used to decide which mirror maker data channel queue the
> consumer thread should put a message into. It only tries to make sure that
> messages from the same partition are sent by the same producer thread, to
> guarantee the sending order. This is not at all related to which partition
> in the target cluster the messages end up in; that is still decided by the
> producer.
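>
> In other words, the queue selection is essentially the following (a
> minimal sketch only; selectQueue and numQueues are illustrative names, not
> the actual KAFKA-1650 code):
>
> def selectQueue(sourceTopic: String, sourcePartition: Int,
>                 numQueues: Int): Int = {
>   // Hash the *source* topic-partition pair and mask off the sign bit so
>   // the index is non-negative. Every message from one source partition
>   // then lands in the same queue and is sent by the same producer thread,
>   // which preserves its order.
>   val hash = (sourceTopic, sourcePartition).hashCode & 0x7fffffff
>   hash % numQueues
> }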
> >
> >3. You don't actually give the ConsumerRebalanceListener interface. What
> >is that going to look like?
> Good point! I should have put it in the wiki. I just added it.
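>
> For reference, the listener is roughly of the following shape (a sketch
> only; the method name and signature here are my approximation, and the
> wiki has the authoritative definition):
>
> trait ConsumerRebalanceListener {
>   // Invoked before the consumer releases the partitions it currently
>   // owns, keyed by topic with the set of owned partition ids. The mirror
>   // maker can use this hook to flush in-flight sends and commit offsets
>   // before ownership moves, reducing duplicates and avoiding data loss.
>   def beforeReleasingPartitions(
>       partitionOwnership: java.util.Map[String, java.util.Set[java.lang.Integer]]): Unit
> }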
> >
> >4. What is MirrorMakerRecord? I think ideally the
> >MirrorMakerMessageHandler interface would take a ConsumerRecord as input
> >and return a ProducerRecord, right? That would allow you to transform the
> >key, value, partition, or destination topic...
> MirrorMakerRecord is introduced in KAFKA-1650 and is exactly the same as
> ConsumerRecord in KAFKA-1760:
>
> private[kafka] class MirrorMakerRecord (val sourceTopic: String,
>   val sourcePartition: Int,
>   val sourceOffset: Long,
>   val key: Array[Byte],
>   val value: Array[Byte]) {
>   // Payload size in bytes, counting the key only when one is present.
>   def size = value.length + {if (key == null) 0 else key.length}
> }
>
> However, because the source partition and offset are needed in the
> producer thread for consumer offset bookkeeping, the record returned by
> MirrorMakerMessageHandler needs to contain that information. Therefore
> ProducerRecord does not work here. We could probably let the message
> handler take ConsumerRecord for both input and output.
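>
> With that change the handler would look something like this (a sketch of
> the option being discussed, not a committed interface; MirrorMakerRecord
> is the class shown above):
>
> trait MirrorMakerMessageHandler {
>   // Turn one consumed record into zero or more records to send: an empty
>   // list filters the message out (KAFKA-1840) and rewriting fields covers
>   // format conversion, while the source partition/offset carried by the
>   // record stays available for offset bookkeeping in the producer thread.
>   def handle(record: MirrorMakerRecord): java.util.List[MirrorMakerRecord]
> }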
> >
> >5. Have you guys thought about what the implementation will look like in
> >terms of threading architecture etc. with the new consumer? That will be
> >soon, so even if we aren't starting with that, let's make sure we can get
> >rid of a lot of the current mirror maker's accidental complexity in terms
> >of threads and queues when we move to that.
> I haven't thought about it thoroughly. The quick idea is that after the
> migration to the new consumer, it is probably better to use a single
> consumer thread. If multiple threads are needed, decoupling consumption
> from processing could be used. MirrorMaker definitely needs to change
> after the new consumer gets checked in. I'll document the changes and can
> submit follow-up patches after the new consumer is available.
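>
> Decoupling consumption from processing would mean something along these
> lines (a minimal sketch under the assumptions above; decoupledPipeline,
> fetch, and send are illustrative names):
>
> import java.util.concurrent.ArrayBlockingQueue
>
> // fetch() blocks for the next record; send() hands one record to a
> // producer. A bounded queue sits between one fetcher thread and
> // numSenders sender threads.
> def decoupledPipeline[T](fetch: () => T, send: T => Unit,
>                          numSenders: Int, capacity: Int): Unit = {
>   val queue = new ArrayBlockingQueue[T](capacity)
>   // put() blocks when the queue is full, giving natural back-pressure
>   // instead of unbounded buffering in the data channel.
>   val fetcher = new Thread(new Runnable {
>     override def run(): Unit = while (true) queue.put(fetch())
>   })
>   // take() blocks until a record is available.
>   val senders = (1 to numSenders).map { _ =>
>     new Thread(new Runnable {
>       override def run(): Unit = while (true) send(queue.take())
>     })
>   }
>   (fetcher +: senders).foreach(_.start())
> }
>
> Note that several senders draining one queue no longer guarantees
> per-partition ordering; combining this with the hash-based queue selection
> sketched earlier (one queue per sender thread) would restore it.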
> >
> >-Jay
> >
> >On Tue, Jan 20, 2015 at 4:31 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >wrote:
> >
> >> Hi Kafka Devs,
> >>
> >> We are working on a Kafka Mirror Maker enhancement. A KIP is posted to
> >> document and discuss the following:
> >> 1. KAFKA-1650: No data loss mirror maker change
> >> 2. KAFKA-1839: To allow partition-aware mirroring
> >> 3. KAFKA-1840: To allow message filtering/format conversion
> >> Feedback is welcome. Please let us know if you have any questions or
> >> concerns.
> >>
> >> Thanks.
> >>
> >> Jiangjie (Becket) Qin
>
>-- 
>Thanks,
>Neha
