@Dong,
Great example and explanation, thanks!

@All
Regarding the example given by Dong, it seems that even if we use a separate
queue and a dedicated controller request handling thread, the same result
can still happen: R1_a will be sent on one connection, while R1_b and R2
will be sent on a different connection, and the broker provides no ordering
across different connections.
I was discussing with Mayuresh offline, and it seems the correlation id
within the same NetworkClient object is monotonically increasing and never
reset, hence a broker can leverage it to reject obsolete requests.
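To make the idea concrete, here is a minimal Python sketch of a broker-side guard that drops a controller request whose correlation id is not newer than the last one accepted. It assumes that correlation ids from one NetworkClient never go backwards, including across reconnects; Mayuresh's reply below suggests they may be reset on connection loss, in which case this check would wrongly reject valid requests. StaleRequestFilter and accept() are hypothetical names, not Kafka APIs.

```python
from typing import Optional


class StaleRequestFilter:
    """Hypothetical broker-side guard against obsolete controller requests.

    Assumes correlation ids from one controller client only ever increase,
    including across reconnects (which may not hold in practice).
    """

    def __init__(self) -> None:
        self.last_accepted: Optional[int] = None

    def accept(self, correlation_id: int) -> bool:
        # Reject anything that is not newer than the last accepted request.
        if self.last_accepted is not None and correlation_id <= self.last_accepted:
            return False
        self.last_accepted = correlation_id
        return True


# Dong's race: R2 (id 7) is processed first, so R1_a (id 5) is stale.
guard = StaleRequestFilter()
print(guard.accept(7), guard.accept(5))  # True False
```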
Thoughts?

Thanks,
Lucas

On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> Actually nvm, correlationId is reset in case of connection loss, I think.
>
> Thanks,
>
> Mayuresh
>
> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>
> > I agree with Dong that out-of-order processing can happen with two
> > separate queues as well, and it can even happen today.
> > Can we use the correlationId in the request from the controller to the
> > broker to handle ordering?
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket....@gmail.com> wrote:
> >
> >> Good point, Joel. I agree that a dedicated controller request handling
> >> thread would provide better isolation. It also solves the reordering issue.
> >>
> >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >>
> >> > Good example. I think this scenario can occur in the current code as
> >> > well, but with even lower probability given that there are other
> >> > non-controller requests interleaved. It is still sketchy though, and I
> >> > think a safer approach would be separate queues and pinning controller
> >> > request handling to one handler thread.
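A rough Python sketch of the "pinning" idea, assuming a separate controller queue drained by exactly one thread (run_dedicated_handler and the None sentinel are illustrative, not Kafka code). With a single consumer, requests complete in the order they were enqueued; with a pool of handler threads, two requests dequeued at almost the same time can finish in either order, which is the race in Dong's example. This only orders requests within one queue; it does not order requests that arrive on different connections.

```python
import queue
import threading


def run_dedicated_handler(requests):
    """Drain controller requests with one pinned handler thread (sketch)."""
    controller_queue = queue.Queue()
    processed = []

    def handler():
        while True:
            req = controller_queue.get()
            if req is None:  # sentinel: no more requests
                break
            processed.append(req)  # stand-in for actually handling req

    worker = threading.Thread(target=handler)
    worker.start()
    for req in requests:
        controller_queue.put(req)
    controller_queue.put(None)
    worker.join()
    return processed


print(run_dedicated_handler(["R1_a", "R2"]))  # ['R1_a', 'R2']
```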
> >> >
> >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> >
> >> > > Hey Becket,
> >> > >
> >> > > I think you are right that there may be out-of-order processing.
> >> > > However, it seems that out-of-order processing may also happen even
> >> > > if we use a separate queue.
> >> > >
> >> > > Here is the example:
> >> > >
> >> > > - The controller sends R1 and gets disconnected before receiving the
> >> > > response. It then reconnects and sends R2. Both requests now stay in
> >> > > the controller request queue in the order they were sent.
> >> > > - thread1 takes R1_a from the request queue, and thread2 takes R2
> >> > > from the request queue almost at the same time.
> >> > > - So R1_a and R2 are processed in parallel. There is a chance that
> >> > > R2's processing completes before R1's.
> >> > >
> >> > > If out-of-order processing can happen with very low probability for
> >> > > both approaches, it may not be worthwhile to add the extra queue.
> >> > > What do you think?
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> > >
> >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket....@gmail.com> wrote:
> >> > >
> >> > > > Hi Mayuresh/Joel,
> >> > > >
> >> > > > Using the request channel as a deque was brought up some time ago
> >> > > > when we were initially thinking of prioritizing requests. The
> >> > > > concern was that the controller requests are supposed to be
> >> > > > processed in order. If we can ensure that there is only one
> >> > > > controller request in the request channel, the order is not a
> >> > > > concern. But if more than one controller request is inserted into
> >> > > > the queue, the order of the controller requests may change and
> >> > > > cause problems. For example, think about the following sequence:
> >> > > > 1. The controller successfully sends a request R1 to the broker.
> >> > > > 2. The broker receives R1 and puts it at the head of the request
> >> > > > queue.
> >> > > > 3. The controller-to-broker connection fails and the controller
> >> > > > reconnects to the broker.
> >> > > > 4. The controller sends a request R2 to the broker.
> >> > > > 5. The broker receives R2 and adds it to the head of the request
> >> > > > queue.
> >> > > > Now on the broker side, R2 will be processed before R1, which may
> >> > > > cause problems.
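The sequence above can be replayed with a plain Python deque, assuming every controller request is pushed to the head, handler threads always take from the head, and no handler has picked up R1 by the time R2 arrives. The produce-* entries are a hypothetical data-plane backlog.

```python
from collections import deque


def replay_head_insertion(controller_requests):
    """Replay steps 1-5: each controller request jumps to the head."""
    request_queue = deque(["produce-1", "produce-2"])  # existing backlog
    for req in controller_requests:
        request_queue.appendleft(req)  # steps 2 and 5: insert at the head
    order = []
    while request_queue:
        order.append(request_queue.popleft())  # handlers take from the head
    return order


print(replay_head_insertion(["R1", "R2"]))
# ['R2', 'R1', 'produce-1', 'produce-2']
```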
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >> > > >
> >> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> >> > > > > invasive alternative and it should work. Jun/Becket/others, do
> >> > > > > you see any pitfalls with this approach?
> >> > > > >
> >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > >
> >> > > > > > @Mayuresh,
> >> > > > > > That's a very interesting idea that I hadn't thought of before.
> >> > > > > > It seems to solve our problem at hand pretty well, and also
> >> > > > > > avoids the need to have a new size metric and capacity config
> >> > > > > > for the controller request queue. In fact, if we were to adopt
> >> > > > > > this design, there is no public interface change, and we
> >> > > > > > probably don't need a KIP.
> >> > > > > > Also, implementation-wise, it seems the Java class
> >> > > > > > LinkedBlockingDeque can readily satisfy the requirement by
> >> > > > > > supporting a capacity and allowing insertion at both ends.
> >> > > > > >
> >> > > > > > My only concern is that this design is tied to the coincidence
> >> > > > > > that we have two request priorities and a deque has two ends.
> >> > > > > > Hence with the proposed design, the network layer would be more
> >> > > > > > tightly coupled with upper-layer logic; e.g. if we were to add
> >> > > > > > an extra priority level in the future for some reason, we would
> >> > > > > > probably need to go back to the design of separate queues, one
> >> > > > > > for each priority level.
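For comparison, a minimal Python sketch of the "one queue per priority level" design mentioned above. Adding a level just means adding a queue, and each level stays FIFO, so two controller requests keep their relative order (unlike head insertion into a deque). MultiLevelRequestQueue is a hypothetical name, and take() returns None instead of blocking to keep the sketch short.

```python
from collections import deque


class MultiLevelRequestQueue:
    """One FIFO queue per priority level; level 0 is the highest priority."""

    def __init__(self, levels: int) -> None:
        self.queues = [deque() for _ in range(levels)]

    def put(self, request, level: int) -> None:
        self.queues[level].append(request)  # FIFO within a level

    def take(self):
        for q in self.queues:  # scan from highest to lowest priority
            if q:
                return q.popleft()
        return None  # nothing queued (a real broker would block here)


q = MultiLevelRequestQueue(levels=3)
q.put("produce-1", 2)
q.put("R1", 0)
q.put("admin-1", 1)
q.put("R2", 0)
print([q.take() for _ in range(4)])  # ['R1', 'R2', 'admin-1', 'produce-1']
```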
> >> > > > > >
> >> > > > > > In summary, I'm ok with both designs and lean toward your
> >> > > > > > suggested approach.
> >> > > > > > Let's hear what others think.
> >> > > > > >
> >> > > > > > @Becket,
> >> > > > > > In light of Mayuresh's suggested new design, I'm answering your
> >> > > > > > question only in the context of the current KIP design: I think
> >> > > > > > your suggestion makes sense, and I'm ok with removing the
> >> > > > > > capacity config and just relying on the default value of 20
> >> > > > > > being sufficient.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Lucas
> >> > > > > >
> >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > Hi Lucas,
> >> > > > > > >
> >> > > > > > > Seems like the main intent here is to prioritize the controller
> >> > > > > > > requests over any other requests.
> >> > > > > > > In that case, we can change the request queue to a deque, where
> >> > > > > > > you always insert the normal requests (produce, consume, etc.)
> >> > > > > > > at the tail of the deque, but if it's a controller request, you
> >> > > > > > > insert it at the head. This ensures that the controller requests
> >> > > > > > > will be given higher priority over other requests.
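A minimal Python sketch of this proposal: a single bounded deque where controller requests go to the head and everything else goes to the tail. It loosely mirrors what java.util.concurrent.LinkedBlockingDeque offers (a capacity plus offerFirst/offerLast); the Python class name and its non-blocking offer()/poll() methods are illustrative assumptions, not broker code.

```python
import threading
from collections import deque


class BoundedPriorityDeque:
    """Single bounded request queue: controller requests at the head."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = deque()
        self.lock = threading.Lock()

    def offer(self, request, is_controller: bool) -> bool:
        with self.lock:
            if len(self.items) >= self.capacity:
                return False  # full: caller has to retry or block
            if is_controller:
                self.items.appendleft(request)  # like offerFirst
            else:
                self.items.append(request)  # like offerLast
            return True

    def poll(self):
        with self.lock:
            return self.items.popleft() if self.items else None


d = BoundedPriorityDeque(capacity=3)
d.offer("produce-1", is_controller=False)
d.offer("fetch-1", is_controller=False)
d.offer("LeaderAndIsr", is_controller=True)
print(d.offer("produce-2", is_controller=False))  # False: deque is full
print(d.poll())  # LeaderAndIsr
```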
> >> > > > > > >
> >> > > > > > > Also, since we read only one request from the socket, mute it,
> >> > > > > > > and unmute it only after handling the request, this would ensure
> >> > > > > > > that we don't handle controller requests out of order.
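The muting argument can be modeled with a toy connection object (hypothetical, not the SocketServer code): once a request is read the connection is muted, and nothing else is read from it until the response is sent. This keeps at most one request per connection in flight, so requests on the same connection are handled in order; it does not order requests that arrive on a new connection after a reconnect.

```python
class MutableConnection:
    """Toy model of a broker-side connection that is muted per request."""

    def __init__(self, pending) -> None:
        self.pending = list(pending)  # requests waiting in the socket buffer
        self.muted = False

    def read_request(self):
        if self.muted or not self.pending:
            return None  # muted (or idle): nothing is read
        self.muted = True  # mute until this request's response is sent
        return self.pending.pop(0)

    def response_sent(self) -> None:
        self.muted = False  # unmute: the next request may now be read


conn = MutableConnection(["R1", "R2"])
print(conn.read_request(), conn.read_request())  # R1 None
conn.response_sent()
print(conn.read_request())  # R2
```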
> >> > > > > > >
> >> > > > > > > With this approach we can avoid the second queue and the
> >> > > > > > > additional config for the size of the queue.
> >> > > > > > >
> >> > > > > > > What do you think?
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Mayuresh
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket....@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > Hey Joel,
> >> > > > > > > >
> >> > > > > > > > Thanks for the detailed explanation. I agree the current design
> >> > > > > > > > makes sense. My confusion is about whether the new config for
> >> > > > > > > > the controller queue capacity is necessary. I cannot think of a
> >> > > > > > > > case in which users would change it.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > >
> >> > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > >
> >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Lucas,
> >> > > > > > > > >
> >> > > > > > > > > I guess my question can be rephrased as "do we expect users to
> >> > > > > > > > > ever change the controller request queue capacity?" If we agree
> >> > > > > > > > > that 20 is already a very generous default and we do not expect
> >> > > > > > > > > users to change it, is it still necessary to expose this as a
> >> > > > > > > > > config?
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > >
> >> > > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > >> @Becket
> >> > > > > > > > >> 1. Thanks for the comment. You are right that normally there
> >> > > > > > > > >> should be just one controller request because of muting, and I
> >> > > > > > > > >> had NOT intended to say there would be many enqueued controller
> >> > > > > > > > >> requests.
> >> > > > > > > > >> I went through the KIP again, and I'm not sure which part
> >> > > > > > > > >> conveys that info. I'd be happy to revise it if you point out
> >> > > > > > > > >> the section.
> >> > > > > > > > >>
> >> > > > > > > > >> 2. Though it should not happen in normal conditions, the current
> >> > > > > > > > >> design does not preclude multiple controllers running at the
> >> > > > > > > > >> same time. Hence if we don't have the controller queue capacity
> >> > > > > > > > >> config and simply set its capacity to 1, network threads
> >> > > > > > > > >> handling requests from different controllers will be blocked
> >> > > > > > > > >> during those troublesome times, which is probably not what we
> >> > > > > > > > >> want. On the other hand, adding the extra config with a default
> >> > > > > > > > >> value, say 20, guards us from issues in those troublesome times,
> >> > > > > > > > >> and IMO there isn't much downside to adding the extra config.
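A small Python sketch of why a capacity of 1 could hurt when two controllers are briefly active: with a bounded queue, the second controller's request does not fit, and a network thread doing a blocking put would stall on it. The request names and enqueue_all() are hypothetical; put_nowait() is used so the sketch reports the overflow instead of blocking.

```python
import queue


def enqueue_all(requests, capacity):
    """Return the requests that would not fit in a bounded controller queue."""
    controller_queue = queue.Queue(maxsize=capacity)
    overflow = []
    for req in requests:
        try:
            controller_queue.put_nowait(req)
        except queue.Full:
            overflow.append(req)  # a blocking put() would stall here
    return overflow


from_two_controllers = ["A:UpdateMetadata", "B:LeaderAndIsr"]
print(enqueue_all(from_two_controllers, capacity=1))   # ['B:LeaderAndIsr']
print(enqueue_all(from_two_controllers, capacity=20))  # []
```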
> >> > > > > > > > >>
> >> > > > > > > > >> @Mayuresh
> >> > > > > > > > >> Good catch, this sentence is an obsolete statement based on a
> >> > > > > > > > >> previous design. I've revised the wording in the KIP.
> >> > > > > > > > >>
> >> > > > > > > > >> Thanks,
> >> > > > > > > > >> Lucas
> >> > > > > > > > >>
> >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >> > Hi Lucas,
> >> > > > > > > > >> >
> >> > > > > > > > >> > Thanks for the KIP.
> >> > > > > > > > >> > I am trying to understand why you think "The memory
> >> > > > > > > > >> > consumption can rise given the total number of queued requests
> >> > > > > > > > >> > can go up to 2x" in the impact section. Normally the requests
> >> > > > > > > > >> > from the controller to a broker are not high volume, right?
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> > Thanks,
> >> > > > > > > > >> >
> >> > > > > > > > >> > Mayuresh
> >> > > > > > > > >> >
> >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> wrote:
> >> > > > > > > > >> >
> >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from
> >> > > > > > > > >> > > the data plane makes a lot of sense.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > In the KIP you mentioned that the controller request queue may
> >> > > > > > > > >> > > have many requests in it. Will this be a common case? The
> >> > > > > > > > >> > > controller requests still go through the SocketServer. The
> >> > > > > > > > >> > > SocketServer will mute the channel once a request is read and
> >> > > > > > > > >> > > put into the request channel. So assuming there is only one
> >> > > > > > > > >> > > connection between the controller and each broker, on the
> >> > > > > > > > >> > > broker side there should be only one controller request in the
> >> > > > > > > > >> > > controller request queue at any given time. If that is the
> >> > > > > > > > >> > > case, do we need a separate controller request queue capacity
> >> > > > > > > > >> > > config? The default value of 20 means that we expect 20
> >> > > > > > > > >> > > controller switches to happen in a short period of time. I am
> >> > > > > > > > >> > > not sure whether someone should increase the controller request
> >> > > > > > > > >> > > queue capacity to handle such a case, as it seems to indicate
> >> > > > > > > > >> > > that something very wrong has happened.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > Thanks,
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > Jiangjie (Becket) Qin
> >> > > > > > > > >> > >
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > > Thanks for the update, Lucas.
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > I think the motivation section is intuitive. It would be good
> >> > > > > > > > >> > > > to learn more about the comments from other reviewers.
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > > Hi Dong,
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > I've updated the motivation section of the KIP by explaining
> >> > > > > > > > >> > > > > the cases that would have user impacts.
> >> > > > > > > > >> > > > > Please take a look and let me know your comments.
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > Thanks,
> >> > > > > > > > >> > > > > Lucas
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > > Hi Dong,
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > The simulation of a slow disk is merely for me to easily
> >> > > > > > > > >> > > > > > construct a testing scenario with a backlog of produce
> >> > > > > > > > >> > > > > > requests. In production, other than a slow disk, a backlog of
> >> > > > > > > > >> > > > > > produce requests may also be caused by high produce QPS.
> >> > > > > > > > >> > > > > > In that case, we may not want to kill the broker, and that's
> >> > > > > > > > >> > > > > > when this KIP can be useful, for both JBOD and non-JBOD
> >> > > > > > > > >> > > > > > setups.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Going back to your previous question about each ProduceRequest
> >> > > > > > > > >> > > > > > covering 20 partitions that are randomly distributed, let's
> >> > > > > > > > >> > > > > > say a LeaderAndIsr request is enqueued that tries to switch
> >> > > > > > > > >> > > > > > the current broker, say broker0, from leader to follower
> >> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the sake of
> >> > > > > > > > >> > > > > > argument, let's also assume the other brokers, say broker1,
> >> > > > > > > > >> > > > > > have *stopped* fetching from the current broker, i.e. broker0.
> >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL):
> >> > > > > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> >> > > > > > > > >> > > > > >       LeaderAndISR will be put into the purgatory, and since
> >> > > > > > > > >> > > > > >       they'll never be replicated to other brokers (because of
> >> > > > > > > > >> > > > > >       the assumption made above), they will be completed either
> >> > > > > > > > >> > > > > >       when the LeaderAndISR request is processed or when the
> >> > > > > > > > >> > > > > >       timeout happens.
> >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition
> >> > > > > > > > >> > > > > >       partition test-0 to a follower. After it sees the
> >> > > > > > > > >> > > > > >       replication of the remaining 19 partitions, it can send a
> >> > > > > > > > >> > > > > >       response indicating that it's no longer the leader for
> >> > > > > > > > >> > > > > >       test-0.
> >> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
> >> > > > > > > > >> > > > > >   there are 24K produce requests ahead of the LeaderAndISR and
> >> > > > > > > > >> > > > > >   there are 8 io threads, so each io thread will process
> >> > > > > > > > >> > > > > >   approximately 3000 produce requests. Now let's investigate
> >> > > > > > > > >> > > > > >   the io thread that finally processes the LeaderAndISR.
> >> > > > > > > > >> > > > > >   For those 3000 produce requests, let's model the times when
> >> > > > > > > > >> > > > > >   their remaining 19 partitions catch up as t0, t1, ..., t2999,
> >> > > > > > > > >> > > > > >   and say the LeaderAndISR request is processed at time t3000.
> >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request would have waited
> >> > > > > > > > >> > > > > >   an extra t3000 - t0 in the purgatory, the 2nd an extra
> >> > > > > > > > >> > > > > >   t3000 - t1, etc.
> >> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger for the
> >> > > > > > > > >> > > > > >   earlier produce requests than for the later ones. For the
> >> > > > > > > > >> > > > > >   same reason, the more ProduceRequests are queued before the
> >> > > > > > > > >> > > > > >   LeaderAndISR, the bigger the benefit we get (capped by the
> >> > > > > > > > >> > > > > >   produce timeout).
> >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1:
> >> > > > > > > > >> > > > > >   There will be no latency differences in this case, but
> >> > > > > > > > >> > > > > >   2.1 Without this KIP, the records of partition test-0 in the
> >> > > > > > > > >> > > > > >       ProduceRequests ahead of the LeaderAndISR will be appended
> >> > > > > > > > >> > > > > >       to the local log, and eventually be truncated after
> >> > > > > > > > >> > > > > >       processing the LeaderAndISR. This is what's referred to
> >> > > > > > > > >> > > > > >       as "some unofficial definition of data loss in terms of
> >> > > > > > > > >> > > > > >       messages beyond the high watermark".
> >> > > > > > > > >> > > > > >   2.2 With this KIP, we can mitigate the effect, since if the
> >> > > > > > > > >> > > > > >       LeaderAndISR is processed immediately, the responses to
> >> > > > > > > > >> > > > > >       producers will have the NotLeaderForPartition error,
> >> > > > > > > > >> > > > > >       causing producers to retry.
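The arithmetic in case 1 can be checked with a few lines of Python. The catch-up times are made up for illustration (one request catching up every 1 ms, so t_i = i ms), and the 30-second cap is an assumed produce timeout; only the 24K requests and 8 io threads come from the example above.

```python
def extra_waits(catch_up_times, leader_and_isr_time, timeout):
    """Extra purgatory time per request without the KIP (case 1.1).

    Request i could have completed at t_i, but waits until the LeaderAndISR
    is processed at t_3000, never longer than the produce timeout.
    """
    return [min(leader_and_isr_time - t, timeout) for t in catch_up_times]


backlog, io_threads = 24_000, 8
per_thread = backlog // io_threads  # 3000 requests per io thread
catch_up = list(range(per_thread))  # hypothetical: t_i = i ms
waits = extra_waits(catch_up, leader_and_isr_time=per_thread, timeout=30_000)
print(per_thread, waits[0], waits[-1], sum(waits) / len(waits))
# 3000 3000 1 1500.5
```

The earliest request carries the largest extra wait (t3000 - t0), which is why the benefit grows with the backlog until the timeout caps it.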
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > The explanation above covers the benefit of reducing the
> >> > > > > > > > >> > > > > > latency of a broker becoming a follower. Closely related is
> >> > > > > > > > >> > > > > > reducing the latency of a broker becoming the leader. In this
> >> > > > > > > > >> > > > > > case, the benefit is even more obvious: if other brokers have
> >> > > > > > > > >> > > > > > resigned leadership and the current broker should take
> >> > > > > > > > >> > > > > > leadership, any delay in processing the LeaderAndISR will be
> >> > > > > > > > >> > > > > > perceived by clients as unavailability. In extreme cases, this
> >> > > > > > > > >> > > > > > can cause produce requests to fail if the retries are
> >> > > > > > > > >> > > > > > exhausted.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Two other types of controller requests are UpdateMetadata
> >> > > > > > > > >> > > > > > and StopReplica, which I'll briefly discuss as follows:
> >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing means clients
> >> > > > > > > > >> > > > > > receive stale metadata, e.g. with the wrong leadership info
> >> > > > > > > > >> > > > > > for certain partitions, and the effect is more retries or even
> >> > > > > > > > >> > > > > > fatal failure if the retries are exhausted.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the
> >> > > > > > > > >> > > > > > performance of topic deletion.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Regarding your last question about the delay for
> >> > > > > > > > >> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot
> >> > > > > > > > >> > > > > > help with the latency in getting the log dirs info, and it's
> >> > > > > > > > >> > > > > > only relevant when controller requests are involved.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Regards,
> >> > > > > > > > >> > > > > > Lucas
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> Hey Jun,
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the
> >> > > > > > > > >> > > > > >> feature may be useful for the JBOD use case. I have one
> >> > > > > > > > >> > > > > >> question below.
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Hey Lucas,
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD
> >> > > > > > > > >> > > > > >> setup, or is it only useful for the JBOD setup? It may be
> >> > > > > > > > >> > > > > >> useful to understand this.
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to move
> >> > > > > > > > >> > > > > >> leaders on the failed disk to other disks, the system operator
> >> > > > > > > > >> > > > > >> first needs to get the list of partitions on the failed disk.
> >> > > > > > > > >> > > > > >> This is currently achieved using AdminClient.describeLogDirs(),
> >> > > > > > > > >> > > > > >> which sends a DescribeLogDirsRequest to the broker. If we only
> >> > > > > > > > >> > > > > >> prioritize the controller requests, then the
> >> > > > > > > > >> > > > > >> DescribeLogDirsRequest may still take a long time to be
> >> > > > > > > > >> > > > > >> processed by the broker. So the overall time to move leaders
> >> > > > > > > > >> > > > > >> away from the failed disk may still be long even with this
> >> > > > > > > > >> > > > > >> KIP. What do you think?
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Thanks,
> >> > > > > > > > >> > > > > >> Dong
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> >> > > > > > > > >> > > > > >> >
> >> > > > > > > > >> > > > > >> > @Dong,
> >> > > > > > > > >> > > > > >> > Since both comments in your previous email are about the
> >> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful, in light of
> >> > > > > > > > >> > > > > >> > Jun's last comment, do you agree that this KIP can be
> >> > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> >> > > > > > > > >> > > > > >> > Please let me know, thanks!
> >> > > > > > > > >> > > > > >> >
> >> > > > > > > > >> > > > > >> > Regards,
> >> > > > > > > > >> > > > > >> > Lucas
> >> > > > > > > > >> > > > > >> >
> >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote:
> >> > > > > > > > >> > > > > >> >
> >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> >> > > > > > > > >> > > > > >> > >
> >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably should just
> >> > > > > > > > >> > > > > >> > > kill the broker. In that case, this KIP may not help. If only
> >> > > > > > > > >> > > > > >> > > one of the disks on a broker is slow, one may want to fail
> >> > > > > > > > >> > > > > >> > > that disk and move the leaders on that disk to other brokers.
> >> > > > > > > > >> > > > > >> > > In that case, being able to process the LeaderAndIsr requests
> >> > > > > > > > >> > > > > >> > > faster will potentially help the producers recover quicker.
> >> > > > > > > > >> > > > > >> > >
> >> > > > > > > > >> > > > > >> > > Thanks,
> >> > > > > > > > >> > > > > >> > >
> >> > > > > > > > >> > > > > >> > > Jun
> >> > > > > > > > >> > > > > >> > >
> >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >> > >
> >> > > > > > > > >> > > > > >> > > > Hey Lucas,
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow-up questions below.
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> >> > > > > > > > >> > > > > >> > > > randomly distributed across all partitions, then each
> >> > > > > > > > >> > > > > >> > > > ProduceRequest will likely cover some partitions for which the
> >> > > > > > > > >> > > > > >> > > > broker is still the leader after it quickly processes the
> >> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. The broker will then still be slow in
> >> > > > > > > > >> > > > > >> > > > processing these ProduceRequests, and request latency will still
> >> > > > > > > > >> > > > > >> > > > be very high with this KIP. It seems that most ProduceRequests
> >> > > > > > > > >> > > > > >> > > > will still time out after 30 seconds. Is this understanding
> >> > > > > > > > >> > > > > >> > > > correct?
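Dong's "Regarding 1" can be quantified with a small probability calculation. This is a hypothetical model, not something from the thread: it takes Lucas's numbers from point 1 of his reply (10K partitions led by broker0, 9K of them moved to other brokers) and assumes each ProduceRequest sent to broker0 covers 20 distinct partitions drawn uniformly at random from those 10K. The chance that a request touches none of the 1K partitions broker0 still leads is then hypergeometric, C(9000, 20) / C(10000, 20). The function name is illustrative only.

```python
from math import comb


def prob_touches_remaining(total: int, moved: int, per_request: int) -> float:
    """Probability that a request covering `per_request` distinct partitions,
    drawn uniformly at random from the `total` partitions the broker led,
    still includes at least one of the (total - moved) partitions it leads.

    Hypothetical model for illustration only.
    """
    if not 0 <= moved <= total or not 0 <= per_request <= total:
        raise ValueError("need 0 <= moved <= total and 0 <= per_request <= total")
    # Hypergeometric: ways to pick only moved partitions, divided by the
    # ways to pick any `per_request` partitions.
    p_all_moved = comb(moved, per_request) / comb(total, per_request)
    return 1.0 - p_all_moved


# Numbers from the thread: 10K leaders on broker0, 9K moved, 20 per request.
p = prob_touches_remaining(total=10_000, moved=9_000, per_request=20)
print(f"P(request still hits a partition broker0 leads) = {p:.3f}")
```

Under these assumptions the probability comes out to roughly 0.88 (slightly above 1 - 0.9**20), so most requests would still include at least one partition that goes through the slow append path, which is the situation Dong describes.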
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out after 30
> >> > > > > > > > >> > > > > >> > > > seconds, then it is less clear how this KIP reduces average
> >> > > > > > > > >> > > > > >> > > > produce latency. Can you clarify which metrics can be improved by
> >> > > > > > > > >> > > > > >> > > > this KIP?
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > Not sure why a system operator directly cares about the number of
> >> > > > > > > > >> > > > > >> > > > truncated messages. Do you mean this KIP can improve average
> >> > > > > > > > >> > > > > >> > > > throughput or reduce message duplication? It would be good to
> >> > > > > > > > >> > > > > >> > > > understand this.
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > Thanks,
> >> > > > > > > > >> > > > > >> > > > Dong
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >> > > >
> >> > > > > > > > >> > > > > >> > > > > Hi Dong,
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> >> > > > > > > > >> > > > > >> > > > > more common scenario where broker0 is the leader of many
> >> > > > > > > > >> > > > > >> > > > > partitions, and let's say that for some reason its IO becomes slow.
> >> > > > > > > > >> > > > > >> > > > > The number of leader partitions on broker0 is so large, say 10K,
> >> > > > > > > > >> > > > > >> > > > > that the cluster is skewed, and the operator would like to shift
> >> > > > > > > > >> > > > > >> > > > > the leadership for a lot of partitions, say 9K, to other brokers,
> >> > > > > > > > >> > > > > >> > > > > either manually or through some service like Cruise Control.
> >> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the leadership transitions finish
> >> > > > > > > > >> > > > > >> > > > > more quickly, helping the cluster itself become more balanced,
> >> > > > > > > > >> > > > > >> > > > > but all existing producers corresponding to the 9K partitions
> >> > > > > > > > >> > > > > >> > > > > will also get the errors relatively quickly rather than relying
> >> > > > > > > > >> > > > > >> > > > > on their timeout, thanks to the batched async ZK operations.
> >> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to have during such troublesome times.
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this
> >> > > > > > > > >> > > > > >> > > > > KIP many producers receive an explicit NotLeaderForPartition
> >> > > > > > > > >> > > > > >> > > > > error, based on which they retry immediately.
> >> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds + quick retry) for their
> >> > > > > > > > >> > > > > >> > > > > single message is much smaller compared with the case of timing
> >> > > > > > > > >> > > > > >> > > > > out without the KIP (30 seconds for timing out + quick retry).
> >> > > > > > > > >> > > > > >> > > > > One might argue that reducing the timeout on the producer side
> >> > > > > > > > >> > > > > >> > > > > can achieve the same result, yet reducing the timeout has its
> >> > > > > > > > >> > > > > >> > > > > own drawbacks[1].
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > Also, *IF* there were a metric to show the number of truncated
> >> > > > > > > > >> > > > > >> > > > > messages on brokers, with the experiments done in the Google
> >> > > > > > > > >> > > > > >> > > > > Doc, it should be easy to see that far fewer messages need to be
> >> > > > > > > > >> > > > > >> > > > > truncated on broker0, since the up-to-date metadata avoids
> >> > > > > > > > >> > > > > >> > > > > appending the messages in subsequent PRODUCE requests. If we
> >> > > > > > > > >> > > > > >> > > > > talk to a system operator and ask whether they prefer fewer
> >> > > > > > > > >> > > > > >> > > > > wasteful IOs, I bet the answer is most likely yes.
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be helpful to
> >> > > > > > > > >> > > > > >> > > > > construct some formulas.
> >> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to the case where there
> >> > > > > > > > >> > > > > >> > > > > is only ONE partition involved.
> >> > > > > > > > >> > > > > >> > > > > Following the experiments in the Google Doc, let's say broker0
> >> > > > > > > > >> > > > > >> > > > > becomes the follower at time t0, and after t0 there were still N
> >> > > > > > > > >> > > > > >> > > > > produce requests in its request queue.
> >> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this KIP, broker0 can
> >> > > > > > > > >> > > > > >> > > > > reply with a NotLeaderForPartition exception; let's use M1 to
> >> > > > > > > > >> > > > > >> > > > > denote the average processing time of replying with such an
> >> > > > > > > > >> > > > > >> > > > > error message.
> >> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to append messages to
> >> > > > > > > > >> > > > > >> > > > > segments, which may trigger a flush to disk; let's use M2 to
> >> > > > > > > > >> > > > > >> > > > > denote the average processing time for such logic.
> >> > > > > > > > >> > > > > >> > > > > Then the average extra latency incurred without this KIP is
> >> > > > > > > > >> > > > > >> > > > > N * (M2 - M1) / 2.
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which means that
> >> > > > > > > > >> > > > > >> > > > > as long as N is positive, we would see improvements in the
> >> > > > > > > > >> > > > > >> > > > > average latency.
> >> > > > > > > > >> > > > > >> > > > > There does not need to be a significant backlog of requests in
> >> > > > > > > > >> > > > > >> > > > > the request queue, or severe degradation of disk performance, to
> >> > > > > > > > >> > > > > >> > > > > see the improvement.
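The N * (M2 - M1) / 2 figure in point 3 can be reproduced with a toy queue model. This is a sketch under assumptions that the thread does not state: all N requests are already queued at t0, a single thread serves them strictly in FIFO order, and each takes exactly M1 (with the KIP) or M2 (without). The k-th request then finishes k * (M2 - M1) later without the KIP, so the exact average under this model is (N + 1) * (M2 - M1) / 2; the formula in the thread is its large-N approximation (counting only queueing delay gives (N - 1) * (M2 - M1) / 2 instead). The function name and sample values are hypothetical.

```python
def average_extra_latency(n: int, m1: float, m2: float) -> float:
    """Average extra completion time per request, without the KIP, for n
    produce requests already queued at t0 and served one at a time (FIFO).

    Hypothetical model: every request costs exactly m1 seconds with the KIP
    (fast NotLeaderForPartition reply) or m2 seconds without it (append path).
    """
    if n <= 0:
        return 0.0
    extra_total = 0.0
    for k in range(1, n + 1):
        # The k-th request completes after k service times, so it finishes
        # k * (m2 - m1) later when every service takes m2 instead of m1.
        extra_total += k * m2 - k * m1
    return extra_total / n


n, m1, m2 = 100, 0.001, 0.2  # hypothetical values, in seconds
exact = average_extra_latency(n, m1, m2)
closed_form = (n + 1) * (m2 - m1) / 2  # exact average under this model
approx = n * (m2 - m1) / 2             # the formula quoted in the thread
print(f"simulated={exact:.4f}s closed_form={closed_form:.4f}s approx={approx:.4f}s")
```

Whichever variant is used, the extra latency grows linearly with N and is positive whenever M2 > M1, which is the conclusion the paragraph draws.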
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > Regards,
> >> > > > > > > > >> > > > > >> > > > > Lucas
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the producer side can
> >> > > > > > > > >> > > > > >> > > > > trigger unnecessary duplicate requests when the corresponding
> >> > > > > > > > >> > > > > >> > > > > leader broker is overloaded, exacerbating the situation.
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >> > > > >
> >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of the experiment.
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > Initially I also thought having a separate queue for controller
> >> > > > > > > > >> > > > > >> > > > > > requests was useful because, as you mentioned in the summary
> >> > > > > > > > >> > > > > >> > > > > > section of the Google doc, controller requests are generally more
> >> > > > > > > > >> > > > > >> > > > > > important than data requests and we probably want controller
> >> > > > > > > > >> > > > > >> > > > > > requests to be processed sooner. But then Eno had two very good
> >> > > > > > > > >> > > > > >> > > > > > questions which I am not sure the Google doc has answered
> >> > > > > > > > >> > > > > >> > > > > > explicitly. Could you help with the following questions?
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > 1) It is not very clear what the actual benefit of KIP-291 to
> >> > > > > > > > >> > > > > >> > > > > > users is. The experiment setup in the Google doc simulates the
> >> > > > > > > > >> > > > > >> > > > > > scenario where the broker is very slow handling ProduceRequests
> >> > > > > > > > >> > > > > >> > > > > > due to, e.g., a slow disk. It currently assumes that there is
> >> > > > > > > > >> > > > > >> > > > > > only 1 partition. But in the common scenario, it is probably
> >> > > > > > > > >> > > > > >> > > > > > reasonable to assume that there are many other partitions that
> >> > > > > > > > >> > > > > >> > > > > > are also actively produced to, and ProduceRequests to these
> >> > > > > > > > >> > > > > >> > > > > > partitions also take e.g. 2 seconds to be processed. So even if
> >> > > > > > > > >> > > > > >> > > > > > broker0 can become the follower for partition 0 soon, it
> >> > > > > > > > >> > > > > >> > > > > > probably still needs to slowly process the ProduceRequests in the
> >> > > > > > > > >> > > > > >> > > > > > queue because these ProduceRequests cover other partitions. Thus
> >> > > > > > > > >> > > > > >> > > > > > most ProduceRequests will still time out after 30 seconds, and
> >> > > > > > > > >> > > > > >> > > > > > most clients will still likely time out after 30 seconds. Then it
> >> > > > > > > > >> > > > > >> > > > > > is not obvious what the benefit to the client is, since the
> >> > > > > > > > >> > > > > >> > > > > > client will time out after 30 seconds before possibly
> >> > > > > > > > >> > > > > >> > > > > > re-connecting to broker1, with or without KIP-291.
> >> > > > > > > > >> > > > > >> > > > > > Did I miss something here?
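Dong's point 1 rests on simple queueing arithmetic: if every ProduceRequest takes about 2 seconds, anything queued too deep misses the 30-second client timeout no matter how quickly the LeaderAndIsrRequest itself is handled. A minimal sketch, assuming (none of this is from the thread) that all requests are queued at once, each takes a fixed service time, and a hypothetical number of handler threads drain the queue in lockstep; with one handler, only the first 30 / 2 = 15 requests finish in time.

```python
def count_timeouts(queue_depth: int, service_s: float, timeout_s: float,
                   handlers: int = 1) -> int:
    """Count queued requests that would complete after the client timeout.

    Hypothetical model: all requests are queued at time 0, each takes exactly
    service_s seconds, and `handlers` threads drain the FIFO queue in lockstep
    rounds. The timeout is measured from time 0.
    """
    if handlers < 1:
        raise ValueError("handlers must be >= 1")
    timed_out = 0
    for k in range(queue_depth):
        # Request k (0-indexed) is served in round k // handlers and finishes
        # at the end of that round.
        finish_s = (k // handlers + 1) * service_s
        if finish_s > timeout_s:
            timed_out += 1
    return timed_out


# Dong's example values: ~2 s per ProduceRequest and a 30 s client timeout.
print(count_timeouts(queue_depth=100, service_s=2.0, timeout_s=30.0))  # prints 85
```

Raising `handlers` only moves the cutoff to 15 rounds of `handlers` requests each; under this model the cutoff is set by the slow per-request service time, which speeding up controller requests does not change for partitions the broker still leads.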
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP to
> >> > > > > > > > >> > > > > >> > > > > > users or system administrators, e.g. whether this KIP decreases
> >> > > > > > > > >> > > > > >> > > > > > average latency, 999th percentile latency, the probability of
> >> > > > > > > > >> > > > > >> > > > > > exceptions exposed to clients, etc. It is probably useful to
> >> > > > > > > > >> > > > > >> > > > > > clarify this.
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience only when there is
> >> > > > > > > > >> > > > > >> > > > > > an issue with the broker, e.g. a significant backlog in the
> >> > > > > > > > >> > > > > >> > > > > > request queue due to a slow disk as described in the Google doc?
> >> > > > > > > > >> > > > > >> > > > > > Or is this KIP also useful when there is no ongoing issue in the
> >> > > > > > > > >> > > > > >> > > > > > cluster? It might be helpful to clarify this to understand the
> >> > > > > > > > >> > > > > >> > > > > > benefit of this KIP.
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > Thanks much,
> >> > > > > > > > >> > > > > >> > > > > > Dong
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > Hi Eno,
> >> > > > > > > > >> > > > > >> > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment results.
> >> > > > > > > > >> > > > > >> > > > > > > Here is a link to a document showing the positive impact
> >> > > > > > > > >> > > > > >> > > > > > > achieved by implementing the proposed change:
> >> > > > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> >> > > > > > > > >> > > > > >> > > > > > > Please take a look when you have time and let me know your
> >> > > > > > > > >> > > > > >> > > > > > > feedback.
> >> > > > > > > > >> > > > > >> > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > Regards,
> >> > > > > > > > >> > > > > >> > > > > > > Lucas
> >> > > > > > > > >> > > > > >> > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> >> > > > > > > > >> > > > > >> > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might suit our
> >> > > > > > > > >> > > > > >> > > > > > > > requirements better.
> >> > > > > > > > >> > > > > >> > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > Thanks,
> >> > > > > > > > >> > > > > >> > > > > > > > Harsha
> >> > > > > > > > >> > > > > >> > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >> > > > > > > > >> > > > > >> > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> >> > > > > > > > >> > > > > >> > > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the replication quota mechanism
> >> > > > > > > > >> > > > > >> > > > > > > > > proposed in KIP-73 can be helpful in that scenario.
> >> > > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> >> > > > > > > > >> > > > > >> > > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > > Thanks,
> >> > > > > > > > >> > > > > >> > > > > > > > > Lucas
> >> > > > > > > > >> > > > > >> > > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > >
> >> > > > > > > > >> > > > > >> > > > > > > > >
> >>
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
