Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Mayuresh Gharat Thu, 19 Jul 2018 12:12:40 -0700

Actually nvm, correlationId is reset in case of connection loss, I think.

Thanks,


Mayuresh

On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <[email protected]> wrote:

> I agree with Dong that out-of-order processing can happen with two
> separate queues as well, and that it can even happen today.
> Can we use the correlationId in the request from the controller to the
> broker to handle ordering?
>
> Thanks,
>
> Mayuresh
>
>
> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <[email protected]> wrote:
>
>> Good point, Joel. I agree that a dedicated controller request handling
>> thread would provide better isolation. It also solves the reordering issue.
>>
>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <[email protected]> wrote:
>>
>> > Good example. I think this scenario can occur in the current code as well,
>> > but with even lower probability given that there are other non-controller
>> > requests interleaved. It is still sketchy, though, and I think a safer
>> > approach would be separate queues and pinning controller request handling
>> > to one handler thread.
>> >
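Joel's suggestion (separate queues, with controller request handling pinned to one handler thread) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Kafka's implementation: the class and method names are hypothetical, requests are plain strings, and the sentinel value exists only so the demo can stop the thread. Because a single thread takes from one FIFO queue, two controller requests that are both queued, such as R1 and R2 in Dong's example, are handled in arrival order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: controller requests get their own FIFO queue, and exactly
// one dedicated thread drains it, so they are handled in arrival order.
// Data requests (not modeled here) would keep using the shared queue and pool.
final class DedicatedControllerHandler {
    // Sentinel used only by this sketch to stop the handler thread.
    private static final String SHUTDOWN = "__shutdown__";

    private final BlockingQueue<String> controllerQueue = new LinkedBlockingQueue<>();
    private final List<String> processed = Collections.synchronizedList(new ArrayList<>());
    private final Thread handler = new Thread(this::runLoop, "controller-request-handler");

    void start() {
        handler.start();
    }

    void enqueue(String request) {
        controllerQueue.offer(request); // never fails on an unbounded queue
    }

    private void runLoop() {
        try {
            while (true) {
                String request = controllerQueue.take();
                if (SHUTDOWN.equals(request)) {
                    return;
                }
                processed.add(request); // stand-in for actually handling the request
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the handler once everything enqueued so far has been processed. */
    List<String> shutdownAndGetProcessed() {
        controllerQueue.offer(SHUTDOWN);
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the handler", e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        DedicatedControllerHandler h = new DedicatedControllerHandler();
        h.start();
        h.enqueue("R1");
        h.enqueue("R2");
        System.out.println(h.shutdownAndGetProcessed()); // [R1, R2]
    }
}
```

Note that this only orders requests that are already queued; it does not by itself decide whether a request from a stale connection should be applied.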
>> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <[email protected]> wrote:
>> >
>> > > Hey Becket,
>> > >
>> > > I think you are right that there may be out-of-order processing. However,
>> > > it seems that out-of-order processing may also happen even if we use a
>> > > separate queue.
>> > >
>> > > Here is the example:
>> > >
>> > > - The controller sends R1 and gets disconnected before receiving the
>> > >   response. Then it reconnects and sends R2. Both requests now stay in
>> > >   the controller request queue in the order they are sent.
>> > > - thread1 takes R1 from the request queue, and then thread2 takes R2
>> > >   from the request queue almost at the same time.
>> > > - So R1 and R2 are processed in parallel. There is a chance that R2's
>> > >   processing is completed before R1's.
>> > >
>> > > If out-of-order processing can happen with both approaches with very low
>> > > probability, it may not be worthwhile to add the extra queue. What do you
>> > > think?
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> > >
>> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <[email protected]> wrote:
>> > >
>> > > > Hi Mayuresh/Joel,
>> > > >
>> > > > Using the request channel as a deque was brought up some time ago when
>> > > > we were initially thinking of prioritizing the requests. The concern was
>> > > > that the controller requests are supposed to be processed in order. If
>> > > > we can ensure that there is only one controller request in the request
>> > > > channel, the order is not a concern. But in cases where more than one
>> > > > controller request is inserted into the queue, the controller request
>> > > > order may change and cause problems. For example, think about the
>> > > > following sequence:
>> > > > 1. The controller successfully sends a request R1 to the broker.
>> > > > 2. The broker receives R1 and puts the request at the head of the
>> > > >    request queue.
>> > > > 3. The controller-to-broker connection fails, and the controller
>> > > >    reconnects to the broker.
>> > > > 4. The controller sends a request R2 to the broker.
>> > > > 5. The broker receives R2 and adds it to the head of the request queue.
>> > > > Now, on the broker side, R2 will be processed before R1, which may cause
>> > > > problems.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
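The hazard Becket describes can be reproduced with java.util.concurrent.LinkedBlockingDeque, assuming controller requests are inserted with offerFirst and handler threads take with pollFirst. The class name and request labels below are hypothetical; the sketch only shows that head insertion makes queued controller requests come out in reverse (LIFO) order relative to each other.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the reordering hazard: every controller request is
// pushed to the head of a shared deque, so two controller requests that are
// both waiting come out in reverse order.
final class HeadInsertionReorder {
    /** Returns the order in which a handler taking from the head sees the requests. */
    static String[] drainOrder(String... controllerRequestsInArrivalOrder) {
        LinkedBlockingDeque<String> requestDeque = new LinkedBlockingDeque<>();
        requestDeque.offerLast("produce-1"); // a data request already waiting at the tail
        for (String r : controllerRequestsInArrivalOrder) {
            requestDeque.offerFirst(r); // each controller request jumps to the head
        }
        String[] order = new String[requestDeque.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = requestDeque.pollFirst(); // handler threads consume from the head
        }
        return order;
    }

    public static void main(String[] args) {
        // R1 arrives first, then (after a reconnect) R2; R2 is handled first.
        System.out.println(String.join(", ", drainOrder("R1", "R2"))); // R2, R1, produce-1
    }
}
```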
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <[email protected]> wrote:
>> > > >
>> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
>> > > > > alternative, and it should work. Jun/Becket/others, do you see any
>> > > > > pitfalls with this approach?
>> > > > >
>> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <[email protected]> wrote:
>> > > > >
>> > > > > > @Mayuresh,
>> > > > > > That's a very interesting idea that I hadn't thought of before.
>> > > > > > It seems to solve our problem at hand pretty well, and also
>> > > > > > avoids the need to have a new size metric and capacity config
>> > > > > > for the controller request queue. In fact, if we were to adopt
>> > > > > > this design, there would be no public interface change, and we
>> > > > > > probably wouldn't need a KIP.
>> > > > > > Also, implementation-wise, it seems the Java class
>> > > > > > LinkedBlockingDeque can readily satisfy the requirement by
>> > > > > > supporting a capacity and also allowing insertion at both ends.
>> > > > > >
>> > > > > > My only concern is that this design is tied to the coincidence that
>> > > > > > we have two request priorities and there are two ends to a deque.
>> > > > > > Hence, by using the proposed design, it seems the network layer is
>> > > > > > more tightly coupled with upper-layer logic; e.g., if we were to add
>> > > > > > an extra priority level in the future for some reason, we would
>> > > > > > probably need to go back to the design of separate queues, one for
>> > > > > > each priority level.
>> > > > > >
>> > > > > > In summary, I'm OK with both designs and lean toward your suggested
>> > > > > > approach.
>> > > > > > Let's hear what others think.
>> > > > > >
>> > > > > > @Becket,
>> > > > > > In light of Mayuresh's suggested new design, I'm answering your
>> > > > > > question only in the context of the current KIP design: I think your
>> > > > > > suggestion makes sense, and I'm OK with removing the capacity config
>> > > > > > and just relying on the default value of 20 being sufficient.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Lucas
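Lucas's alternative, one queue per priority level, might look roughly like the sketch below. It is a hypothetical, non-blocking simplification (a real request channel would also need handler threads to wait when every queue is empty), and all names are invented for illustration. Requests stay FIFO within each level, and adding a third level only means adding a queue rather than reworking which end of a deque to use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of "separate queues, one per priority level":
// poll() always serves the highest-priority non-empty queue (level 0 first).
final class PriorityRequestChannel<R> {
    private final List<ConcurrentLinkedQueue<R>> queues = new ArrayList<>();

    /** @param levels number of priority levels; level 0 is the highest. */
    PriorityRequestChannel(int levels) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
    }

    void send(int level, R request) {
        queues.get(level).add(request);
    }

    /** Returns the next request to handle, or null if every queue is empty. */
    R poll() {
        for (ConcurrentLinkedQueue<R> q : queues) {
            R r = q.poll();
            if (r != null) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PriorityRequestChannel<String> channel = new PriorityRequestChannel<>(3);
        channel.send(2, "Produce");
        channel.send(1, "hypothetical-middle-priority");
        channel.send(0, "LeaderAndIsr");
        System.out.println(channel.poll()); // LeaderAndIsr
    }
}
```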
>> > > > > >
>> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <[email protected]> wrote:
>> > > > > >
>> > > > > > > Hi Lucas,
>> > > > > > >
>> > > > > > > It seems the main intent here is to prioritize the controller
>> > > > > > > requests over any other requests.
>> > > > > > > In that case, we can change the request queue to a deque, where you
>> > > > > > > always insert the normal requests (produce, consume, etc.) at the
>> > > > > > > tail of the deque, but if it's a controller request, you insert it
>> > > > > > > at the head. This ensures that the controller requests will be
>> > > > > > > given priority over other requests.
>> > > > > > >
>> > > > > > > Also, since we only read one request from the socket, mute it, and
>> > > > > > > only unmute it after handling the request, this would ensure that
>> > > > > > > we don't handle controller requests out of order.
>> > > > > > >
>> > > > > > > With this approach, we can avoid the second queue and the additional
>> > > > > > > config for the size of the queue.
>> > > > > > >
>> > > > > > > What do you think?
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Mayuresh
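Mayuresh's single-deque proposal can be sketched with java.util.concurrent.LinkedBlockingDeque, which is optionally bounded and supports insertion at both ends. The class and method names are hypothetical, requests are plain strings, and handler threads are assumed to always take from the head.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the single-deque request channel: controller requests
// go to the head, data requests (produce, fetch, ...) go to the tail, and
// handler threads always take from the head.
final class DequeRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    DequeRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    /** Returns false if the deque is full (a blocking put would stall the network thread). */
    boolean send(String request, boolean fromController) {
        return fromController ? deque.offerFirst(request) : deque.offerLast(request);
    }

    /** Next request for a handler thread, or null if the deque is empty. */
    String poll() {
        return deque.pollFirst();
    }

    public static void main(String[] args) {
        DequeRequestChannel channel = new DequeRequestChannel(10);
        channel.send("produce-1", false);
        channel.send("produce-2", false);
        channel.send("LeaderAndIsr", true);
        System.out.println(channel.poll()); // LeaderAndIsr
        System.out.println(channel.poll()); // produce-1
    }
}
```

One property of this sketch worth noting: the capacity is shared by both ends, so a deque already full of data requests also refuses a controller request.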
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hey Joel,
>> > > > > > > >
>> > > > > > > > Thanks for the detailed explanation. I agree the current design
>> > > > > > > > makes sense. My confusion is about whether the new config for the
>> > > > > > > > controller queue capacity is necessary. I cannot think of a case in
>> > > > > > > > which users would change it.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <[email protected]> wrote:
>> > > > > > > >
>> > > > > > > > > Hi Lucas,
>> > > > > > > > >
>> > > > > > > > > I guess my question can be rephrased as "do we expect users to
>> > > > > > > > > ever change the controller request queue capacity?" If we agree
>> > > > > > > > > that 20 is already a very generous default and we do not expect
>> > > > > > > > > users to change it, is it still necessary to expose this as a
>> > > > > > > > > config?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > >
>> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >
>> > > > > > > > >> @Becket
>> > > > > > > > >> 1. Thanks for the comment. You are right that normally there should
>> > > > > > > > >> be just one controller request because of muting, and I had NOT
>> > > > > > > > >> intended to say there would be many enqueued controller requests.
>> > > > > > > > >> I went through the KIP again, and I'm not sure which part conveys
>> > > > > > > > >> that info. I'd be happy to revise it if you point out the section.
>> > > > > > > > >>
>> > > > > > > > >> 2. Though it should not happen in normal conditions, the current
>> > > > > > > > >> design does not preclude multiple controllers running at the same
>> > > > > > > > >> time. Hence, if we don't have the controller queue capacity config
>> > > > > > > > >> and simply make its capacity 1, network threads handling requests
>> > > > > > > > >> from different controllers will be blocked during those troublesome
>> > > > > > > > >> times, which is probably not what we want. On the other hand, adding
>> > > > > > > > >> the extra config with a default value, say 20, guards us from issues
>> > > > > > > > >> in those troublesome times, and IMO there isn't much downside to
>> > > > > > > > >> adding the extra config.
>> > > > > > > > >>
>> > > > > > > > >> @Mayuresh
>> > > > > > > > >> Good catch, this sentence is an obsolete statement based on a
>> > > > > > > > >> previous design. I've revised the wording in the KIP.
>> > > > > > > > >>
>> > > > > > > > >> Thanks,
>> > > > > > > > >> Lucas
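Point 2 can be illustrated with a bounded java.util.concurrent.ArrayBlockingQueue standing in for the controller request queue. The sketch assumes no handler drains the queue in the meantime and uses offer() so the outcome is observable without actually blocking; a network thread calling put() would instead stall. The class name is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: several controllers are briefly active at once and each
// pushes one request into a bounded controller queue before any is handled.
final class ControllerQueueCapacity {
    /** Counts how many controllers' requests are accepted without blocking. */
    static int acceptedWithoutBlocking(int capacity, int concurrentControllers) {
        BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int c = 0; c < concurrentControllers; c++) {
            // offer() returns false when full; a network thread using put()
            // here would be blocked until a handler drains the queue.
            if (controllerQueue.offer("LeaderAndIsr from controller " + c)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedWithoutBlocking(1, 2));  // 1
        System.out.println(acceptedWithoutBlocking(20, 2)); // 2
    }
}
```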
>> > > > > > > > >>
>> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <[email protected]> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi Lucas,
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks for the KIP.
>> > > > > > > > >> > I am trying to understand why you think "The memory consumption
>> > > > > > > > >> > can rise given the total number of queued requests can go up to
>> > > > > > > > >> > 2x" in the impact section. Normally the requests from the
>> > > > > > > > >> > controller to a broker are not high volume, right?
>> > > > > > > > >> >
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks,
>> > > > > > > > >> >
>> > > > > > > > >> > Mayuresh
>> > > > > > > > >> >
>> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <[email protected]> wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from the
>> > > > > > > > >> > > data plane makes a lot of sense.
>> > > > > > > > >> > >
>> > > > > > > > >> > > In the KIP you mentioned that the controller request queue may have
>> > > > > > > > >> > > many requests in it. Will this be a common case? The controller
>> > > > > > > > >> > > requests still go through the SocketServer. The SocketServer will
>> > > > > > > > >> > > mute the channel once a request is read and put into the request
>> > > > > > > > >> > > channel. So, assuming there is only one connection between the
>> > > > > > > > >> > > controller and each broker, on the broker side there should be only
>> > > > > > > > >> > > one controller request in the controller request queue at any given
>> > > > > > > > >> > > time. If that is the case, do we need a separate controller request
>> > > > > > > > >> > > queue capacity config? The default value of 20 means that we expect
>> > > > > > > > >> > > 20 controller switches to happen in a short period of time. I am
>> > > > > > > > >> > > not sure whether anyone should increase the controller request
>> > > > > > > > >> > > queue capacity to handle such a case, as it seems to indicate that
>> > > > > > > > >> > > something very wrong has happened.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Thanks,
>> > > > > > > > >> > >
>> > > > > > > > >> > > Jiangjie (Becket) Qin
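The muting argument can be modeled with a small single-threaded toy, assuming one controller-to-broker connection. It is hypothetical and is not the SocketServer code: the connection is muted after one request is read and unmuted only after that request has been handled, so the broker-side queue never holds more than one request from that connection, and requests are read in the order they were sent.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, single-threaded model of one controller-to-broker connection.
// Not thread-safe and not the real SocketServer; it only shows the
// mute-after-read / unmute-after-response rule.
final class MutedConnectionSketch {
    private final Queue<String> unreadOnSocket = new ArrayDeque<>(); // sent but not yet read
    private final Queue<String> requestQueue = new ArrayDeque<>();   // broker-side queue
    private boolean muted = false;

    void controllerSends(String request) {
        unreadOnSocket.add(request);
    }

    /** One network-thread pass: reads at most one request, and only if unmuted. */
    void networkThreadPoll() {
        if (!muted && !unreadOnSocket.isEmpty()) {
            requestQueue.add(unreadOnSocket.poll());
            muted = true; // stop reading this connection until the response is sent
        }
    }

    /** A handler processes the queued request and its response is sent, so unmute. */
    String handleAndRespond() {
        String handled = requestQueue.poll();
        if (handled != null) {
            muted = false;
        }
        return handled;
    }

    int queuedRequests() {
        return requestQueue.size();
    }

    public static void main(String[] args) {
        MutedConnectionSketch conn = new MutedConnectionSketch();
        conn.controllerSends("R1");
        conn.controllerSends("R2");
        conn.networkThreadPoll();
        conn.networkThreadPoll(); // no-op: the connection is muted
        System.out.println(conn.queuedRequests());   // 1
        System.out.println(conn.handleAndRespond()); // R1
    }
}
```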
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <[email protected]> wrote:
>> > > > > > > > >> > >
>> > > > > > > > >> > > > Thanks for the update Lucas.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > I think the motivation section is intuitive. It would be good to
>> > > > > > > > >> > > > hear more comments from other reviewers.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > Hi Dong,
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > I've updated the motivation section of the KIP by explaining the
>> > > > > > > > >> > > > > cases that would have user impacts.
>> > > > > > > > >> > > > > Please take a look and let me know your comments.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > Thanks,
>> > > > > > > > >> > > > > Lucas
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > > Hi Dong,
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > The simulation of a slow disk is merely for me to easily construct
>> > > > > > > > >> > > > > > a testing scenario with a backlog of produce requests. In
>> > > > > > > > >> > > > > > production, other than the disk being slow, a backlog of produce
>> > > > > > > > >> > > > > > requests may also be caused by high produce QPS. In that case, we
>> > > > > > > > >> > > > > > may not want to kill the broker, and that's when this KIP can be
>> > > > > > > > >> > > > > > useful, for both JBOD and non-JBOD setups.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Going back to your previous question about each ProduceRequest
>> > > > > > > > >> > > > > > covering 20 partitions that are randomly distributed, let's say a
>> > > > > > > > >> > > > > > LeaderAndIsr request is enqueued that tries to switch the current
>> > > > > > > > >> > > > > > broker, say broker0, from leader to follower *for one of the
>> > > > > > > > >> > > > > > partitions*, say *test-0*. For the sake of argument, let's also
>> > > > > > > > >> > > > > > assume the other brokers, say broker1, have *stopped* fetching from
>> > > > > > > > >> > > > > > the current broker, i.e. broker0.
>> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL):
>> > > > > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
>> > > > > > > > >> > > > > >       LeaderAndISR will be put into the purgatory, and since they'll
>> > > > > > > > >> > > > > >       never be replicated to other brokers (because of the assumption
>> > > > > > > > >> > > > > >       made above), they will be completed either when the
>> > > > > > > > >> > > > > >       LeaderAndISR request is processed or when the timeout happens.
>> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition the
>> > > > > > > > >> > > > > >       partition test-0 to become a follower; after the current broker
>> > > > > > > > >> > > > > >       sees the replication of the remaining 19 partitions, it can
>> > > > > > > > >> > > > > >       send a response indicating that it's no longer the leader for
>> > > > > > > > >> > > > > >       "test-0".
>> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's say there
>> > > > > > > > >> > > > > >   are 24K produce requests ahead of the LeaderAndISR, and there are 8
>> > > > > > > > >> > > > > >   io threads, so each io thread will process approximately 3000
>> > > > > > > > >> > > > > >   produce requests. Now let's investigate the io thread that finally
>> > > > > > > > >> > > > > >   processes the LeaderAndISR.
>> > > > > > > > >> > > > > >   For the 3000 produce requests, let's model the times when their
>> > > > > > > > >> > > > > >   remaining 19 partitions catch up as t0, t1, ..., t2999, and the
>> > > > > > > > >> > > > > >   LeaderAndISR request is processed at time t3000.
>> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request would have waited an
>> > > > > > > > >> > > > > >   extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1,
>> > > > > > > > >> > > > > >   etc.
>> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger for the earlier
>> > > > > > > > >> > > > > >   produce requests than for the later ones. For the same reason, the
>> > > > > > > > >> > > > > >   more ProduceRequests queued before the LeaderAndISR, the bigger the
>> > > > > > > > >> > > > > >   benefit we get (capped by the produce timeout).
>> > > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1:
>> > > > > > > > >> > > > > >   There will be no latency differences in this case, but
>> > > > > > > > >> > > > > >   2.1 without this KIP, the records of partition test-0 in the
>> > > > > > > > >> > > > > >       ProduceRequests ahead of the LeaderAndISR will be appended to
>> > > > > > > > >> > > > > >       the local log, and eventually be truncated after processing the
>> > > > > > > > >> > > > > >       LeaderAndISR. This is what's referred to as "some unofficial
>> > > > > > > > >> > > > > >       definition of data loss in terms of messages beyond the high
>> > > > > > > > >> > > > > >       watermark".
>> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect, since if the
>> > > > > > > > >> > > > > >       LeaderAndISR is immediately processed, the response to producers
>> > > > > > > > >> > > > > >       will have the NotLeaderForPartition error, causing producers to
>> > > > > > > > >> > > > > >       retry.
>> > > > > > > > >> > > > > >
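The latency model in item 1 can be made concrete with a small calculation. The 24K requests and 8 io threads come from the email above, and the 30-second cap matches the timeout mentioned elsewhere in the thread; the uniform spacing between catch-up times (t_i = i * spacing) and the simple min() cap are assumptions made only for illustration, and the class name is hypothetical.

```java
// Hypothetical worked example of the purgatory-wait model. Uniform spacing of
// catch-up times and a plain min() cap are simplifying assumptions.
final class PurgatoryWaitModel {
    /** Produce requests per io thread, assuming an even split. */
    static int requestsPerThread(int queuedProduceRequests, int ioThreads) {
        return queuedProduceRequests / ioThreads;
    }

    /**
     * Extra wait (ms) for request i without the KIP: t_n - t_i, where
     * t_i = i * spacingMs is when its other partitions catch up and
     * t_n = n * spacingMs is when the LeaderAndIsr is processed,
     * capped at the produce timeout.
     */
    static long extraWaitMs(int i, int n, long spacingMs, long timeoutMs) {
        return Math.min((long) (n - i) * spacingMs, timeoutMs);
    }

    static double averageExtraWaitMs(int n, long spacingMs, long timeoutMs) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += extraWaitMs(i, n, spacingMs, timeoutMs);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int n = requestsPerThread(24_000, 8);
        System.out.println(n);                                   // 3000
        System.out.println(extraWaitMs(0, n, 1, 30_000));        // 3000 (1st request)
        System.out.println(extraWaitMs(n - 1, n, 1, 30_000));    // 1 (last request)
        System.out.println(averageExtraWaitMs(n, 1, 30_000));    // 1500.5
    }
}
```

With 1 ms spacing, the first request waits 3000 ms extra and the last one 1 ms, averaging 1500.5 ms; with wider spacing (say 20 ms), the earliest requests hit the 30 s cap, which matches the "capped by the produce timeout" remark.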
>> > > > > > > > >> > > > > > The explanation above covers the benefit of reducing the latency of
>> > > > > > > > >> > > > > > a broker becoming the follower. Closely related is reducing the
>> > > > > > > > >> > > > > > latency of a broker becoming the leader, where the benefit is even
>> > > > > > > > >> > > > > > more obvious: if other brokers have resigned leadership and the
>> > > > > > > > >> > > > > > current broker should take leadership, any delay in processing the
>> > > > > > > > >> > > > > > LeaderAndISR will be perceived by clients as unavailability. In
>> > > > > > > > >> > > > > > extreme cases, this can cause failed produce requests if the
>> > > > > > > > >> > > > > > retries are exhausted.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Two other types of controller requests are UpdateMetadata and
>> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
>> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing means clients
>> > > > > > > > >> > > > > > receive stale metadata, e.g. with the wrong leadership info for
>> > > > > > > > >> > > > > > certain partitions, and the effect is more retries or even fatal
>> > > > > > > > >> > > > > > failure if the retries are exhausted.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the
>> > > > > > > > >> > > > > > performance of topic deletion.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Regarding your last question about the delay for
>> > > > > > > > >> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help with
>> > > > > > > > >> > > > > > the latency in getting the log dirs info; it's only relevant when
>> > > > > > > > >> > > > > > controller requests are involved.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Regards,
>> > > > > > > > >> > > > > > Lucas
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <[email protected]> wrote:
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >> Hey Jun,
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the feature may
>> > > > > > > > >> > > > > >> be useful for the JBOD use case. I have one question below.
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Hey Lucas,
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD setup, or is
>> > > > > > > > >> > > > > >> it only useful for the JBOD setup? It may be useful to understand
>> > > > > > > > >> > > > > >> this.
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to move leaders on the
>> > > > > > > > >> > > > > >> failed disk to other disks, the system operator first needs to get
>> > > > > > > > >> > > > > >> the list of partitions on the failed disk. This is currently achieved
>> > > > > > > > >> > > > > >> using AdminClient.describeLogDirs(), which sends
>> > > > > > > > >> > > > > >> DescribeLogDirsRequest to the broker. If we only prioritize the
>> > > > > > > > >> > > > > >> controller requests, then the DescribeLogDirsRequest may still take a
>> > > > > > > > >> > > > > >> long time to be processed by the broker. So the overall time to move
>> > > > > > > > >> > > > > >> leaders away from the failed disk may still be long even with this
>> > > > > > > > >> > > > > >> KIP. What do you think?
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Thanks,
>> > > > > > > > >> > > > > >> Dong
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > @Dong,
>> > > > > > > > >> > > > > >> > Since both of the comments in your previous email are about the
>> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful, in light of Jun's last
>> > > > > > > > >> > > > > >> > comment, do you agree that this KIP can be beneficial in the case
>> > > > > > > > >> > > > > >> > mentioned by Jun?
>> > > > > > > > >> > > > > >> > Please let me know, thanks!
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > Regards,
>> > > > > > > > >> > > > > >> > Lucas
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <[email protected]> wrote:
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably should just kill the
>> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If only one of the disks
>> > > > > > > > >> > > > > >> > > on a broker is slow, one may want to fail that disk and move the
>> > > > > > > > >> > > > > >> > > leaders on that disk to other brokers. In that case, being able to
>> > > > > > > > >> > > > > >> > > process the LeaderAndIsr requests faster will potentially help the
>> > > > > > > > >> > > > > >> > > producers recover quicker.
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > Thanks,
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > Jun
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > > Hey Lucas,
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow-up questions below.
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
>> > > > > > > > >> > > > > >> > > > randomly distributed across all partitions, then each ProduceRequest
>> > > > > > > > >> > > > > >> > > > will likely cover some partitions for which the broker is still
>> > > > > > > > >> > > > > >> > > > leader after it quickly processes the LeaderAndIsrRequest. Then the
>> > > > > > > > >> > > > > >> > > > broker will still be slow in processing these ProduceRequests, and
>> > > > > > > > >> > > > > >> > > > request latency will still be very high with this KIP. It seems that
>> > > > > > > > >> > > > > >> > > > most ProduceRequests will still time out after 30 seconds. Is this
>> > > > > > > > >> > > > > >> > > > understanding correct?
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out after 30
>> > > > > > > > >> > > > > >> > > > seconds, then it is less clear how this KIP reduces average produce
>> > > > > > > > >> > > > > >> > > > latency. Can you clarify what metrics can be improved by this KIP?
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > I'm not sure why the system operator directly cares about the number
>> > > > > > > > >> > > > > >> > > > of truncated messages. Do you mean this KIP can improve average
>> > > > > > > > >> > > > > >> > > > throughput or reduce message duplication? It will be good to
>> > > > > > > > >> > > > > >> > > > understand this.
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Thanks,
>> > > > > > > > >> > > > > >> > > > Dong
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > > Hi Dong,
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a more common scenario where broker0 is the leader of many partitions, and let's say for some reason its IO becomes slow.
>> > > > > > > > >> > > > > >> > > > > The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like Cruise Control.
>> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will also get the errors relatively quickly rather than relying on their timeout, thanks to the batched async ZK operations.
>> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to have during such troublesome times.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately.
>> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry).
>> > > > > > > > >> > > > > >> > > > > One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks [1].
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > Also, *IF* there were a metric showing the number of truncated messages on brokers, the experiments done in the Google Doc would make it easy to see that far fewer messages need to be truncated on broker0, since the up-to-date metadata avoids appending the messages in subsequent PRODUCE requests.
>> > > > > > > > >> > > > > >> > > > > If we talk to a system operator and ask whether they prefer fewer wasteful IOs, I bet the answer is most likely yes.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be helpful to construct some formulas.
>> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to the case where there is only ONE partition involved.
>> > > > > > > > >> > > > > >> > > > > Following the experiments in the Google Doc, let's say broker0 becomes the follower at time t0, and after t0 there were still N produce requests in its request queue.
>> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this KIP, broker0 can reply with a NotLeaderForPartition exception; let's use M1 to denote the average processing time of replying with such an error message.
>> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to append messages to segments, which may trigger a flush to disk; let's use M2 to denote the average processing time for such logic.
>> > > > > > > > >> > > > > >> > > > > Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which means that as long as N is positive, we would see improvements in the average latency.
>> > > > > > > > >> > > > > >> > > > > There does not need to be a significant backlog of requests in the request queue, or severe degradation of disk performance, to see the improvement.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > Regards,
>> > > > > > > > >> > > > > >> > > > > Lucas
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the producer side can trigger unnecessary duplicate requests when the corresponding leader broker is overloaded, exacerbating the situation.
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > > > >
>> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of the experiment.
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > Initially I also thought having a separate queue for controller requests was useful because, as you mentioned in the summary section of the Google doc, controller requests are generally more important than data requests and we probably want controller requests to be processed sooner.
>> > > > > > > > >> > > > > >> > > > > > But then Eno has two very good questions which I am not sure the Google doc has answered explicitly. Could you help with the following questions?
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > 1) It is not very clear what the actual benefit of KIP-291 to users is.
>> > > > > > > > >> > > > > >> > > > > > The experiment setup in the Google doc simulates the scenario where the broker is very slow handling ProduceRequests due to e.g. a slow disk. It currently assumes that there is only 1 partition.
>> > > > > > > > >> > > > > >> > > > > > But in the common scenario, it is probably reasonable to assume that there are many other partitions that are also actively produced to, and ProduceRequests to these partitions also take e.g. 2 seconds to be processed.
>> > > > > > > > >> > > > > >> > > > > > So even if broker0 can become a follower for partition 0 soon, it probably still needs to slowly process the ProduceRequests in the queue, because these ProduceRequests cover other partitions.
>> > > > > > > > >> > > > > >> > > > > > Thus most ProduceRequests will still time out after 30 seconds, and most clients will still likely time out after 30 seconds.
>> > > > > > > > >> > > > > >> > > > > > Then it is not obvious what the benefit to the client is, since the client will time out after 30 seconds before possibly re-connecting to broker1, with or without KIP-291. Did I miss something here?
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP to the user or system administrator, e.g. whether this KIP decreases average latency, 999th percentile latency, the probability of exceptions exposed to the client, etc.
>> > > > > > > > >> > > > > >> > > > > > It is probably useful to clarify this.
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience only when there is an issue with the broker, e.g. a significant backlog in the request queue due to a slow disk as described in the Google doc?
>> > > > > > > > >> > > > > >> > > > > > Or is this KIP also useful when there is no ongoing issue in the cluster? It might be helpful to clarify this to understand the benefit of this KIP.
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > Thanks much,
>> > > > > > > > >> > > > > >> > > > > > Dong
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > > > > >
>> > > > > > > > >> > > > > >> > > > > > > Hi Eno,
>> > > > > > > > >> > > > > >> > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment results.
>> > > > > > > > >> > > > > >> > > > > > > Here is a link to a document showing the positive impact achieved by implementing the proposed change:
>> > > > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > > > > >> > > > > >> > > > > > > Please take a look when you have time and let me know your feedback.
>> > > > > > > > >> > > > > >> > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > Regards,
>> > > > > > > > >> > > > > >> > > > > > > Lucas
>> > > > > > > > >> > > > > >> > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might suit our requirements better.
>> > > > > > > > >> > > > > >> > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > Thanks,
>> > > > > > > > >> > > > > >> > > > > > > > Harsha
>> > > > > > > > >> > > > > >> > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <[email protected]> wrote:
>> > > > > > > > >> > > > > >> > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the replication quota mechanism proposed in KIP-73 can be helpful in that scenario.
>> > > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > > Thanks,
>> > > > > > > > >> > > > > >> > > > > > > > > Lucas
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > >> > > > > > > > >
>>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
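
A quick numeric check of Lucas's estimate in point 3 of his Jul 3 reply (average extra latency of N * (M2 - M1) / 2). The Python sketch below is not from the KIP and rests on simplifying assumptions: a single FIFO request queue drained one request at a time, all N produce requests already queued at t0, and request k (counting from 1) finishing after k service times. Under that model the exact average is (N + 1) * (M2 - M1) / 2, or (N - 1) * (M2 - M1) / 2 if a request's own processing time is excluded, so N * (M2 - M1) / 2 is a close approximation once N is more than a handful. The function names and the 1 ms / 200 ms timings are hypothetical.

```python
from fractions import Fraction


def average_extra_latency(n, m1, m2):
    """Exact average extra latency for the n requests queued at t0.

    Simplifying assumptions (not from the KIP): one FIFO queue served one
    request at a time, every request already queued at t0, and request k
    (1-based) completing after k service times. Its latency is k * m2
    without the KIP and k * m1 with it, so the extra latency is k * (m2 - m1).
    """
    if n <= 0:
        return 0
    total = sum(k * (m2 - m1) for k in range(1, n + 1))
    return total / n  # equals (n + 1) * (m2 - m1) / 2


def estimated_extra_latency(n, m1, m2):
    """The estimate from the thread: N * (M2 - M1) / 2."""
    return n * (m2 - m1) / 2


if __name__ == "__main__":
    # Hypothetical numbers: 100 queued requests, 1 ms to return
    # NotLeaderForPartition, 200 ms to append to a slow disk.
    n, m1, m2 = 100, Fraction(1, 1000), Fraction(200, 1000)
    exact = average_extra_latency(n, m1, m2)
    approx = estimated_extra_latency(n, m1, m2)
    print(float(exact), float(approx))  # prints 10.0495 9.95
```

Under either way of counting, the extra latency is positive whenever N > 0 and M2 > M1, which is the qualitative point made in the thread.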


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125
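
Dong's first question (Jul 3) can be made concrete with the numbers from Lucas's example (broker0 leads 10K partitions and 9K leaderships are moved away). Assuming, hypothetically, that each ProduceRequest sent to broker0 covers 20 distinct partitions picked uniformly at random from the 10K it used to lead, the chance that a request still contains at least one partition broker0 leads, and therefore still pays the slow append cost, is one minus a hypergeometric probability. The sketch below computes it; the function names and the uniform-sampling assumption are illustrative only.

```python
from math import comb


def prob_still_slow(total_led, moved, per_request):
    """Probability that a ProduceRequest still needs the slow append path.

    Hypothetical model: the request touches `per_request` distinct partitions
    drawn uniformly at random from the `total_led` partitions broker0 used to
    lead, and `moved` of those leaderships have since been shifted away.
    The request only avoids the slow path if every partition it touches was
    moved (so the broker can reply NotLeaderForPartition for all of them).
    """
    if not 0 <= moved <= total_led:
        raise ValueError("moved must be between 0 and total_led")
    if not 0 < per_request <= total_led:
        raise ValueError("per_request must be between 1 and total_led")
    # Hypergeometric probability that all picked partitions are moved ones.
    all_moved = comb(moved, per_request) / comb(total_led, per_request)
    return 1 - all_moved


def prob_still_slow_approx(still_led_fraction, per_request):
    """With-replacement approximation: 1 - (1 - f) ** k."""
    return 1 - (1 - still_led_fraction) ** per_request


if __name__ == "__main__":
    exact = prob_still_slow(10_000, 9_000, 20)
    approx = prob_still_slow_approx(0.1, 20)
    print(f"exact={exact:.3f} approx={approx:.3f}")
```

With those inputs, both the exact value and the with-replacement approximation 1 - 0.9^20 come out at roughly 0.88, consistent with Dong's point that, under these assumptions, most such requests would still hit the slow path.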
