Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Dong Lin Wed, 18 Jul 2018 21:01:45 -0700

Hey Becket,

Sorry I misunderstood your example. I thought you mean requests from
different controller are re-ordered.


I think you have provided a very good example and it should be safer to
still use two queues. Let me clarify the example a bit more below.

- If the controller has received response for R1 before it is disconnected
from the broker, it will send a new request R2 after it is re-connected to
the broker. There is no issue in this case because R1 will be processed
before R2.

- If the controller has not received response for R1 before it is
disconnected, it will re-send R1 followed by R2 after it is re-connected to
the broker. With high probability the order of processing should be R1, R1
and R2. This is because we have multiple request handler threads and the
first two R1 will typically both be processed before R2. With low
probability the order of processing will be R1, R2, R1, which can
potentially be a problem.

Thanks,
Dong

On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <[email protected]> wrote:

> Hey Becket,
>
> It seems that the requests from the old controller will be discarded due
> to old controller epoch. It is not clear whether this is a problem.
>
> And if this out-of-order processing of controller requests is a problem,
> it seems like an existing problem which also applies to the multi-queue
> based design. So it is probably not a concern specific to the use of deque.
> Does that sound reasonable?
>
> Thanks,
> Dong
>
>
> On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <[email protected]> wrote:
>
>> Hi Mayuresh/Joel,
>>
>> Using the request channel as a dequeue was bright up some time ago when we
>> initially thinking of prioritizing the request. The concern was that the
>> controller requests are supposed to be processed in order. If we can
>> ensure
>> that there is one controller request in the request channel, the order is
>> not a concern. But in cases that there are more than one controller
>> request
>> inserted into the queue, the controller request order may change and cause
>> problem. For example, think about the following sequence:
>> 1. Controller successfully sent a request R1 to broker
>> 2. Broker receives R1 and put the request to the head of the request
>> queue.
>> 3. Controller to broker connection failed and the controller reconnected
>> to
>> the broker.
>> 4. Controller sends a request R2 to the broker
>> 5. Broker receives R2 and add it to the head of the request queue.
>> Now on the broker side, R2 will be processed before R1 is processed, which
>> may cause problem.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>>
>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <[email protected]> wrote:
>>
>> > @Mayuresh - I like your idea. It appears to be a simpler less invasive
>> > alternative and it should work. Jun/Becket/others, do you see any
>> pitfalls
>> > with this approach?
>> >
>> > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <[email protected]>
>> > wrote:
>> >
>> > > @Mayuresh,
>> > > That's a very interesting idea that I haven't thought before.
>> > > It seems to solve our problem at hand pretty well, and also
>> > > avoids the need to have a new size metric and capacity config
>> > > for the controller request queue. In fact, if we were to adopt
>> > > this design, there is no public interface change, and we
>> > > probably don't need a KIP.
>> > > Also implementation wise, it seems
>> > > the java class LinkedBlockingQueue can readily satisfy the requirement
>> > > by supporting a capacity, and also allowing inserting at both ends.
>> > >
>> > > My only concern is that this design is tied to the coincidence that
>> > > we have two request priorities and there are two ends to a deque.
>> > > Hence by using the proposed design, it seems the network layer is
>> > > more tightly coupled with upper layer logic, e.g. if we were to add
>> > > an extra priority level in the future for some reason, we would
>> probably
>> > > need to go back to the design of separate queues, one for each
>> priority
>> > > level.
>> > >
>> > > In summary, I'm ok with both designs and lean toward your suggested
>> > > approach.
>> > > Let's hear what others think.
>> > >
>> > > @Becket,
>> > > In light of Mayuresh's suggested new design, I'm answering your
>> question
>> > > only in the context
>> > > of the current KIP design: I think your suggestion makes sense, and
>> I'm
>> > ok
>> > > with removing the capacity config and
>> > > just relying on the default value of 20 being sufficient enough.
>> > >
>> > > Thanks,
>> > > Lucas
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
>> > > [email protected]
>> > > > wrote:
>> > >
>> > > > Hi Lucas,
>> > > >
>> > > > Seems like the main intent here is to prioritize the controller
>> request
>> > > > over any other requests.
>> > > > In that case, we can change the request queue to a dequeue, where
>> you
>> > > > always insert the normal requests (produce, consume,..etc) to the
>> end
>> > of
>> > > > the dequeue, but if its a controller request, you insert it to the
>> head
>> > > of
>> > > > the queue. This ensures that the controller request will be given
>> > higher
>> > > > priority over other requests.
>> > > >
>> > > > Also since we only read one request from the socket and mute it and
>> > only
>> > > > unmute it after handling the request, this would ensure that we
>> don't
>> > > > handle controller requests out of order.
>> > > >
>> > > > With this approach we can avoid the second queue and the additional
>> > > config
>> > > > for the size of the queue.
>> > > >
>> > > > What do you think ?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Mayuresh
>> > > >
>> > > >
>> > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <[email protected]>
>> > wrote:
>> > > >
>> > > > > Hey Joel,
>> > > > >
>> > > > > Thank for the detail explanation. I agree the current design makes
>> > > sense.
>> > > > > My confusion is about whether the new config for the controller
>> queue
>> > > > > capacity is necessary. I cannot think of a case in which users
>> would
>> > > > change
>> > > > > it.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <[email protected]
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Lucas,
>> > > > > >
>> > > > > > I guess my question can be rephrased to "do we expect user to
>> ever
>> > > > change
>> > > > > > the controller request queue capacity"? If we agree that 20 is
>> > > already
>> > > > a
>> > > > > > very generous default number and we do not expect user to change
>> > it,
>> > > is
>> > > > > it
>> > > > > > still necessary to expose this as a config?
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jiangjie (Becket) Qin
>> > > > > >
>> > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
>> [email protected]
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> @Becket
>> > > > > >> 1. Thanks for the comment. You are right that normally there
>> > should
>> > > be
>> > > > > >> just
>> > > > > >> one controller request because of muting,
>> > > > > >> and I had NOT intended to say there would be many enqueued
>> > > controller
>> > > > > >> requests.
>> > > > > >> I went through the KIP again, and I'm not sure which part
>> conveys
>> > > that
>> > > > > >> info.
>> > > > > >> I'd be happy to revise if you point it out the section.
>> > > > > >>
>> > > > > >> 2. Though it should not happen in normal conditions, the
>> current
>> > > > design
>> > > > > >> does not preclude multiple controllers running
>> > > > > >> at the same time, hence if we don't have the controller queue
>> > > capacity
>> > > > > >> config and simply make its capacity to be 1,
>> > > > > >> network threads handling requests from different controllers
>> will
>> > be
>> > > > > >> blocked during those troublesome times,
>> > > > > >> which is probably not what we want. On the other hand, adding
>> the
>> > > > extra
>> > > > > >> config with a default value, say 20, guards us from issues in
>> > those
>> > > > > >> troublesome times, and IMO there isn't much downside of adding
>> the
>> > > > extra
>> > > > > >> config.
>> > > > > >>
>> > > > > >> @Mayuresh
>> > > > > >> Good catch, this sentence is an obsolete statement based on a
>> > > previous
>> > > > > >> design. I've revised the wording in the KIP.
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Lucas
>> > > > > >>
>> > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
>> > > > > >> [email protected]> wrote:
>> > > > > >>
>> > > > > >> > Hi Lucas,
>> > > > > >> >
>> > > > > >> > Thanks for the KIP.
>> > > > > >> > I am trying to understand why you think "The memory
>> consumption
>> > > can
>> > > > > rise
>> > > > > >> > given the total number of queued requests can go up to 2x" in
>> > the
>> > > > > impact
>> > > > > >> > section. Normally the requests from controller to a Broker
>> are
>> > not
>> > > > > high
>> > > > > >> > volume, right ?
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Thanks,
>> > > > > >> >
>> > > > > >> > Mayuresh
>> > > > > >> >
>> > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
>> > [email protected]>
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
>> from
>> > the
>> > > > > data
>> > > > > >> > plane
>> > > > > >> > > makes a lot of sense.
>> > > > > >> > >
>> > > > > >> > > In the KIP you mentioned that the controller request queue
>> may
>> > > > have
>> > > > > >> many
>> > > > > >> > > requests in it. Will this be a common case? The controller
>> > > > requests
>> > > > > >> still
>> > > > > >> > > goes through the SocketServer. The SocketServer will mute
>> the
>> > > > > channel
>> > > > > >> > once
>> > > > > >> > > a request is read and put into the request channel. So
>> > assuming
>> > > > > there
>> > > > > >> is
>> > > > > >> > > only one connection between controller and each broker, on
>> the
>> > > > > broker
>> > > > > >> > side,
>> > > > > >> > > there should be only one controller request in the
>> controller
>> > > > > request
>> > > > > >> > queue
>> > > > > >> > > at any given time. If that is the case, do we need a
>> separate
>> > > > > >> controller
>> > > > > >> > > request queue capacity config? The default value 20 means
>> that
>> > > we
>> > > > > >> expect
>> > > > > >> > > there are 20 controller switches to happen in a short
>> period
>> > of
>> > > > > time.
>> > > > > >> I
>> > > > > >> > am
>> > > > > >> > > not sure whether someone should increase the controller
>> > request
>> > > > > queue
>> > > > > >> > > capacity to handle such case, as it seems indicating
>> something
>> > > > very
>> > > > > >> wrong
>> > > > > >> > > has happened.
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jiangjie (Becket) Qin
>> > > > > >> > >
>> > > > > >> > >
>> > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
>> > [email protected]>
>> > > > > >> wrote:
>> > > > > >> > >
>> > > > > >> > > > Thanks for the update Lucas.
>> > > > > >> > > >
>> > > > > >> > > > I think the motivation section is intuitive. It will be
>> good
>> > > to
>> > > > > >> learn
>> > > > > >> > > more
>> > > > > >> > > > about the comments from other reviewers.
>> > > > > >> > > >
>> > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
>> > > > > [email protected]>
>> > > > > >> > > wrote:
>> > > > > >> > > >
>> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > >
>> > > > > >> > > > > I've updated the motivation section of the KIP by
>> > explaining
>> > > > the
>> > > > > >> > cases
>> > > > > >> > > > that
>> > > > > >> > > > > would have user impacts.
>> > > > > >> > > > > Please take a look at let me know your comments.
>> > > > > >> > > > >
>> > > > > >> > > > > Thanks,
>> > > > > >> > > > > Lucas
>> > > > > >> > > > >
>> > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
>> > > > > [email protected]
>> > > > > >> >
>> > > > > >> > > > wrote:
>> > > > > >> > > > >
>> > > > > >> > > > > > Hi Dong,
>> > > > > >> > > > > >
>> > > > > >> > > > > > The simulation of disk being slow is merely for me to
>> > > easily
>> > > > > >> > > construct
>> > > > > >> > > > a
>> > > > > >> > > > > > testing scenario
>> > > > > >> > > > > > with a backlog of produce requests. In production,
>> other
>> > > > than
>> > > > > >> the
>> > > > > >> > > disk
>> > > > > >> > > > > > being slow, a backlog of
>> > > > > >> > > > > > produce requests may also be caused by high produce
>> QPS.
>> > > > > >> > > > > > In that case, we may not want to kill the broker and
>> > > that's
>> > > > > when
>> > > > > >> > this
>> > > > > >> > > > KIP
>> > > > > >> > > > > > can be useful, both for JBOD
>> > > > > >> > > > > > and non-JBOD setup.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Going back to your previous question about each
>> > > > ProduceRequest
>> > > > > >> > > covering
>> > > > > >> > > > > 20
>> > > > > >> > > > > > partitions that are randomly
>> > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
>> > enqueued
>> > > > that
>> > > > > >> > tries
>> > > > > >> > > to
>> > > > > >> > > > > > switch the current broker, say broker0, from leader
>> to
>> > > > > follower
>> > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
>> sake
>> > of
>> > > > > >> > argument,
>> > > > > >> > > > > > let's also assume the other brokers, say broker1,
>> have
>> > > > > *stopped*
>> > > > > >> > > > fetching
>> > > > > >> > > > > > from
>> > > > > >> > > > > > the current broker, i.e. broker0.
>> > > > > >> > > > > > 1. If the enqueued produce requests have acks =  -1
>> > (ALL)
>> > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
>> > > > > >> LeaderAndISR
>> > > > > >> > > will
>> > > > > >> > > > be
>> > > > > >> > > > > > put into the purgatory,
>> > > > > >> > > > > >         and since they'll never be replicated to
>> other
>> > > > brokers
>> > > > > >> > > (because
>> > > > > >> > > > > of
>> > > > > >> > > > > > the assumption made above), they will
>> > > > > >> > > > > >         be completed either when the LeaderAndISR
>> > request
>> > > is
>> > > > > >> > > processed
>> > > > > >> > > > or
>> > > > > >> > > > > > when the timeout happens.
>> > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
>> transition
>> > > the
>> > > > > >> > > partition
>> > > > > >> > > > > > test-0 to become a follower,
>> > > > > >> > > > > >         after the current broker sees the
>> replication of
>> > > the
>> > > > > >> > > remaining
>> > > > > >> > > > 19
>> > > > > >> > > > > > partitions, it can send a response indicating that
>> > > > > >> > > > > >         it's no longer the leader for the "test-0".
>> > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2,
>> > let's
>> > > > say
>> > > > > >> > there
>> > > > > >> > > > are
>> > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and
>> > there
>> > > > are
>> > > > > 8
>> > > > > >> io
>> > > > > >> > > > > threads,
>> > > > > >> > > > > >   so each io thread will process approximately 3000
>> > > produce
>> > > > > >> > requests.
>> > > > > >> > > > Now
>> > > > > >> > > > > > let's investigate the io thread that finally
>> processed
>> > the
>> > > > > >> > > > LeaderAndISR.
>> > > > > >> > > > > >   For the 3000 produce requests, if we model the time
>> > when
>> > > > > their
>> > > > > >> > > > > remaining
>> > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
>> > > > > LeaderAndISR
>> > > > > >> > > > request
>> > > > > >> > > > > is
>> > > > > >> > > > > > processed at time t3000.
>> > > > > >> > > > > >   Without this KIP, the 1st produce request would
>> have
>> > > > waited
>> > > > > an
>> > > > > >> > > extra
>> > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
>> time
>> > of
>> > > > > >> t3000 -
>> > > > > >> > > t1,
>> > > > > >> > > > > etc.
>> > > > > >> > > > > >   Roughly speaking, the latency difference is bigger
>> for
>> > > the
>> > > > > >> > earlier
>> > > > > >> > > > > > produce requests than for the later ones. For the
>> same
>> > > > reason,
>> > > > > >> the
>> > > > > >> > > more
>> > > > > >> > > > > > ProduceRequests queued
>> > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
>> > > (capped
>> > > > > by
>> > > > > >> the
>> > > > > >> > > > > > produce timeout).
>> > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
>> > acks=1
>> > > > > >> > > > > >   There will be no latency differences in this case,
>> but
>> > > > > >> > > > > >   2.1 without this KIP, the records of partition
>> test-0
>> > in
>> > > > the
>> > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
>> > appended
>> > > > to
>> > > > > >> the
>> > > > > >> > > local
>> > > > > >> > > > > log,
>> > > > > >> > > > > >         and eventually be truncated after processing
>> the
>> > > > > >> > > LeaderAndISR.
>> > > > > >> > > > > > This is what's referred to as
>> > > > > >> > > > > >         "some unofficial definition of data loss in
>> > terms
>> > > of
>> > > > > >> > messages
>> > > > > >> > > > > > beyond the high watermark".
>> > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
>> since if
>> > > the
>> > > > > >> > > > LeaderAndISR
>> > > > > >> > > > > > is immediately processed, the response to producers
>> will
>> > > > have
>> > > > > >> > > > > >         the NotLeaderForPartition error, causing
>> > producers
>> > > > to
>> > > > > >> retry
>> > > > > >> > > > > >
>> > > > > >> > > > > > This explanation above is the benefit for reducing
>> the
>> > > > latency
>> > > > > >> of a
>> > > > > >> > > > > broker
>> > > > > >> > > > > > becoming the follower,
>> > > > > >> > > > > > closely related is reducing the latency of a broker
>> > > becoming
>> > > > > the
>> > > > > >> > > > leader.
>> > > > > >> > > > > > In this case, the benefit is even more obvious, if
>> other
>> > > > > brokers
>> > > > > >> > have
>> > > > > >> > > > > > resigned leadership, and the
>> > > > > >> > > > > > current broker should take leadership. Any delay in
>> > > > processing
>> > > > > >> the
>> > > > > >> > > > > > LeaderAndISR will be perceived
>> > > > > >> > > > > > by clients as unavailability. In extreme cases, this
>> can
>> > > > cause
>> > > > > >> > failed
>> > > > > >> > > > > > produce requests if the retries are
>> > > > > >> > > > > > exhausted.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Another two types of controller requests are
>> > > UpdateMetadata
>> > > > > and
>> > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
>> > > > > >> > > > > > For UpdateMetadata requests, delayed processing means
>> > > > clients
>> > > > > >> > > receiving
>> > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
>> > > > > >> > > > > > for certain partitions, and the effect is more
>> retries
>> > or
>> > > > even
>> > > > > >> > fatal
>> > > > > >> > > > > > failure if the retries are exhausted.
>> > > > > >> > > > > >
>> > > > > >> > > > > > For StopReplica requests, a long queuing time may
>> > degrade
>> > > > the
>> > > > > >> > > > performance
>> > > > > >> > > > > > of topic deletion.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Regarding your last question of the delay for
>> > > > > >> > DescribeLogDirsRequest,
>> > > > > >> > > > you
>> > > > > >> > > > > > are right
>> > > > > >> > > > > > that this KIP cannot help with the latency in getting
>> > the
>> > > > log
>> > > > > >> dirs
>> > > > > >> > > > info,
>> > > > > >> > > > > > and it's only relevant
>> > > > > >> > > > > > when controller requests are involved.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Regards,
>> > > > > >> > > > > > Lucas
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
>> > > > [email protected]
>> > > > > >
>> > > > > >> > > wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > >> Hey Jun,
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Thanks much for the comments. It is good point. So
>> the
>> > > > > feature
>> > > > > >> may
>> > > > > >> > > be
>> > > > > >> > > > > >> useful for JBOD use-case. I have one question below.
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Hey Lucas,
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Do you think this feature is also useful for
>> non-JBOD
>> > > setup
>> > > > > or
>> > > > > >> it
>> > > > > >> > is
>> > > > > >> > > > > only
>> > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
>> > understand
>> > > > > this.
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> When the broker is setup using JBOD, in order to
>> move
>> > > > leaders
>> > > > > >> on
>> > > > > >> > the
>> > > > > >> > > > > >> failed
>> > > > > >> > > > > >> disk to other disks, the system operator first
>> needs to
>> > > get
>> > > > > the
>> > > > > >> > list
>> > > > > >> > > > of
>> > > > > >> > > > > >> partitions on the failed disk. This is currently
>> > achieved
>> > > > > using
>> > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
>> > > > > >> DescribeLogDirsRequest
>> > > > > >> > to
>> > > > > >> > > > the
>> > > > > >> > > > > >> broker. If we only prioritize the controller
>> requests,
>> > > then
>> > > > > the
>> > > > > >> > > > > >> DescribeLogDirsRequest
>> > > > > >> > > > > >> may still take a long time to be processed by the
>> > broker.
>> > > > So
>> > > > > >> the
>> > > > > >> > > > overall
>> > > > > >> > > > > >> time to move leaders away from the failed disk may
>> > still
>> > > be
>> > > > > >> long
>> > > > > >> > > even
>> > > > > >> > > > > with
>> > > > > >> > > > > >> this KIP. What do you think?
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Thanks,
>> > > > > >> > > > > >> Dong
>> > > > > >> > > > > >>
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
>> > > > > >> [email protected]
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > @Dong,
>> > > > > >> > > > > >> > Since both of the two comments in your previous
>> email
>> > > are
>> > > > > >> about
>> > > > > >> > > the
>> > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
>> > > > > >> > > > > >> > in light of Jun's last comment, do you agree that
>> > this
>> > > > KIP
>> > > > > >> can
>> > > > > >> > be
>> > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
>> > > > > >> > > > > >> > Please let me know, thanks!
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > Regards,
>> > > > > >> > > > > >> > Lucas
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
>> > > > [email protected]>
>> > > > > >> > wrote:
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > > Hi, Lucas, Dong,
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
>> > > should
>> > > > > just
>> > > > > >> > kill
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
>> > only
>> > > > one
>> > > > > of
>> > > > > >> > the
>> > > > > >> > > > > disks
>> > > > > >> > > > > >> on
>> > > > > >> > > > > >> > a
>> > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
>> and
>> > > move
>> > > > > the
>> > > > > >> > > > leaders
>> > > > > >> > > > > on
>> > > > > >> > > > > >> > that
>> > > > > >> > > > > >> > > disk to other brokers. In that case, being able
>> to
>> > > > > process
>> > > > > >> the
>> > > > > >> > > > > >> > LeaderAndIsr
>> > > > > >> > > > > >> > > requests faster will potentially help the
>> producers
>> > > > > recover
>> > > > > >> > > > quicker.
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > Thanks,
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > Jun
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
>> > > > > >> [email protected]
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > > Hey Lucas,
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
>> > > below.
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
>> > > > > partitions
>> > > > > >> > that
>> > > > > >> > > > are
>> > > > > >> > > > > >> > > randomly
>> > > > > >> > > > > >> > > > distributed across all partitions, then each
>> > > > > >> ProduceRequest
>> > > > > >> > > will
>> > > > > >> > > > > >> likely
>> > > > > >> > > > > >> > > > cover some partitions for which the broker is
>> > still
>> > > > > >> leader
>> > > > > >> > > after
>> > > > > >> > > > > it
>> > > > > >> > > > > >> > > quickly
>> > > > > >> > > > > >> > > > processes the
>> > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be
>> > slow
>> > > > in
>> > > > > >> > > > processing
>> > > > > >> > > > > >> these
>> > > > > >> > > > > >> > > > ProduceRequest and request will still be very
>> > high
>> > > > with
>> > > > > >> this
>> > > > > >> > > > KIP.
>> > > > > >> > > > > It
>> > > > > >> > > > > >> > > seems
>> > > > > >> > > > > >> > > > that most ProduceRequest will still timeout
>> after
>> > > 30
>> > > > > >> > seconds.
>> > > > > >> > > Is
>> > > > > >> > > > > >> this
>> > > > > >> > > > > >> > > > understanding correct?
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still
>> > > > timeout
>> > > > > >> after
>> > > > > >> > > 30
>> > > > > >> > > > > >> > seconds,
>> > > > > >> > > > > >> > > > then it is less clear how this KIP reduces
>> > average
>> > > > > >> produce
>> > > > > >> > > > > latency.
>> > > > > >> > > > > >> Can
>> > > > > >> > > > > >> > > you
>> > > > > >> > > > > >> > > > clarify what metrics can be improved by this
>> KIP?
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Not sure why system operator directly cares
>> > number
>> > > of
>> > > > > >> > > truncated
>> > > > > >> > > > > >> > messages.
>> > > > > >> > > > > >> > > > Do you mean this KIP can improve average
>> > throughput
>> > > > or
>> > > > > >> > reduce
>> > > > > >> > > > > >> message
>> > > > > >> > > > > >> > > > duplication? It will be good to understand
>> this.
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Thanks,
>> > > > > >> > > > > >> > > > Dong
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
>> > > > > >> > > [email protected]
>> > > > > >> > > > >
>> > > > > >> > > > > >> > wrote:
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please
>> see
>> > my
>> > > > > reply
>> > > > > >> > > below.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
>> Now
>> > > > let's
>> > > > > >> > > consider
>> > > > > >> > > > a
>> > > > > >> > > > > >> more
>> > > > > >> > > > > >> > > > common
>> > > > > >> > > > > >> > > > > scenario
>> > > > > >> > > > > >> > > > > where broker0 is the leader of many
>> partitions.
>> > > And
>> > > > > >> let's
>> > > > > >> > > say
>> > > > > >> > > > > for
>> > > > > >> > > > > >> > some
>> > > > > >> > > > > >> > > > > reason its IO becomes slow.
>> > > > > >> > > > > >> > > > > The number of leader partitions on broker0
>> is
>> > so
>> > > > > large,
>> > > > > >> > say
>> > > > > >> > > > 10K,
>> > > > > >> > > > > >> that
>> > > > > >> > > > > >> > > the
>> > > > > >> > > > > >> > > > > cluster is skewed,
>> > > > > >> > > > > >> > > > > and the operator would like to shift the
>> > > leadership
>> > > > > >> for a
>> > > > > >> > > lot
>> > > > > >> > > > of
>> > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
>> > > > > >> > > > > >> > > > > either manually or through some service like
>> > > cruise
>> > > > > >> > control.
>> > > > > >> > > > > >> > > > > With this KIP, not only will the leadership
>> > > > > transitions
>> > > > > >> > > finish
>> > > > > >> > > > > >> more
>> > > > > >> > > > > >> > > > > quickly, helping the cluster itself becoming
>> > more
>> > > > > >> > balanced,
>> > > > > >> > > > > >> > > > > but all existing producers corresponding to
>> the
>> > > 9K
>> > > > > >> > > partitions
>> > > > > >> > > > > will
>> > > > > >> > > > > >> > get
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > errors relatively quickly
>> > > > > >> > > > > >> > > > > rather than relying on their timeout,
>> thanks to
>> > > the
>> > > > > >> > batched
>> > > > > >> > > > > async
>> > > > > >> > > > > >> ZK
>> > > > > >> > > > > >> > > > > operations.
>> > > > > >> > > > > >> > > > > To me it's a useful feature to have during
>> such
>> > > > > >> > troublesome
>> > > > > >> > > > > times.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
>> shown
>> > > > that
>> > > > > >> with
>> > > > > >> > > this
>> > > > > >> > > > > KIP
>> > > > > >> > > > > >> > many
>> > > > > >> > > > > >> > > > > producers
>> > > > > >> > > > > >> > > > > receive an explicit error
>> > NotLeaderForPartition,
>> > > > > based
>> > > > > >> on
>> > > > > >> > > > which
>> > > > > >> > > > > >> they
>> > > > > >> > > > > >> > > > retry
>> > > > > >> > > > > >> > > > > immediately.
>> > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
>> retry)
>> > > for
>> > > > > >> their
>> > > > > >> > > > single
>> > > > > >> > > > > >> > > message
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > much smaller
>> > > > > >> > > > > >> > > > > compared with the case of timing out without
>> > the
>> > > > KIP
>> > > > > >> (30
>> > > > > >> > > > seconds
>> > > > > >> > > > > >> for
>> > > > > >> > > > > >> > > > timing
>> > > > > >> > > > > >> > > > > out + quick retry).
>> > > > > >> > > > > >> > > > > One might argue that reducing the timing
>> out on
>> > > the
>> > > > > >> > producer
>> > > > > >> > > > > side
>> > > > > >> > > > > >> can
>> > > > > >> > > > > >> > > > > achieve the same result,
>> > > > > >> > > > > >> > > > > yet reducing the timeout has its own
>> > > drawbacks[1].
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
>> > number
>> > > of
>> > > > > >> > > truncated
>> > > > > >> > > > > >> > messages
>> > > > > >> > > > > >> > > on
>> > > > > >> > > > > >> > > > > brokers,
>> > > > > >> > > > > >> > > > > with the experiments done in the Google
>> Doc, it
>> > > > > should
>> > > > > >> be
>> > > > > >> > > easy
>> > > > > >> > > > > to
>> > > > > >> > > > > >> see
>> > > > > >> > > > > >> > > > that
>> > > > > >> > > > > >> > > > > a lot fewer messages need
>> > > > > >> > > > > >> > > > > to be truncated on broker0 since the
>> up-to-date
>> > > > > >> metadata
>> > > > > >> > > > avoids
>> > > > > >> > > > > >> > > appending
>> > > > > >> > > > > >> > > > > of messages
>> > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk
>> to a
>> > > > > system
>> > > > > >> > > > operator
>> > > > > >> > > > > >> and
>> > > > > >> > > > > >> > ask
>> > > > > >> > > > > >> > > > > whether
>> > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
>> > likely
>> > > > the
>> > > > > >> > answer
>> > > > > >> > > > is
>> > > > > >> > > > > >> yes.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 3. To answer your question, I think it
>> might be
>> > > > > >> helpful to
>> > > > > >> > > > > >> construct
>> > > > > >> > > > > >> > > some
>> > > > > >> > > > > >> > > > > formulas.
>> > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to
>> the
>> > > > case
>> > > > > >> where
>> > > > > >> > > > there
>> > > > > >> > > > > >> is
>> > > > > >> > > > > >> > > only
>> > > > > >> > > > > >> > > > > ONE partition involved.
>> > > > > >> > > > > >> > > > > Following the experiments in the Google Doc,
>> > > let's
>> > > > > say
>> > > > > >> > > broker0
>> > > > > >> > > > > >> > becomes
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > follower at time t0,
>> > > > > >> > > > > >> > > > > and after t0 there were still N produce
>> > requests
>> > > in
>> > > > > its
>> > > > > >> > > > request
>> > > > > >> > > > > >> > queue.
>> > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this
>> > KIP,
>> > > > > >> broker0
>> > > > > >> > > can
>> > > > > >> > > > > >> reply
>> > > > > >> > > > > >> > > with
>> > > > > >> > > > > >> > > > an
>> > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
>> > > > > >> > > > > >> > > > > let's use M1 to denote the average
>> processing
>> > > time
>> > > > of
>> > > > > >> > > replying
>> > > > > >> > > > > >> with
>> > > > > >> > > > > >> > > such
>> > > > > >> > > > > >> > > > an
>> > > > > >> > > > > >> > > > > error message.
>> > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
>> > append
>> > > > > >> messages
>> > > > > >> > to
>> > > > > >> > > > > >> > segments,
>> > > > > >> > > > > >> > > > > which may trigger a flush to disk,
>> > > > > >> > > > > >> > > > > let's use M2 to denote the average
>> processing
>> > > time
>> > > > > for
>> > > > > >> > such
>> > > > > >> > > > > logic.
>> > > > > >> > > > > >> > > > > Then the average extra latency incurred
>> without
>> > > > this
>> > > > > >> KIP
>> > > > > >> > is
>> > > > > >> > > N
>> > > > > >> > > > *
>> > > > > >> > > > > >> (M2 -
>> > > > > >> > > > > >> > > > M1) /
>> > > > > >> > > > > >> > > > > 2.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > In practice, M2 should always be larger than
>> > M1,
>> > > > > which
>> > > > > >> > means
>> > > > > >> > > > as
>> > > > > >> > > > > >> long
>> > > > > >> > > > > >> > > as N
>> > > > > >> > > > > >> > > > > is positive,
>> > > > > >> > > > > >> > > > > we would see improvements on the average
>> > latency.
>> > > > > >> > > > > >> > > > > There does not need to be significant
>> backlog
>> > of
>> > > > > >> requests
>> > > > > >> > in
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > request
>> > > > > >> > > > > >> > > > > queue,
>> > > > > >> > > > > >> > > > > or severe degradation of disk performance to
>> > have
>> > > > the
>> > > > > >> > > > > improvement.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Regards,
>> > > > > >> > > > > >> > > > > Lucas
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on
>> the
>> > > > > producer
>> > > > > >> > side
>> > > > > >> > > > can
>> > > > > >> > > > > >> > trigger
>> > > > > >> > > > > >> > > > > unnecessary duplicate requests
>> > > > > >> > > > > >> > > > > when the corresponding leader broker is
>> > > overloaded,
>> > > > > >> > > > exacerbating
>> > > > > >> > > > > >> the
>> > > > > >> > > > > >> > > > > situation.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
>> > > > > >> > > [email protected]
>> > > > > >> > > > >
>> > > > > >> > > > > >> > wrote:
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > > Hey Lucas,
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Thanks much for the detailed
>> documentation of
>> > > the
>> > > > > >> > > > experiment.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Initially I also think having a separate
>> > queue
>> > > > for
>> > > > > >> > > > controller
>> > > > > >> > > > > >> > > requests
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
>> > summary
>> > > > > >> section
>> > > > > >> > of
>> > > > > >> > > > the
>> > > > > >> > > > > >> > Google
>> > > > > >> > > > > >> > > > > doc,
>> > > > > >> > > > > >> > > > > > controller requests are generally more
>> > > important
>> > > > > than
>> > > > > >> > data
>> > > > > >> > > > > >> requests
>> > > > > >> > > > > >> > > and
>> > > > > >> > > > > >> > > > > we
>> > > > > >> > > > > >> > > > > > probably want controller requests to be
>> > > processed
>> > > > > >> > sooner.
>> > > > > >> > > > But
>> > > > > >> > > > > >> then
>> > > > > >> > > > > >> > > Eno
>> > > > > >> > > > > >> > > > > has
>> > > > > >> > > > > >> > > > > > two very good questions which I am not
>> sure
>> > the
>> > > > > >> Google
>> > > > > >> > doc
>> > > > > >> > > > has
>> > > > > >> > > > > >> > > answered
>> > > > > >> > > > > >> > > > > > explicitly. Could you help with the
>> following
>> > > > > >> questions?
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
>> > > > benefit
>> > > > > of
>> > > > > >> > > > KIP-291
>> > > > > >> > > > > to
>> > > > > >> > > > > >> > > users.
>> > > > > >> > > > > >> > > > > The
>> > > > > >> > > > > >> > > > > > experiment setup in the Google doc
>> simulates
>> > > the
>> > > > > >> > scenario
>> > > > > >> > > > that
>> > > > > >> > > > > >> > broker
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to
>> e.g.
>> > > > slow
>> > > > > >> disk.
>> > > > > >> > > It
>> > > > > >> > > > > >> > currently
>> > > > > >> > > > > >> > > > > > assumes that there is only 1 partition.
>> But
>> > in
>> > > > the
>> > > > > >> > common
>> > > > > >> > > > > >> scenario,
>> > > > > >> > > > > >> > > it
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > probably reasonable to assume that there
>> are
>> > > many
>> > > > > >> other
>> > > > > >> > > > > >> partitions
>> > > > > >> > > > > >> > > that
>> > > > > >> > > > > >> > > > > are
>> > > > > >> > > > > >> > > > > > also actively produced to and
>> ProduceRequest
>> > to
>> > > > > these
>> > > > > >> > > > > partition
>> > > > > >> > > > > >> > also
>> > > > > >> > > > > >> > > > > takes
>> > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
>> > > > broker0
>> > > > > >> can
>> > > > > >> > > > become
>> > > > > >> > > > > >> > > follower
>> > > > > >> > > > > >> > > > > for
>> > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
>> needs
>> > > to
>> > > > > >> process
>> > > > > >> > > the
>> > > > > >> > > > > >> > > > > ProduceRequest
>> > > > > >> > > > > >> > > > > > slowly t in the queue because these
>> > > > ProduceRequests
>> > > > > >> > cover
>> > > > > >> > > > > other
>> > > > > >> > > > > >> > > > > partitions.
>> > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
>> timeout
>> > > after
>> > > > > 30
>> > > > > >> > > seconds
>> > > > > >> > > > > and
>> > > > > >> > > > > >> > most
>> > > > > >> > > > > >> > > > > > clients will still likely timeout after 30
>> > > > seconds.
>> > > > > >> Then
>> > > > > >> > > it
>> > > > > >> > > > is
>> > > > > >> > > > > >> not
>> > > > > >> > > > > >> > > > > > obviously what is the benefit to client
>> since
>> > > > > client
>> > > > > >> > will
>> > > > > >> > > > > >> timeout
>> > > > > >> > > > > >> > > after
>> > > > > >> > > > > >> > > > > 30
>> > > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
>> > > broker1,
>> > > > > >> with
>> > > > > >> > or
>> > > > > >> > > > > >> without
>> > > > > >> > > > > >> > > > > KIP-291.
>> > > > > >> > > > > >> > > > > > Did I miss something here?
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
>> specific
>> > > > > benefits
>> > > > > >> of
>> > > > > >> > > this
>> > > > > >> > > > > >> KIP to
>> > > > > >> > > > > >> > > > user
>> > > > > >> > > > > >> > > > > or
>> > > > > >> > > > > >> > > > > > system administrator, e.g. whether this
>> KIP
>> > > > > decreases
>> > > > > >> > > > average
>> > > > > >> > > > > >> > > latency,
>> > > > > >> > > > > >> > > > > > 999th percentile latency, probably of
>> > exception
>> > > > > >> exposed
>> > > > > >> > to
>> > > > > >> > > > > >> client
>> > > > > >> > > > > >> > > etc.
>> > > > > >> > > > > >> > > > It
>> > > > > >> > > > > >> > > > > > is probably useful to clarify this.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
>> experience
>> > > > only
>> > > > > >> when
>> > > > > >> > > > there
>> > > > > >> > > > > is
>> > > > > >> > > > > >> > > issue
>> > > > > >> > > > > >> > > > > with
>> > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
>> > request
>> > > > > queue
>> > > > > >> > due
>> > > > > >> > > to
>> > > > > >> > > > > >> slow
>> > > > > >> > > > > >> > > disk
>> > > > > >> > > > > >> > > > as
>> > > > > >> > > > > >> > > > > > described in the Google doc? Or is this
>> KIP
>> > > also
>> > > > > >> useful
>> > > > > >> > > when
>> > > > > >> > > > > >> there
>> > > > > >> > > > > >> > is
>> > > > > >> > > > > >> > > > no
>> > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
>> > > helpful
>> > > > > to
>> > > > > >> > > clarify
>> > > > > >> > > > > >> this
>> > > > > >> > > > > >> > to
>> > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Thanks much,
>> > > > > >> > > > > >> > > > > > Dong
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
>> Wang <
>> > > > > >> > > > > >> [email protected]
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > > > wrote:
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > > Hi Eno,
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
>> > experiment
>> > > > > >> results.
>> > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
>> > > achieved
>> > > > by
>> > > > > >> > > > > implementing
>> > > > > >> > > > > >> > the
>> > > > > >> > > > > >> > > > > > proposed
>> > > > > >> > > > > >> > > > > > > change:
>> > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
>> > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
>> > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > >> > > > > >> > > > > > > Please take a look when you have time
>> and
>> > let
>> > > > me
>> > > > > >> know
>> > > > > >> > > your
>> > > > > >> > > > > >> > > feedback.
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > Regards,
>> > > > > >> > > > > >> > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha
>> <
>> > > > > >> > > [email protected]>
>> > > > > >> > > > > >> wrote:
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
>> look
>> > > > might
>> > > > > >> suit
>> > > > > >> > > our
>> > > > > >> > > > > >> > > > requirements
>> > > > > >> > > > > >> > > > > > > > better.
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > Harsha
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
>> Lucas
>> > > > Wang <
>> > > > > >> > > > > >> > > > [email protected]
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > Hi Harsha,
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
>> > > replication
>> > > > > >> quota
>> > > > > >> > > > > mechanism
>> > > > > >> > > > > >> > > > proposed
>> > > > > >> > > > > >> > > > > > in
>> > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
>> scenario.
>> > > > > >> > > > > >> > > > > > > > > Have you tried it out?
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
>> > Harsha <
>> > > > > >> > > > > [email protected]
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > > > wrote:
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
>> > > making
>> > > > > >> this
>> > > > > >> > > > > >> configurable
>> > > > > >> > > > > >> > > > > > > > > > and also allowing subset of data
>> > > requests
>> > > > > to
>> > > > > >> be
>> > > > > >> > > > > >> > prioritized.
>> > > > > >> > > > > >> > > > For
>> > > > > >> > > > > >> > > > > > > > example
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we
>> > take
>> > > > out
>> > > > > a
>> > > > > >> > > broker
>> > > > > >> > > > > and
>> > > > > >> > > > > >> > bring
>> > > > > >> > > > > >> > > > new
>> > > > > >> > > > > >> > > > > > one
>> > > > > >> > > > > >> > > > > > > > it
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > will try to become follower and
>> have
>> > > lot
>> > > > of
>> > > > > >> > fetch
>> > > > > >> > > > > >> requests
>> > > > > >> > > > > >> > to
>> > > > > >> > > > > >> > > > > other
>> > > > > >> > > > > >> > > > > > > > > leaders
>> > > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
>> > > effect
>> > > > > the
>> > > > > >> > > > > >> > > application/client
>> > > > > >> > > > > >> > > > > > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > > We are also exploring the similar
>> > > > solution
>> > > > > to
>> > > > > >> > > > > >> de-prioritize
>> > > > > >> > > > > >> > > if
>> > > > > >> > > > > >> > > > a
>> > > > > >> > > > > >> > > > > > new
>> > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
>> requests,
>> > we
>> > > > are
>> > > > > >> ok
>> > > > > >> > > with
>> > > > > >> > > > > the
>> > > > > >> > > > > >> > > replica
>> > > > > >> > > > > >> > > > > to
>> > > > > >> > > > > >> > > > > > be
>> > > > > >> > > > > >> > > > > > > > > > taking time but the leaders should
>> > > > > prioritize
>> > > > > >> > the
>> > > > > >> > > > > client
>> > > > > >> > > > > >> > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > > > Harsha
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM
>> > > Lucas
>> > > > > Wang
>> > > > > >> > > wrote:
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
>> > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
>> feature
>> > > > yet,
>> > > > > >> so no
>> > > > > >> > > > > >> > experimental
>> > > > > >> > > > > >> > > > > > results
>> > > > > >> > > > > >> > > > > > > > so
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > far.
>> > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in the
>> > > > > following
>> > > > > >> > days.
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that
>> the
>> > > > > >> priority
>> > > > > >> > > queue
>> > > > > >> > > > > >> does
>> > > > > >> > > > > >> > not
>> > > > > >> > > > > >> > > > > > > > completely
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > prevent
>> > > > > >> > > > > >> > > > > > > > > > > data requests being processed
>> ahead
>> > > of
>> > > > > >> > > controller
>> > > > > >> > > > > >> > requests.
>> > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
>> > > greatly
>> > > > > >> > mitigate
>> > > > > >> > > > the
>> > > > > >> > > > > >> > effect
>> > > > > >> > > > > >> > > > of
>> > > > > >> > > > > >> > > > > > > stable
>> > > > > >> > > > > >> > > > > > > > > > > metadata.
>> > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and
>> > post
>> > > > the
>> > > > > >> > > results
>> > > > > >> > > > > >> when I
>> > > > > >> > > > > >> > > have
>> > > > > >> > > > > >> > > > > it.
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Regards,
>> > > > > >> > > > > >> > > > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM,
>> > Eno
>> > > > > >> Thereska
>> > > > > >> > <
>> > > > > >> > > > > >> > > > > > > > [email protected]
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had
>> a
>> > > look
>> > > > at
>> > > > > >> > this.
>> > > > > >> > > A
>> > > > > >> > > > > >> couple
>> > > > > >> > > > > >> > of
>> > > > > >> > > > > >> > > > > > > > questions:
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
>> > > change
>> > > > > >> after
>> > > > > >> > > > > >> implementing
>> > > > > >> > > > > >> > > > this
>> > > > > >> > > > > >> > > > > > KIP?
>> > > > > >> > > > > >> > > > > > > > > I'm
>> > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
>> > > > experimental
>> > > > > >> > results
>> > > > > >> > > > > that
>> > > > > >> > > > > >> > show
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > > > > benefit
>> > > > > >> > > > > >> > > > > > > > > of
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > two queues.
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
>> > > sufficient
>> > > > in
>> > > > > >> > > > addressing
>> > > > > >> > > > > >> the
>> > > > > >> > > > > >> > > > > problem
>> > > > > >> > > > > >> > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > KIP
>> > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
>> > > > queues,
>> > > > > >> you
>> > > > > >> > > will
>> > > > > >> > > > > >> > sometimes
>> > > > > >> > > > > >> > > > > > > (often?)
>> > > > > >> > > > > >> > > > > > > > > have
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > case that data plane requests
>> > will
>> > > be
>> > > > > >> ahead
>> > > > > >> > of
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > control
>> > > > > >> > > > > >> > > > > > plane
>> > > > > >> > > > > >> > > > > > > > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > > > > This happens because the
>> system
>> > > might
>> > > > > >> have
>> > > > > >> > > > already
>> > > > > >> > > > > >> > > started
>> > > > > >> > > > > >> > > > > > > > > processing
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > data plane requests before the
>> > > > control
>> > > > > >> plane
>> > > > > >> > > > ones
>> > > > > >> > > > > >> > > arrived.
>> > > > > >> > > > > >> > > > So
>> > > > > >> > > > > >> > > > > > it
>> > > > > >> > > > > >> > > > > > > > > would
>> > > > > >> > > > > >> > > > > > > > > > > be
>> > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
>> > problem
>> > > > this
>> > > > > >> KIP
>> > > > > >> > > > > >> addresses.
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Thanks
>> > > > > >> > > > > >> > > > > > > > > > > > Eno
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44
>> PM,
>> > > Ted
>> > > > > Yu <
>> > > > > >> > > > > >> > > > > [email protected]
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > Thanks
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42
>> > AM,
>> > > > > Lucas
>> > > > > >> > Wang
>> > > > > >> > > <
>> > > > > >> > > > > >> > > > > > > > [email protected]
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion.
>> > I've
>> > > > > >> updated
>> > > > > >> > > the
>> > > > > >> > > > > KIP.
>> > > > > >> > > > > >> > > Please
>> > > > > >> > > > > >> > > > > > take
>> > > > > >> > > > > >> > > > > > > > > > another
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > look.
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
>> 6:34
>> > > PM,
>> > > > > Ted
>> > > > > >> Yu
>> > > > > >> > <
>> > > > > >> > > > > >> > > > > > > [email protected]
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
>> > > KafkaConfig.scala
>> > > > :
>> > > > > >> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests =
>> 500
>> > > > > >> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you
>> can
>> > > > > include
>> > > > > >> > the
>> > > > > >> > > > > >> default
>> > > > > >> > > > > >> > > value
>> > > > > >> > > > > >> > > > > for
>> > > > > >> > > > > >> > > > > > > > this
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >>
>
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Reply via email to