Hi Lucas,

I guess my question can be rephrased as "do we expect users to ever change
the controller request queue capacity"? If we agree that 20 is already a
very generous default and we do not expect users to change it, is it
still necessary to expose this as a config?

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote:

> @Becket
> 1. Thanks for the comment. You are right that normally there should be just
> one controller request because of muting,
> and I had NOT intended to say there would be many enqueued controller
> requests.
> I went through the KIP again, and I'm not sure which part conveys that
> info.
> I'd be happy to revise if you point out the section.
>
> 2. Though it should not happen in normal conditions, the current design
> does not preclude multiple controllers running
> at the same time, hence if we don't have the controller queue capacity
> config and simply set its capacity to 1,
> network threads handling requests from different controllers will be
> blocked during those troublesome times,
> which is probably not what we want. On the other hand, adding the extra
> config with a default value, say 20, guards us from issues in those
> troublesome times, and IMO there isn't much downside of adding the extra
> config.
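>
> To make the blocking concern concrete, here is a hypothetical sketch
> (plain Scala, illustrative only; the queue type merely mirrors what a
> bounded request queue does, and the capacity of 20 is the suggested
> default, not final KIP wording):
>
>     import java.util.concurrent.ArrayBlockingQueue
>     // with capacity 1, a second controller's request would block the
>     // network thread inside put() until the first one is dequeued;
>     // a capacity of, say, 20 absorbs such multi-controller windows
>     val controlRequestQueue = new ArrayBlockingQueue[AnyRef](20)
>     controlRequestQueue.put(new AnyRef) // blocks only when the queue is full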
>
> @Mayuresh
> Good catch, this sentence is an obsolete statement based on a previous
> design. I've revised the wording in the KIP.
>
> Thanks,
> Lucas
>
> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Thanks for the KIP.
> > I am trying to understand why you think "The memory consumption can rise
> > given the total number of queued requests can go up to 2x" in the impact
> > section. Normally the requests from controller to a Broker are not high
> > volume, right?
> >
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> wrote:
> >
> > > Thanks for the KIP, Lucas. Separating the control plane from the data
> > > plane makes a lot of sense.
> > >
> > > In the KIP you mentioned that the controller request queue may have many
> > > requests in it. Will this be a common case? The controller requests
> > > still go through the SocketServer. The SocketServer will mute the
> > > channel once a request is read and put into the request channel. So
> > > assuming there is only one connection between the controller and each
> > > broker, on the broker side there should be only one controller request
> > > in the controller request queue at any given time. If that is the case,
> > > do we need a separate controller request queue capacity config? The
> > > default value 20 means that we expect 20 controller switches to happen
> > > in a short period of time. I am not sure whether someone should
> > > increase the controller request queue capacity to handle such a case,
> > > as it seems to indicate something very wrong has happened.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Thanks for the update Lucas.
> > > >
> > > > I think the motivation section is intuitive. It will be good to learn
> > > > more about the comments from other reviewers.
> > > >
> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > I've updated the motivation section of the KIP by explaining the
> > > > > cases that would have user impacts.
> > > > > Please take a look and let me know your comments.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > The simulation of disk being slow is merely for me to easily
> > > > > > construct a testing scenario with a backlog of produce requests. In
> > > > > > production, other than the disk being slow, a backlog of produce
> > > > > > requests may also be caused by high produce QPS.
> > > > > > In that case, we may not want to kill the broker, and that's when
> > > > > > this KIP can be useful, both for JBOD and non-JBOD setups.
> > > > > >
> > > > > > Going back to your previous question about each ProduceRequest
> > > > > > covering 20 partitions that are randomly distributed, let's say a
> > > > > > LeaderAndIsr request is enqueued that tries to switch the current
> > > > > > broker, say broker0, from leader to follower *for one of the
> > > > > > partitions*, say *test-0*. For the sake of argument, let's also
> > > > > > assume the other brokers, say broker1, have *stopped* fetching from
> > > > > > the current broker, i.e. broker0.
> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > >   1.1 without this KIP, the ProduceRequests ahead of the
> > > > > >         LeaderAndISR will be put into the purgatory, and since
> > > > > >         they'll never be replicated to other brokers (because of
> > > > > >         the assumption made above), they will be completed either
> > > > > >         when the LeaderAndISR request is processed or when the
> > > > > >         timeout happens.
> > > > > >   1.2 With this KIP, broker0 will immediately transition the
> > > > > >         partition test-0 to become a follower; after the current
> > > > > >         broker sees the replication of the remaining 19 partitions,
> > > > > >         it can send a response indicating that it's no longer the
> > > > > >         leader for "test-0".
> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
> > > > > > there are 24K produce requests ahead of the LeaderAndISR, and there
> > > > > > are 8 io threads, so each io thread will process approximately 3000
> > > > > > produce requests. Now let's investigate the io thread that finally
> > > > > > processed the LeaderAndISR.
> > > > > >   For the 3000 produce requests, model the times when their
> > > > > > remaining 19 partitions catch up as t0, t1, ... t2999, and say the
> > > > > > LeaderAndISR request is processed at time t3000.
> > > > > >   Without this KIP, the 1st produce request would have waited an
> > > > > > extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc.
> > > > > >   Roughly speaking, the latency difference is bigger for the
> > > > > > earlier produce requests than for the later ones. For the same
> > > > > > reason, the more ProduceRequests queued before the LeaderAndISR,
> > > > > > the bigger the benefit we get (capped by the produce timeout). A
> > > > > > small numeric sketch of this model follows item 2.2 below.
> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > >   There will be no latency differences in this case, but
> > > > > >   2.1 without this KIP, the records of partition test-0 in the
> > > > > >         ProduceRequests ahead of the LeaderAndISR will be appended
> > > > > >         to the local log, and eventually be truncated after
> > > > > >         processing the LeaderAndISR. This is what's referred to as
> > > > > >         "some unofficial definition of data loss in terms of
> > > > > >         messages beyond the high watermark".
> > > > > >   2.2 with this KIP, we can mitigate the effect since if the
> > > > > >         LeaderAndISR is immediately processed, the response to
> > > > > >         producers will have the NotLeaderForPartition error,
> > > > > >         causing producers to retry.
> > > > > >
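> > > > > > As a rough numeric illustration of the 1.1 vs 1.2 comparison
> > > > > > above, here is a hypothetical back-of-the-envelope sketch (plain
> > > > > > Scala, not Kafka code; the 2 ms spacing of t0...t2999 is a made-up
> > > > > > assumption for the example):
> > > > > >
> > > > > >     // without the KIP, the i-th produce request completes at
> > > > > >     // t3000 (when the LeaderAndISR is processed) instead of at
> > > > > >     // ti (when its remaining 19 partitions catch up)
> > > > > >     val n = 3000
> > > > > >     val t = Array.tabulate(n + 1)(i => i * 2.0) // assumed 2 ms apart
> > > > > >     val avgExtraMs = (0 until n).map(i => t(n) - t(i)).sum / n
> > > > > >     println(avgExtraMs) // ~3000 ms, i.e. about t3000 / 2
> > > > > >
> > > > > > The earlier a request sits in the queue, the larger its t3000 - ti,
> > > > > > matching the point that the earlier produce requests see the bigger
> > > > > > latency difference.
> > > > > >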
> > > > > > The explanation above covers the benefit of reducing the latency
> > > > > > of a broker becoming the follower; closely related is reducing the
> > > > > > latency of a broker becoming the leader. In this case, the benefit
> > > > > > is even more obvious: if other brokers have resigned leadership and
> > > > > > the current broker should take leadership, any delay in processing
> > > > > > the LeaderAndISR will be perceived by clients as unavailability. In
> > > > > > extreme cases, this can cause failed produce requests if the
> > > > > > retries are exhausted.
> > > > > >
> > > > > > Two other types of controller requests are UpdateMetadata and
> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > > For UpdateMetadata requests, delayed processing means clients
> > > > > > receiving stale metadata, e.g. with the wrong leadership info for
> > > > > > certain partitions, and the effect is more retries or even fatal
> > > > > > failure if the retries are exhausted.
> > > > > >
> > > > > > For StopReplica requests, a long queuing time may degrade the
> > > > > > performance of topic deletion.
> > > > > >
> > > > > > Regarding your last question about the delay for
> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help
> > > > > > with the latency in getting the log dirs info; it's only relevant
> > > > > > when controller requests are involved.
> > > > > >
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> Hey Jun,
> > > > > >>
> > > > > >> Thanks much for the comments. It is a good point. So the feature
> > > > > >> may be useful for the JBOD use-case. I have one question below.
> > > > > >>
> > > > > >> Hey Lucas,
> > > > > >>
> > > > > >> Do you think this feature is also useful for the non-JBOD setup,
> > > > > >> or is it only useful for the JBOD setup? It may be useful to
> > > > > >> understand this.
> > > > > >>
> > > > > >> When the broker is set up using JBOD, in order to move leaders on
> > > > > >> the failed disk to other disks, the system operator first needs to
> > > > > >> get the list of partitions on the failed disk. This is currently
> > > > > >> achieved using AdminClient.describeLogDirs(), which sends a
> > > > > >> DescribeLogDirsRequest to the broker. If we only prioritize the
> > > > > >> controller requests, then the DescribeLogDirsRequest may still
> > > > > >> take a long time to be processed by the broker. So the overall
> > > > > >> time to move leaders away from the failed disk may still be long
> > > > > >> even with this KIP. What do you think?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Dong
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> lucasatu...@gmail.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> >
> > > > > >> > @Dong,
> > > > > >> > Since both of the comments in your previous email are about the
> > > > > >> > benefits of this KIP and whether it's useful, in light of Jun's
> > > > > >> > last comment, do you agree that this KIP can be beneficial in
> > > > > >> > the case mentioned by Jun?
> > > > > >> > Please let me know, thanks!
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Lucas
> > > > > >> >
> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io>
> > wrote:
> > > > > >> >
> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > >
> > > > > >> > > If all disks on a broker are slow, one probably should just
> > > > > >> > > kill the broker. In that case, this KIP may not help. If only
> > > > > >> > > one of the disks on a broker is slow, one may want to fail
> > > > > >> > > that disk and move the leaders on that disk to other brokers.
> > > > > >> > > In that case, being able to process the LeaderAndIsr requests
> > > > > >> > > faster will potentially help the producers recover quicker.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jun
> > > > > >> > >
> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> lindon...@gmail.com
> > >
> > > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hey Lucas,
> > > > > >> > > >
> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> > > > > >> > > >
> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
> > > > > >> > > > that are randomly distributed across all partitions, then
> > > > > >> > > > each ProduceRequest will likely cover some partitions for
> > > > > >> > > > which the broker is still leader after it quickly processes
> > > > > >> > > > the LeaderAndIsrRequest. Then the broker will still be slow
> > > > > >> > > > in processing these ProduceRequests, and request latency
> > > > > >> > > > will still be very high with this KIP. It seems that most
> > > > > >> > > > ProduceRequests will still timeout after 30 seconds. Is this
> > > > > >> > > > understanding correct?
> > > > > >> > > >
> > > > > >> > > > Regarding 2, if most ProduceRequests will still timeout
> > > > > >> > > > after 30 seconds, then it is less clear how this KIP reduces
> > > > > >> > > > average produce latency. Can you clarify what metrics can be
> > > > > >> > > > improved by this KIP?
> > > > > >> > > >
> > > > > >> > > > Not sure why a system operator directly cares about the
> > > > > >> > > > number of truncated messages. Do you mean this KIP can
> > > > > >> > > > improve average throughput or reduce message duplication? It
> > > > > >> > > > will be good to understand this.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dong
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > lucasatu...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Dong,
> > > > > >> > > > >
> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
> > > below.
> > > > > >> > > > >
> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> > > > > >> > > > > consider a more common scenario where broker0 is the
> > > > > >> > > > > leader of many partitions. And let's say for some reason
> > > > > >> > > > > its IO becomes slow.
> > > > > >> > > > > The number of leader partitions on broker0 is so large,
> > > > > >> > > > > say 10K, that the cluster is skewed, and the operator
> > > > > >> > > > > would like to shift the leadership for a lot of
> > > > > >> > > > > partitions, say 9K, to other brokers, either manually or
> > > > > >> > > > > through some service like Cruise Control.
> > > > > >> > > > > With this KIP, not only will the leadership transitions
> > > > > >> > > > > finish more quickly, helping the cluster itself become
> > > > > >> > > > > more balanced, but all existing producers corresponding
> > > > > >> > > > > to the 9K partitions will get the errors relatively
> > > > > >> > > > > quickly, rather than relying on their timeout, thanks to
> > > > > >> > > > > the batched async ZK operations.
> > > > > >> > > > > To me it's a useful feature to have during such
> > > > > >> > > > > troublesome times.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with
> > > > > >> > > > > this KIP many producers receive an explicit
> > > > > >> > > > > NotLeaderForPartition error, based on which they retry
> > > > > >> > > > > immediately.
> > > > > >> > > > > Therefore the latency (~14 seconds + quick retry) for
> > > > > >> > > > > their single message is much smaller compared with the
> > > > > >> > > > > case of timing out without the KIP (30 seconds for timing
> > > > > >> > > > > out + quick retry).
> > > > > >> > > > > One might argue that reducing the timeout on the producer
> > > > > >> > > > > side can achieve the same result, yet reducing the timeout
> > > > > >> > > > > has its own drawbacks[1].
> > > > > >> > > > >
> > > > > >> > > > > Also *IF* there were a metric to show the number of
> > > > > >> > > > > truncated messages on brokers, with the experiments done
> > > > > >> > > > > in the Google Doc, it should be easy to see that a lot
> > > > > >> > > > > fewer messages need to be truncated on broker0, since the
> > > > > >> > > > > up-to-date metadata avoids appending of messages in
> > > > > >> > > > > subsequent PRODUCE requests. If we talk to a system
> > > > > >> > > > > operator and ask whether they prefer fewer wasteful IOs,
> > > > > >> > > > > I bet most likely the answer is yes.
> > > > > >> > > > >
> > > > > >> > > > > 3. To answer your question, I think it might be helpful
> > > > > >> > > > > to construct some formulas. To simplify the modeling, I'm
> > > > > >> > > > > going back to the case where there is only ONE partition
> > > > > >> > > > > involved.
> > > > > >> > > > > Following the experiments in the Google Doc, let's say
> > > > > >> > > > > broker0 becomes the follower at time t0, and after t0
> > > > > >> > > > > there are still N produce requests in its request queue.
> > > > > >> > > > > With the up-to-date metadata brought by this KIP, broker0
> > > > > >> > > > > can reply with a NotLeaderForPartition error; let's use M1
> > > > > >> > > > > to denote the average processing time of replying with
> > > > > >> > > > > such an error message.
> > > > > >> > > > > Without this KIP, the broker will need to append messages
> > > > > >> > > > > to segments, which may trigger a flush to disk; let's use
> > > > > >> > > > > M2 to denote the average processing time for such logic.
> > > > > >> > > > > Then the average extra latency incurred without this KIP
> > > > > >> > > > > is N * (M2 - M1) / 2 (a short derivation sketch follows
> > > > > >> > > > > below).
> > > > > >> > > > >
> > > > > >> > > > > In practice, M2 should always be larger than M1, which
> > > > > >> > > > > means as long as N is positive, we would see improvements
> > > > > >> > > > > on the average latency.
> > > > > >> > > > > There does not need to be a significant backlog of
> > > > > >> > > > > requests in the request queue, or severe degradation of
> > > > > >> > > > > disk performance, to see the improvement.
> > > > > >> > > > >
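> > > > > >> > > > > To spell out the derivation of N * (M2 - M1) / 2: the
> > > > > >> > > > > i-th queued request waits behind i earlier requests, each
> > > > > >> > > > > costing an extra (M2 - M1), so averaging i * (M2 - M1)
> > > > > >> > > > > over i = 0..N-1 gives about N * (M2 - M1) / 2. A
> > > > > >> > > > > hypothetical sketch (plain Scala, with made-up M1/M2
> > > > > >> > > > > values purely for illustration):
> > > > > >> > > > >
> > > > > >> > > > >     // average of i * (m2 - m1) over the N queued requests
> > > > > >> > > > >     def avgExtraLatencyMs(n: Int, m1: Double, m2: Double) =
> > > > > >> > > > >       (0 until n).map(i => i * (m2 - m1)).sum / n
> > > > > >> > > > >     println(avgExtraLatencyMs(1000, 0.1, 2.0)) // ~949 ms
> > > > > >> > > > >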
> > > > > >> > > > > Regards,
> > > > > >> > > > > Lucas
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > [1] For instance, reducing the timeout on the producer
> > side
> > > > can
> > > > > >> > trigger
> > > > > >> > > > > unnecessary duplicate requests
> > > > > >> > > > > when the corresponding leader broker is overloaded,
> > > > exacerbating
> > > > > >> the
> > > > > >> > > > > situation.
> > > > > >> > > > >
> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > lindon...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks much for the detailed documentation of the
> > > > experiment.
> > > > > >> > > > > >
> > > > > >> > > > > > Initially I also thought having a separate queue for
> > > > > >> > > > > > controller requests is useful because, as you mentioned
> > > > > >> > > > > > in the summary section of the Google doc, controller
> > > > > >> > > > > > requests are generally more important than data requests
> > > > > >> > > > > > and we probably want controller requests to be processed
> > > > > >> > > > > > sooner. But then Eno has two very good questions which I
> > > > > >> > > > > > am not sure the Google doc has answered explicitly.
> > > > > >> > > > > > Could you help with the following questions?
> > > > > >> > > > > >
> > > > > >> > > > > > 1) It is not very clear what the actual benefit of
> > > > > >> > > > > > KIP-291 is to users. The experiment setup in the Google
> > > > > >> > > > > > doc simulates the scenario that the broker is very slow
> > > > > >> > > > > > handling ProduceRequests due to e.g. a slow disk. It
> > > > > >> > > > > > currently assumes that there is only 1 partition. But in
> > > > > >> > > > > > the common scenario, it is probably reasonable to assume
> > > > > >> > > > > > that there are many other partitions that are also
> > > > > >> > > > > > actively produced to, and ProduceRequests to these
> > > > > >> > > > > > partitions also take e.g. 2 seconds to be processed. So
> > > > > >> > > > > > even if broker0 can become follower for partition 0
> > > > > >> > > > > > soon, it probably still needs to slowly process the
> > > > > >> > > > > > ProduceRequests in the queue because these
> > > > > >> > > > > > ProduceRequests cover other partitions. Thus most
> > > > > >> > > > > > ProduceRequests will still timeout after 30 seconds and
> > > > > >> > > > > > most clients will still likely timeout after 30 seconds.
> > > > > >> > > > > > Then it is not obvious what the benefit to the client
> > > > > >> > > > > > is, since the client will timeout after 30 seconds
> > > > > >> > > > > > before possibly re-connecting to broker1, with or
> > > > > >> > > > > > without KIP-291. Did I miss something here?
> > > > > >> > > > > >
> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of
> > > > > >> > > > > > this KIP to the user or system administrator, e.g.
> > > > > >> > > > > > whether this KIP decreases average latency, 999th
> > > > > >> > > > > > percentile latency, probability of exceptions exposed to
> > > > > >> > > > > > the client, etc. It is probably useful to clarify this.
> > > > > >> > > > > >
> > > > > >> > > > > > 3) Does this KIP help improve user experience only when
> > > > > >> > > > > > there is an issue with the broker, e.g. a significant
> > > > > >> > > > > > backlog in the request queue due to a slow disk as
> > > > > >> > > > > > described in the Google doc? Or is this KIP also useful
> > > > > >> > > > > > when there is no ongoing issue in the cluster? It might
> > > > > >> > > > > > be helpful to clarify this to understand the benefit of
> > > > > >> > > > > > this KIP.
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks much,
> > > > > >> > > > > > Dong
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > > > >> lucasatu...@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Eno,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > > > >> > > > > > > Here is a link to the positive impact achieved by
> > > > > >> > > > > > > implementing the proposed change:
> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > >> > > > > > > Please take a look when you have time and let me know
> > > > > >> > > > > > > your feedback.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Regards,
> > > > > >> > > > > > > Lucas
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > ka...@harsha.io>
> > > > > >> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might
> > > > > >> > > > > > > > suit our requirements better.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > > Harsha
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > > > >> > > > lucasatu...@gmail.com
> > > > > >> > > > > >
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Hi Harsha,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If I understand correctly, the replication quota
> > > > > >> > > > > > > > > mechanism proposed in KIP-73 can be helpful in
> > > > > >> > > > > > > > > that scenario. Have you tried it out?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > > Lucas
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > > > ka...@harsha.io
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > > > One more question: any thoughts on making this
> > > > > >> > > > > > > > > > configurable and also allowing a subset of data
> > > > > >> > > > > > > > > > requests to be prioritized? For example, we
> > > > > >> > > > > > > > > > notice in our cluster that when we take out a
> > > > > >> > > > > > > > > > broker and bring a new one in, it will try to
> > > > > >> > > > > > > > > > become a follower and send a lot of fetch
> > > > > >> > > > > > > > > > requests to other leaders in the cluster. This
> > > > > >> > > > > > > > > > will negatively affect the application/client
> > > > > >> > > > > > > > > > requests. We are also exploring a similar
> > > > > >> > > > > > > > > > solution to de-prioritize fetch requests if a
> > > > > >> > > > > > > > > > new replica comes in; we are ok with the replica
> > > > > >> > > > > > > > > > taking time, but the leaders should prioritize
> > > > > >> > > > > > > > > > the client requests.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > Harsha
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
> > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > > > > >> > > > > > > > > > > experimental results so far. I plan to test it
> > > > > >> > > > > > > > > > > out in the following days.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > - You are absolutely right that the priority
> > > > > >> > > > > > > > > > > queue does not completely prevent data
> > > > > >> > > > > > > > > > > requests being processed ahead of controller
> > > > > >> > > > > > > > > > > requests.
> > > > > >> > > > > > > > > > > That being said, I expect it to greatly
> > > > > >> > > > > > > > > > > mitigate the effect of stale metadata.
> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the
> > > > > >> > > > > > > > > > > results when I have them.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Regards,
> > > > > >> > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
> Thereska
> > <
> > > > > >> > > > > > > > eno.there...@gmail.com
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at
> > > > > >> > > > > > > > > > > > this. A couple of questions:
> > > > > >> > > > > > > > > > > > - did you notice any positive change after
> > > > > >> > > > > > > > > > > > implementing this KIP? I'm wondering if you
> > > > > >> > > > > > > > > > > > have any experimental results that show the
> > > > > >> > > > > > > > > > > > benefit of the two queues.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > > > > >> > > > > > > > > > > > addressing the problem the KIP identifies.
> > > > > >> > > > > > > > > > > > Even with priority queues, you will
> > > > > >> > > > > > > > > > > > sometimes (often?) have the case that data
> > > > > >> > > > > > > > > > > > plane requests will be ahead of the control
> > > > > >> > > > > > > > > > > > plane requests. This happens because the
> > > > > >> > > > > > > > > > > > system might have already started processing
> > > > > >> > > > > > > > > > > > the data plane requests before the control
> > > > > >> > > > > > > > > > > > plane ones arrived. So it would be good to
> > > > > >> > > > > > > > > > > > know what % of the problem this KIP
> > > > > >> > > > > > > > > > > > addresses.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > Eno
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > > > >> > > > > yuzhih...@gmail.com
> > > > > >> > > > > > >
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas
> > Wang
> > > <
> > > > > >> > > > > > > > lucasatu...@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated
> > > > > >> > > > > > > > > > > > > > the KIP. Please take another look.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted
> Yu
> > <
> > > > > >> > > > > > > yuzhih...@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala:
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > It would be good if you can include
> > > > > >> > > > > > > > > > > > > > > the default value for this new config
> > > > > >> > > > > > > > > > > > > > > in the KIP.
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM,
> Lucas
> > > > Wang
> > > > > <
> > > > > >> > > > > > > > > > lucasatu...@gmail.com
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> > > > > >> > > > > > > > > > > > > > > > config, instead of reusing the
> > > > > >> > > > > > > > > > > > > > > > existing one.
> > > > > >> > > > > > > > > > > > > > > > Please take another look when you
> > > > > >> > > > > > > > > > > > > > > > have time. Thanks a lot!
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM,
> Ted
> > > Yu
> > > > <
> > > > > >> > > > > > > > yuzhih...@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> > > > > >> > > > > > > > > > > > > > > > > control request rate is low
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > I don't know if the control
> > > > > >> > > > > > > > > > > > > > > > > request rate can get to 100,000;
> > > > > >> > > > > > > > > > > > > > > > > likely not. Then using the same
> > > > > >> > > > > > > > > > > > > > > > > bound as that for data requests
> > > > > >> > > > > > > > > > > > > > > > > seems high.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13
> PM,
> > > > Lucas
> > > > > >> Wang
> > > > > >> > <
> > > > > >> > > > > > > > > > > > > lucasatu...@gmail.com >
> > > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this
> > > > > >> > > > > > > > > > > > > > > > > > KIP.
> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > > > >> > > > > > > > > > > > > > > > > > "queued.max.requests" in cluster
> > > > > >> > > > > > > > > > > > > > > > > > A is 1000, while the setting in
> > > > > >> > > > > > > > > > > > > > > > > > cluster B is 100,000. The 100x
> > > > > >> > > > > > > > > > > > > > > > > > difference might have indicated
> > > > > >> > > > > > > > > > > > > > > > > > that machines in cluster B have
> > > > > >> > > > > > > > > > > > > > > > > > larger memory.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > By reusing "queued.max.requests",
> > > > > >> > > > > > > > > > > > > > > > > > the controlRequestQueue in
> > > > > >> > > > > > > > > > > > > > > > > > cluster B automatically gets a
> > > > > >> > > > > > > > > > > > > > > > > > 100x capacity without explicitly
> > > > > >> > > > > > > > > > > > > > > > > > bothering the operators.
> > > > > >> > > > > > > > > > > > > > > > > > I understand the counter argument
> > > > > >> > > > > > > > > > > > > > > > > > can be that maybe that's a waste
> > > > > >> > > > > > > > > > > > > > > > > > of resource if the control
> > > > > >> > > > > > > > > > > > > > > > > > request rate is low, and
> > > > > >> > > > > > > > > > > > > > > > > > operators may want to fine tune
> > > > > >> > > > > > > > > > > > > > > > > > the capacity of the
> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
> > > > > >> > > > > > > > > > > > > > > > > > can change it if you or anyone
> > > > > >> > > > > > > > > > > > > > > > > > else feels strongly about adding
> > > > > >> > > > > > > > > > > > > > > > > > the extra config.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11
> PM,
> > > Ted
> > > > > Yu
> > > > > >> <
> > > > > >> > > > > > > > > > yuzhih...@gmail.com
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives,
> > > > > >> > > > > > > > > > > > > > > > > > > #2, can you elaborate a bit
> > > > > >> > > > > > > > > > > > > > > > > > > more on why the separate config
> > > > > >> > > > > > > > > > > > > > > > > > > has a bigger impact?
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00
> > PM,
> > > > > Dong
> > > > > >> > Lin <
> > > > > >> > > > > > > > > > > > lindon...@gmail.com
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Lucas,
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks
> > > > > >> > > > > > > > > > > > > > > > > > > > good overall. Some comments
> > > > > >> > > > > > > > > > > > > > > > > > > > below:
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the full
> > > > > >> > > > > > > > > > > > > > > > > > > > mbean for the new metrics in
> > > > > >> > > > > > > > > > > > > > > > > > > > the KIP. Can you specify it
> > > > > >> > > > > > > > > > > > > > > > > > > > in the Public Interface
> > > > > >> > > > > > > > > > > > > > > > > > > > section similar to KIP-237
> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the
> > > > > >> > > > > > > > > > > > > > > > > > > > same pattern as KIP-153
> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing
> > > > > >> > > > > > > > > > > > > > > > > > > > sensor name "BytesInPerSec"
> > > > > >> > > > > > > > > > > > > > > > > > > > and add a new sensor
> > > > > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec",
> > > > > >> > > > > > > > > > > > > > > > > > > > rather than replacing the
> > > > > >> > > > > > > > > > > > > > > > > > > > sensor name "BytesInPerSec"
> > > > > >> > > > > > > > > > > > > > > > > > > > with e.g.
> > > > > >> > > > > > > > > > > > > > > > > > > > "ClientBytesInPerSec".
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
> > > > > >> > > > > > > > > > > > > > > > > > > > changes the semantics of the
> > > > > >> > > > > > > > > > > > > > > > > > > > broker config
> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests",
> > > > > >> > > > > > > > > > > > > > > > > > > > because the total number of
> > > > > >> > > > > > > > > > > > > > > > > > > > requests queued in the broker
> > > > > >> > > > > > > > > > > > > > > > > > > > will no longer be bounded by
> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests". This
> > > > > >> > > > > > > > > > > > > > > > > > > > probably needs to be
> > > > > >> > > > > > > > > > > > > > > > > > > > specified in the Public
> > > > > >> > > > > > > > > > > > > > > > > > > > Interfaces section for
> > > > > >> > > > > > > > > > > > > > > > > > > > discussion.
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
> > 12:45
> > > > PM,
> > > > > >> Lucas
> > > > > >> > > > Wang
> > > > > >> > > > > <
> > > > > >> > > > > > > > > > > > > > > > lucasatu...@gmail.com >
> > > > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
> > > > > >> > > > > > > > > > > > > > > > > > > > > separate queue for
> > > > > >> > > > > > > > > > > > > > > > > > > > > controller requests:
> > > > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look
> > > > > >> > > > > > > > > > > > > > > > > > > > > and let me know your
> > > > > >> > > > > > > > > > > > > > > > > > > > > feedback?
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > >> > > > > > > > > > > > > > > > > > > > > Regards,
> > > > > >> > > > > > > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
