Hey Becket - good point. Lucas and I were talking about this offline last
week. It is true that there is only one controller request in flight for
processing at any given time. There may be more during a controller failover,
but the number should stay small - at most the number of controller failures
that can occur while any single controller request is being handled.

This is in fact the more significant issue, and it may not have been fully
captured in the KIP motivation: right now, the server processes one controller
request at a time. Assuming there is moderate to heavy load on the broker,
while it is handling a controller request it will accumulate a deluge of
regular client requests in the request queue. Only after the sole controller
request is handled will the next controller request be read into the request
queue. So we end up with a single controller request, then a potentially large
number of regular requests, then a single controller request, then a large
number of regular requests, and so on. This becomes especially problematic
when you have many small controller requests (say, if you are steadily moving
a few partitions at a time over a short span of time). With the prioritized
queue this changes to: handle a controller request, handle a vector of regular
requests (where the vector size is the number of request handler threads),
handle the next controller request, and so on. The maximum time between
handling adjacent controller requests is bounded by the *minimum local
processing time across that vector of regular requests*, so it helps
significantly.

We also considered NOT muting/unmuting the controller socket to help address
this. That would also require pinning the handling of all controller requests
to one specific request handler thread in order to preserve ordering. That
change is probably not worth the effort, and we expect the current proposal to
be adequate.
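
To make the interleaved pattern above concrete, here is a minimal sketch of a
prioritized two-queue dequeue (hypothetical class and method names, not the
actual KIP-291 implementation; the capacities are simply the defaults that
have come up elsewhere in this thread):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: controller requests are always drained first, so
    // between two consecutive controller requests each request handler thread
    // picks up at most one regular (data) request.
    public class PrioritizedRequestChannel {
        private final BlockingQueue<Object> controllerQueue = new LinkedBlockingQueue<>(20);
        private final BlockingQueue<Object> dataQueue = new LinkedBlockingQueue<>(500);

        // Called by every request handler thread to fetch its next piece of work.
        public Object takeNext() throws InterruptedException {
            while (true) {
                Object controllerRequest = controllerQueue.poll();
                if (controllerRequest != null) {
                    return controllerRequest;   // controller requests win ties
                }
                Object dataRequest = dataQueue.poll(300, TimeUnit.MILLISECONDS);
                if (dataRequest != null) {
                    return dataRequest;
                }
                // Nothing arrived within the timeout; loop and re-check the
                // controller queue first.
            }
        }
    }

With, say, eight handler threads all busy on regular requests when a controller
request arrives, that controller request only waits until the first of those
eight in-flight requests completes, which is exactly the min(local time) bound
mentioned above.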

Thanks,

Joel

On Tue, Jul 17, 2018 at 5:06 AM, Becket Qin <becket....@gmail.com> wrote:

> Thanks for the KIP, Lucas. Separating the control plane from the data plane
> makes a lot of sense.
>
> In the KIP you mentioned that the controller request queue may have many
> requests in it. Will this be a common case? The controller requests still
> go through the SocketServer. The SocketServer will mute the channel once
> a request is read and put into the request channel. So assuming there is
> only one connection between controller and each broker, on the broker side,
> there should be only one controller request in the controller request queue
> at any given time. If that is the case, do we need a separate controller
> request queue capacity config? The default value of 20 means that we expect
> 20 controller switches to happen in a short period of time. I am not sure
> whether someone should increase the controller request queue capacity to
> handle such a case, as it seems to indicate that something very wrong has
> happened.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Thanks for the update Lucas.
> >
> > I think the motivation section is intuitive. It will be good to learn
> more
> > about the comments from other reviewers.
> >
> > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com>
> wrote:
> >
> > > Hi Dong,
> > >
> > > I've updated the motivation section of the KIP by explaining the cases
> > that
> > > would have user impacts.
> > > Please take a look and let me know your comments.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com>
> > wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > The simulation of disk being slow is merely for me to easily
> construct
> > a
> > > > testing scenario
> > > > with a backlog of produce requests. In production, other than the
> disk
> > > > being slow, a backlog of
> > > > produce requests may also be caused by high produce QPS.
> > > > In that case, we may not want to kill the broker and that's when this
> > KIP
> > > > can be useful, both for JBOD
> > > > and non-JBOD setup.
> > > >
> > > > Going back to your previous question about each ProduceRequest
> covering
> > > 20
> > > > partitions that are randomly
> > > > distributed, let's say a LeaderAndIsr request is enqueued that tries
> to
> > > > switch the current broker, say broker0, from leader to follower
> > > > *for one of the partitions*, say *test-0*. For the sake of argument,
> > > > let's also assume the other brokers, say broker1, have *stopped*
> > fetching
> > > > from
> > > > the current broker, i.e. broker0.
> > > > 1. If the enqueued produce requests have acks =  -1 (ALL)
> > > >   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR
> will
> > be
> > > > put into the purgatory,
> > > >         and since they'll never be replicated to other brokers
> (because
> > > of
> > > > the assumption made above), they will
> > > >         be completed either when the LeaderAndISR request is
> processed
> > or
> > > > when the timeout happens.
> > > >   1.2 With this KIP, broker0 will immediately transition the
> partition
> > > > test-0 to become a follower,
> > > >         after the current broker sees the replication of the
> remaining
> > 19
> > > > partitions, it can send a response indicating that
> > > >         it's no longer the leader for the "test-0".
> > > >   To see the latency difference between 1.1 and 1.2, let's say there
> > are
> > > > 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> > > threads,
> > > >   so each io thread will process approximately 3000 produce requests.
> > Now
> > > > let's investigate the io thread that finally processed the
> > LeaderAndISR.
> > > >   For the 3000 produce requests, if we model the time when their
> > > remaining
> > > > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR
> > request
> > > is
> > > > processed at time t3000.
> > > >   Without this KIP, the 1st produce request would have waited an
> extra
> > > > t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 -
> t1,
> > > etc.
> > > >   Roughly speaking, the latency difference is bigger for the earlier
> > > > produce requests than for the later ones. For the same reason, the
> more
> > > > ProduceRequests queued
> > > >   before the LeaderAndISR, the bigger benefit we get (capped by the
> > > > produce timeout).
> > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > >   There will be no latency differences in this case, but
> > > >   2.1 without this KIP, the records of partition test-0 in the
> > > > ProduceRequests ahead of the LeaderAndISR will be appended to the
> local
> > > log,
> > > >         and eventually be truncated after processing the
> LeaderAndISR.
> > > > This is what's referred to as
> > > >         "some unofficial definition of data loss in terms of messages
> > > > beyond the high watermark".
> > > >   2.2 with this KIP, we can mitigate the effect since if the
> > LeaderAndISR
> > > > is immediately processed, the response to producers will have
> > > >         the NotLeaderForPartition error, causing producers to retry
> > > >
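
As a rough numeric illustration of the gap between 1.1 and 1.2 above, here is a
tiny sketch with made-up numbers (the catch-up times t_i are assumed to be
evenly spaced 2 ms apart; the 3000 requests per io thread follow the example
above, but the spacing is not from the Google doc):

    // Hypothetical model of the io thread that finally handles the LeaderAndISR:
    // request i's remaining 19 partitions catch up at t_i = i * spacing, while the
    // LeaderAndISR is only processed at t_n, so without the KIP request i waits an
    // extra (t_n - t_i) in the purgatory (capped in practice by the produce timeout).
    public class PurgatoryWaitSketch {
        public static void main(String[] args) {
            int n = 3000;            // produce requests handled by this io thread
            double spacingMs = 2.0;  // assumed gap between consecutive t_i
            double totalExtraMs = 0;
            for (int i = 0; i < n; i++) {
                totalExtraMs += (n - i) * spacingMs;   // t_n - t_i
            }
            System.out.printf("average extra purgatory wait ~ %.0f ms%n", totalExtraMs / n);
        }
    }

With these assumed numbers the average extra wait comes out to roughly half of
t3000 (about 3 seconds here), and it grows with the number of ProduceRequests
queued ahead of the LeaderAndISR, until the produce timeout caps it.
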
> > > > This explanation above is the benefit for reducing the latency of a
> > > broker
> > > > becoming the follower,
> > > > closely related is reducing the latency of a broker becoming the
> > leader.
> > > > In this case, the benefit is even more obvious, if other brokers have
> > > > resigned leadership, and the
> > > > current broker should take leadership. Any delay in processing the
> > > > LeaderAndISR will be perceived
> > > > by clients as unavailability. In extreme cases, this can cause failed
> > > > produce requests if the retries are
> > > > exhausted.
> > > >
> > > > Another two types of controller requests are UpdateMetadata and
> > > > StopReplica, which I'll briefly discuss as follows:
> > > > For UpdateMetadata requests, delayed processing means clients
> receiving
> > > > stale metadata, e.g. with the wrong leadership info
> > > > for certain partitions, and the effect is more retries or even fatal
> > > > failure if the retries are exhausted.
> > > >
> > > > For StopReplica requests, a long queuing time may degrade the
> > performance
> > > > of topic deletion.
> > > >
> > > > Regarding your last question of the delay for DescribeLogDirsRequest,
> > you
> > > > are right
> > > > that this KIP cannot help with the latency in getting the log dirs
> > info,
> > > > and it's only relevant
> > > > when controller requests are involved.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > >
> > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com>
> wrote:
> > > >
> > > >> Hey Jun,
> > > >>
> > > >> Thanks much for the comments. It is a good point. So the feature may
> be
> > > >> useful for JBOD use-case. I have one question below.
> > > >>
> > > >> Hey Lucas,
> > > >>
> > > >> Do you think this feature is also useful for a non-JBOD setup, or is it
> > > only
> > > >> useful for the JBOD setup? It may be useful to understand this.
> > > >>
> > > >> When the broker is setup using JBOD, in order to move leaders on the
> > > >> failed
> > > >> disk to other disks, the system operator first needs to get the list
> > of
> > > >> partitions on the failed disk. This is currently achieved using
> > > >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to
> > the
> > > >> broker. If we only prioritize the controller requests, then the
> > > >> DescribeLogDirsRequest
> > > >> may still take a long time to be processed by the broker. So the
> > overall
> > > >> time to move leaders away from the failed disk may still be long
> even
> > > with
> > > >> this KIP. What do you think?
> > > >>
> > > >> Thanks,
> > > >> Dong
> > > >>
> > > >>
> > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Thanks for the insightful comment, Jun.
> > > >> >
> > > >> > @Dong,
> > > >> > Since both of the two comments in your previous email are about
> the
> > > >> > benefits of this KIP and whether it's useful,
> > > >> > in light of Jun's last comment, do you agree that this KIP can be
> > > >> > beneficial in the case mentioned by Jun?
> > > >> > Please let me know, thanks!
> > > >> >
> > > >> > Regards,
> > > >> > Lucas
> > > >> >
> > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote:
> > > >> >
> > > >> > > Hi, Lucas, Dong,
> > > >> > >
> > > >> > > If all disks on a broker are slow, one probably should just kill
> > the
> > > >> > > broker. In that case, this KIP may not help. If only one of the
> > > disks
> > > >> on
> > > >> > a
> > > >> > > broker is slow, one may want to fail that disk and move the
> > leaders
> > > on
> > > >> > that
> > > >> > > disk to other brokers. In that case, being able to process the
> > > >> > LeaderAndIsr
> > > >> > > requests faster will potentially help the producers recover
> > quicker.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Jun
> > > >> > >
> > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com>
> > > wrote:
> > > >> > >
> > > >> > > > Hey Lucas,
> > > >> > > >
> > > >> > > > Thanks for the reply. Some follow up questions below.
> > > >> > > >
> > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> > are
> > > >> > > randomly
> > > >> > > > distributed across all partitions, then each ProduceRequest
> will
> > > >> likely
> > > >> > > > cover some partitions for which the broker is still leader
> after
> > > it
> > > >> > > quickly
> > > >> > > > processes the
> > > >> > > > LeaderAndIsrRequest. Then broker will still be slow in
> > processing
> > > >> these
> > > >> > > > ProduceRequest and request will still be very high with this
> > KIP.
> > > It
> > > >> > > seems
> > > >> > > > that most ProduceRequest will still timeout after 30 seconds.
> Is
> > > >> this
> > > >> > > > understanding correct?
> > > >> > > >
> > > >> > > > Regarding 2, if most ProduceRequest will still timeout after
> 30
> > > >> > seconds,
> > > >> > > > then it is less clear how this KIP reduces average produce
> > > latency.
> > > >> Can
> > > >> > > you
> > > >> > > > clarify what metrics can be improved by this KIP?
> > > >> > > >
> > > >> > > > Not sure why the system operator directly cares about the number of
> truncated
> > > >> > messages.
> > > >> > > > Do you mean this KIP can improve average throughput or reduce
> > > >> message
> > > >> > > > duplication? It will be good to understand this.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dong
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> lucasatu...@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Hi Dong,
> > > >> > > > >
> > > >> > > > > Thanks for your valuable comments. Please see my reply
> below.
> > > >> > > > >
> > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> consider
> > a
> > > >> more
> > > >> > > > common
> > > >> > > > > scenario
> > > >> > > > > where broker0 is the leader of many partitions. And let's
> say
> > > for
> > > >> > some
> > > >> > > > > reason its IO becomes slow.
> > > >> > > > > The number of leader partitions on broker0 is so large, say
> > 10K,
> > > >> that
> > > >> > > the
> > > >> > > > > cluster is skewed,
> > > >> > > > > and the operator would like to shift the leadership for a
> lot
> > of
> > > >> > > > > partitions, say 9K, to other brokers,
> > > >> > > > > either manually or through some service like cruise control.
> > > >> > > > > With this KIP, not only will the leadership transitions
> finish
> > > >> more
> > > >> > > > > quickly, helping the cluster itself become more balanced,
> > > >> > > > > but all existing producers corresponding to the 9K
> partitions
> > > will
> > > >> > get
> > > >> > > > the
> > > >> > > > > errors relatively quickly
> > > >> > > > > rather than relying on their timeout, thanks to the batched
> > > async
> > > >> ZK
> > > >> > > > > operations.
> > > >> > > > > To me it's a useful feature to have during such troublesome
> > > times.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > 2. The experiments in the Google Doc have shown that with
> this
> > > KIP
> > > >> > many
> > > >> > > > > producers
> > > >> > > > > receive an explicit error NotLeaderForPartition, based on
> > which
> > > >> they
> > > >> > > > retry
> > > >> > > > > immediately.
> > > >> > > > > Therefore the latency (~14 seconds+quick retry) for their
> > single
> > > >> > > message
> > > >> > > > is
> > > >> > > > > much smaller
> > > >> > > > > compared with the case of timing out without the KIP (30
> > seconds
> > > >> for
> > > >> > > > timing
> > > >> > > > > out + quick retry).
> > > >> > > > > One might argue that reducing the timeout on the producer
> > > side
> > > >> can
> > > >> > > > > achieve the same result,
> > > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > > >> > > > >
> > > >> > > > > Also *IF* there were a metric to show the number of
> truncated
> > > >> > messages
> > > >> > > on
> > > >> > > > > brokers,
> > > >> > > > > with the experiments done in the Google Doc, it should be
> easy
> > > to
> > > >> see
> > > >> > > > that
> > > >> > > > > a lot fewer messages need
> > > >> > > > > to be truncated on broker0 since the up-to-date metadata
> > avoids
> > > >> > > appending
> > > >> > > > > of messages
> > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
> > operator
> > > >> and
> > > >> > ask
> > > >> > > > > whether
> > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer
> > is
> > > >> yes.
> > > >> > > > >
> > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > >> construct
> > > >> > > some
> > > >> > > > > formulas.
> > > >> > > > > To simplify the modeling, I'm going back to the case where
> > there
> > > >> is
> > > >> > > only
> > > >> > > > > ONE partition involved.
> > > >> > > > > Following the experiments in the Google Doc, let's say
> broker0
> > > >> > becomes
> > > >> > > > the
> > > >> > > > > follower at time t0,
> > > >> > > > > and after t0 there were still N produce requests in its
> > request
> > > >> > queue.
> > > >> > > > > With the up-to-date metadata brought by this KIP, broker0
> can
> > > >> reply
> > > >> > > with
> > > >> > > > an
> > > >> > > > > NotLeaderForPartition exception,
> > > >> > > > > let's use M1 to denote the average processing time of
> replying
> > > >> with
> > > >> > > such
> > > >> > > > an
> > > >> > > > > error message.
> > > >> > > > > Without this KIP, the broker will need to append messages to
> > > >> > segments,
> > > >> > > > > which may trigger a flush to disk,
> > > >> > > > > let's use M2 to denote the average processing time for such
> > > logic.
> > > >> > > > > Then the average extra latency incurred without this KIP is
> N
> > *
> > > >> (M2 -
> > > >> > > > M1) /
> > > >> > > > > 2.
> > > >> > > > >
> > > >> > > > > In practice, M2 should always be larger than M1, which means
> > as
> > > >> long
> > > >> > > as N
> > > >> > > > > is positive,
> > > >> > > > > we would see improvements on the average latency.
> > > >> > > > > There does not need to be significant backlog of requests in
> > the
> > > >> > > request
> > > >> > > > > queue,
> > > >> > > > > or severe degradation of disk performance to have the
> > > improvement.
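
For readers wondering where the N * (M2 - M1) / 2 figure above comes from, a
quick sketch under the same simple model (a single handler thread draining the
N queued ProduceRequests in order, each reply taking M1 with the KIP and M2
without it):

    \Delta_i = i\,M_2 - i\,M_1 = i\,(M_2 - M_1), \qquad
    \bar{\Delta} = \frac{1}{N}\sum_{i=1}^{N} i\,(M_2 - M_1)
                 = \frac{(N+1)\,(M_2 - M_1)}{2} \approx \frac{N\,(M_2 - M_1)}{2}

i.e. the average extra wait scales linearly both with the backlog N and with
the gap between appending to the log (M2) and replying with
NotLeaderForPartition (M1).
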
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Lucas
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > [1] For instance, reducing the timeout on the producer side
> > can
> > > >> > trigger
> > > >> > > > > unnecessary duplicate requests
> > > >> > > > > when the corresponding leader broker is overloaded,
> > exacerbating
> > > >> the
> > > >> > > > > situation.
> > > >> > > > >
> > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> lindon...@gmail.com
> > >
> > > >> > wrote:
> > > >> > > > >
> > > >> > > > > > Hey Lucas,
> > > >> > > > > >
> > > >> > > > > > Thanks much for the detailed documentation of the
> > experiment.
> > > >> > > > > >
> > > >> > > > > > Initially I also think having a separate queue for
> > controller
> > > >> > > requests
> > > >> > > > is
> > > >> > > > > > useful because, as you mentioned in the summary section of
> > the
> > > >> > Google
> > > >> > > > > doc,
> > > >> > > > > > controller requests are generally more important than data
> > > >> requests
> > > >> > > and
> > > >> > > > > we
> > > >> > > > > > probably want controller requests to be processed sooner.
> > But
> > > >> then
> > > >> > > Eno
> > > >> > > > > has
> > > >> > > > > > two very good questions which I am not sure the Google doc
> > has
> > > >> > > answered
> > > >> > > > > > explicitly. Could you help with the following questions?
> > > >> > > > > >
> > > >> > > > > > 1) It is not very clear what is the actual benefit of
> > KIP-291
> > > to
> > > >> > > users.
> > > >> > > > > The
> > > >> > > > > > experiment setup in the Google doc simulates the scenario
> > that
> > > >> > broker
> > > >> > > > is
> > > >> > > > > > very slow handling ProduceRequest due to e.g. slow disk.
> It
> > > >> > currently
> > > >> > > > > > assumes that there is only 1 partition. But in the common
> > > >> scenario,
> > > >> > > it
> > > >> > > > is
> > > >> > > > > > probably reasonable to assume that there are many other
> > > >> partitions
> > > >> > > that
> > > >> > > > > are
> > > >> > > > > > also actively produced to and ProduceRequest to these
> > > partition
> > > >> > also
> > > >> > > > > takes
> > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can
> > become
> > > >> > > follower
> > > >> > > > > for
> > > >> > > > > > the partition 0 soon, it probably still needs to process
> the
> > > >> > > > > ProduceRequest
> > > >> > > > > > slowly in the queue because these ProduceRequests cover
> > > other
> > > >> > > > > partitions.
> > > >> > > > > > Thus most ProduceRequest will still timeout after 30
> seconds
> > > and
> > > >> > most
> > > >> > > > > > clients will still likely timeout after 30 seconds. Then
> it
> > is
> > > >> not
> > > >> > > > > > obvious what the benefit to the client is, since the client will
> > > >> timeout
> > > >> > > after
> > > >> > > > > 30
> > > >> > > > > > seconds before possibly re-connecting to broker1, with or
> > > >> without
> > > >> > > > > KIP-291.
> > > >> > > > > > Did I miss something here?
> > > >> > > > > >
> > > >> > > > > > 2) I guess Eno is asking for the specific benefits of
> this
> > > >> KIP to
> > > >> > > > user
> > > >> > > > > or
> > > >> > > > > > system administrator, e.g. whether this KIP decreases
> > average
> > > >> > > latency,
> > > >> > > > > > 999th percentile latency, probability of exceptions exposed to
> > > >> client
> > > >> > > etc.
> > > >> > > > It
> > > >> > > > > > is probably useful to clarify this.
> > > >> > > > > >
> > > >> > > > > > 3) Does this KIP help improve user experience only when
> > there
> > > is
> > > >> > > issue
> > > >> > > > > with
> > > >> > > > > > broker, e.g. significant backlog in the request queue due
> to
> > > >> slow
> > > >> > > disk
> > > >> > > > as
> > > >> > > > > > described in the Google doc? Or is this KIP also useful
> when
> > > >> there
> > > >> > is
> > > >> > > > no
> > > >> > > > > > ongoing issue in the cluster? It might be helpful to
> clarify
> > > >> this
> > > >> > to
> > > >> > > > > > understand the benefit of this KIP.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Thanks much,
> > > >> > > > > > Dong
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > >> lucasatu...@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Eno,
> > > >> > > > > > >
> > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > >> > > > > > > Here is a link to the positive impact achieved by
> > > implementing
> > > >> > the
> > > >> > > > > > proposed
> > > >> > > > > > > change:
> > > >> > > > > > > https://docs.google.com/document/d/
> > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > >> > > > > > > Please take a look when you have time and let me know
> your
> > > >> > > feedback.
> > > >> > > > > > >
> > > >> > > > > > > Regards,
> > > >> > > > > > > Lucas
> > > >> > > > > > >
> > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> ka...@harsha.io>
> > > >> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Thanks for the pointer. Will take a look; it might suit
> our
> > > >> > > > requirements
> > > >> > > > > > > > better.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Harsha
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > >> > > > lucasatu...@gmail.com
> > > >> > > > > >
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > Hi Harsha,
> > > >> > > > > > > > >
> > > >> > > > > > > > > If I understand correctly, the replication quota
> > > mechanism
> > > >> > > > proposed
> > > >> > > > > > in
> > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > >> > > > > > > > > Have you tried it out?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > > Lucas
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > ka...@harsha.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > One more question, any thoughts on making this
> > > >> configurable
> > > >> > > > > > > > > > and also allowing a subset of data requests to be
> > > >> > prioritized.
> > > >> > > > For
> > > >> > > > > > > > example
> > > >> > > > > > > > >
> > > >> > > > > > > > > > ,we notice in our cluster when we take out a
> broker
> > > and
> > > >> > bring
> > > >> > > > new
> > > >> > > > > > one
> > > >> > > > > > > > it
> > > >> > > > > > > > >
> > > >> > > > > > > > > > will try to become follower and have lot of fetch
> > > >> requests
> > > >> > to
> > > >> > > > > other
> > > >> > > > > > > > > leaders
> > > >> > > > > > > > > > in clusters. This will negatively affect the
> > > >> > > application/client
> > > >> > > > > > > > > requests.
> > > >> > > > > > > > > > We are also exploring the similar solution to
> > > >> de-prioritize
> > > >> > > if
> > > >> > > > a
> > > >> > > > > > new
> > > >> > > > > > > > > > replica comes in for fetch requests, we are ok
> with
> > > the
> > > >> > > replica
> > > >> > > > > to
> > > >> > > > > > be
> > > >> > > > > > > > > > taking time but the leaders should prioritize the
> > > client
> > > >> > > > > requests.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks,
> > > >> > > > > > > > > > Harsha
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
> wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Hi Eno,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Sorry for the delayed response.
> > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > > >> > experimental
> > > >> > > > > > results
> > > >> > > > > > > > so
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > far.
> > > >> > > > > > > > > > > And I plan to test it out in the following days.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > - You are absolutely right that the priority
> queue
> > > >> does
> > > >> > not
> > > >> > > > > > > > completely
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > prevent
> > > >> > > > > > > > > > > data requests being processed ahead of
> controller
> > > >> > requests.
> > > >> > > > > > > > > > > That being said, I expect it to greatly mitigate
> > the
> > > >> > effect
> > > >> > > > of
> > > >> > > > > > > stale
> > > >> > > > > > > > > > > metadata.
> > > >> > > > > > > > > > > In any case, I'll try it out and post the
> results
> > > >> when I
> > > >> > > have
> > > >> > > > > it.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Regards,
> > > >> > > > > > > > > > > Lucas
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > >> > > > > > > > eno.there...@gmail.com
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this.
> A
> > > >> couple
> > > >> > of
> > > >> > > > > > > > questions:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > - did you notice any positive change after
> > > >> implementing
> > > >> > > > this
> > > >> > > > > > KIP?
> > > >> > > > > > > > > I'm
> > > >> > > > > > > > > > > > wondering if you have any experimental results
> > > that
> > > >> > show
> > > >> > > > the
> > > >> > > > > > > > benefit
> > > >> > > > > > > > > of
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > two queues.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > addressing
> > > >> the
> > > >> > > > > problem
> > > >> > > > > > > the
> > > >> > > > > > > > > KIP
> > > >> > > > > > > > > > > > identifies. Even with priority queues, you
> will
> > > >> > sometimes
> > > >> > > > > > > (often?)
> > > >> > > > > > > > > have
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > case that data plane requests will be ahead of
> > the
> > > >> > > control
> > > >> > > > > > plane
> > > >> > > > > > > > > > > requests.
> > > >> > > > > > > > > > > > This happens because the system might have
> > already
> > > >> > > started
> > > >> > > > > > > > > processing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > data plane requests before the control plane
> > ones
> > > >> > > arrived.
> > > >> > > > So
> > > >> > > > > > it
> > > >> > > > > > > > > would
> > > >> > > > > > > > > > > be
> > > >> > > > > > > > > > > > good to know what % of the problem this KIP
> > > >> addresses.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > Eno
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > >> > > > > yuzhih...@gmail.com
> > > >> > > > > > >
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Change looks good.
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang
> <
> > > >> > > > > > > > lucasatu...@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated
> the
> > > KIP.
> > > >> > > Please
> > > >> > > > > > take
> > > >> > > > > > > > > > another
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > look.
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > >> > > > > > > yuzhih...@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > It would be good if you can include the
> > > >> default
> > > >> > > value
> > > >> > > > > for
> > > >> > > > > > > > this
> > > >> > > > > > > > >
> > > >> > > > > > > > > > new
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > config
> > > >> > > > > > > > > > > > > > > in the KIP.
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas
> > Wang
> > > <
> > > >> > > > > > > > > > lucasatu...@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> > > config,
> > > >> > > > instead
> > > >> > > > > of
> > > >> > > > > > > > > reusing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > existing one.
> > > >> > > > > > > > > > > > > > > > Please take another look when you have
> > > time.
> > > >> > > > Thanks a
> > > >> > > > > > > lot!
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted
> Yu
> > <
> > > >> > > > > > > > yuzhih...@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> > > control
> > > >> > > request
> > > >> > > > > > rate
> > > >> > > > > > > is
> > > >> > > > > > > > > low
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > I don't know if control request rate
> > can
> > > >> get
> > > >> > to
> > > >> > > > > > > 100,000,
> > > >> > > > > > > > > > > likely
> > > >> > > > > > > > > > > > > not.
> > > >> > > > > > > > > > > > > > > Then
> > > >> > > > > > > > > > > > > > > > > using the same bound as that for
> data
> > > >> > requests
> > > >> > > > > seems
> > > >> > > > > > > > high.
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM,
> > Lucas
> > > >> Wang
> > > >> > <
> > > >> > > > > > > > > > > > > lucasatu...@gmail.com >
> > > >> > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this
> > KIP.
> > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > >> > > > > > "queued.max.requests"
> > > >> > > > > > > in
> > > >> > > > > > > > > > > > cluster A
> > > >> > > > > > > > > > > > > > is
> > > >> > > > > > > > > > > > > > > > > 1000,
> > > >> > > > > > > > > > > > > > > > > > while the setting in cluster B is
> > > >> 100,000.
> > > >> > > > > > > > > > > > > > > > > > The 100 times difference might
> have
> > > >> > indicated
> > > >> > > > > that
> > > >> > > > > > > > > machines
> > > >> > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > By reusing the
> > "queued.max.requests",
> > > >> the
> > > >> > > > > > > > > > > controlRequestQueue
> > > >> > > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > automatically
> > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> > > explicitly
> > > >> > > > bothering
> > > >> > > > > > the
> > > >> > > > > > > > > > > > operators.
> > > >> > > > > > > > > > > > > > > > > > I understand the counter argument
> > can
> > > be
> > > >> > that
> > > >> > > > > maybe
> > > >> > > > > > > > > that's
> > > >> > > > > > > > > > a
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > waste
> > > >> > > > > > > > > > > > > > of
> > > >> > > > > > > > > > > > > > > > > > resource if control request
> > > >> > > > > > > > > > > > > > > > > > rate is low and operators may want
> > to
> > > >> fine
> > > >> > > tune
> > > >> > > > > the
> > > >> > > > > > > > > > capacity
> > > >> > > > > > > > > > > of
> > > >> > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
> can
> > > >> change
> > > >> > > it
> > > >> > > > if
> > > >> > > > > > you
> > > >> > > > > > > > or
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > anyone
> > > >> > > > > > > > > > > > > > else
> > > >> > > > > > > > > > > > > > > > > feels
> > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
> > config.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM,
> Ted
> > > Yu
> > > >> <
> > > >> > > > > > > > > > yuzhih...@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2,
> > can
> > > >> you
> > > >> > > > > > elaborate
> > > >> > > > > > > a
> > > >> > > > > > > > > bit
> > > >> > > > > > > > > > > more
> > > >> > > > > > > > > > > > > on
> > > >> > > > > > > > > > > > > > > why
> > > >> > > > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
> impact
> > ?
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM,
> > > Dong
> > > >> > Lin <
> > > >> > > > > > > > > > > > lindon...@gmail.com
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
> > > >> overall.
> > > >> > > > Some
> > > >> > > > > > > > > comments
> > > >> > > > > > > > > > > > below:
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the full
> > > mbean
> > > >> for
> > > >> > > the
> > > >> > > > > new
> > > >> > > > > > > > > metrics
> > > >> > > > > > > > > > > in
> > > >> > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > KIP.
> > > >> > > > > > > > > > > > > > > > > Can
> > > >> > > > > > > > > > > > > > > > > > > you
> > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
> > Interface
> > > >> > > section
> > > >> > > > > > > similar
> > > >> > > > > > > > > to
> > > >> > > > > > > > > > > > KIP-237
> > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+
> > > >> Metrics>
> > > >> > > > > > > > > > > > > > > > > > > > ?
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the
> same
> > > >> > pattern
> > > >> > > as
> > > >> > > > > > > KIP-153
> > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > 153%3A+Include+only+client+
> traffic+in+BytesOutPerSec+
> > > >> > > > > > > > > > > > > metric>,
> > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing
> > sensor
> > > >> name
> > > >> > > > > > > > > "BytesInPerSec"
> > > >> > > > > > > > > > > and
> > > >> > > > > > > > > > > > > add
> > > >> > > > > > > > > > > > > > a
> > > >> > > > > > > > > > > > > > > > new
> > > >> > > > > > > > > > > > > > > > > > > sensor
> > > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec",
> > rather
> > > >> than
> > > >> > > > > > replacing
> > > >> > > > > > > > > the
> > > >> > > > > > > > > > > > sensor
> > > >> > > > > > > > > > > > > > > name "
> > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > > >> > > > > "ClientBytesInPerSec".
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
> changes
> > > the
> > > >> > > > semantics
> > > >> > > > > > of
> > > >> > > > > > > > the
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > broker
> > > >> > > > > > > > > > > > > > > config
> > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because
> > the
> > > >> > number
> > > >> > > of
> > > >> > > > > > total
> > > >> > > > > > > > > > > requests
> > > >> > > > > > > > > > > > > > queued
> > > >> > > > > > > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > > > broker will be no longer
> bounded
> > > by
> > > >> > > > > > > > > > > "queued.max.requests".
> > > >> > > > > > > > > > > > > This
> > > >> > > > > > > > > > > > > > > > > > probably
> > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in the
> > > Public
> > > >> > > > > Interfaces
> > > >> > > > > > > > > section
> > > >> > > > > > > > > > > for
> > > >> > > > > > > > > > > > > > > > > discussion.
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45
> > PM,
> > > >> Lucas
> > > >> > > > Wang
> > > >> > > > > <
> > > >> > > > > > > > > > > > > > > > lucasatu...@gmail.com >
> > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
> > > >> separate
> > > >> > > queue
> > > >> > > > > for
> > > >> > > > > > > > > > > controller
> > > >> > > > > > > > > > > > > > > > requests:
> > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > 291%
> > > >> > > > > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > >> > > > > > > > > > control+requests+and+data+
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > requests
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look
> and
> > > >> let me
> > > >> > > > know
> > > >> > > > > > your
> > > >> > > > > > > > > > > feedback?
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > >> > > > > > > > > > > > > > > > > > > > > Regards,
> > > >> > > > > > > > > > > > > > > > > > > > > Lucas
