Hey Joel,

Thanks for the detailed explanation. I agree the current design makes sense. My confusion is about whether the new config for the controller queue capacity is necessary. I cannot think of a case in which users would change it.
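For context on the capacity question, here is a minimal sketch (plain `java.util.concurrent`, not Kafka's actual `RequestChannel` code) of the trade-off debated below: with a bounded controller queue of capacity 1, a request from a second (possibly zombie) controller cannot be enqueued until the first is processed, which would block the network thread, while a larger capacity such as the proposed default of 20 absorbs the burst:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustrative sketch only: Kafka's real request channel is more involved,
// but the capacity trade-off can be seen with a plain bounded queue. A
// network thread that cannot enqueue a request must block (put) or drop
// (offer); a capacity above 1 absorbs requests from multiple controllers.
public class ControllerQueueCapacityDemo {
    public static void main(String[] args) {
        ArrayBlockingQueue<String> capacityOne = new ArrayBlockingQueue<>(1);
        System.out.println(capacityOne.offer("LeaderAndIsr from controller epoch 5"));  // true
        // A second controller's request cannot be enqueued until the first
        // is processed; with put() the network thread would block here.
        System.out.println(capacityOne.offer("LeaderAndIsr from zombie controller"));   // false

        ArrayBlockingQueue<String> capacityTwenty = new ArrayBlockingQueue<>(20);
        System.out.println(capacityTwenty.offer("LeaderAndIsr from controller epoch 5")); // true
        System.out.println(capacityTwenty.offer("LeaderAndIsr from zombie controller"));  // true
    }
}
```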
Thanks, Jiangjie (Becket) Qin On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote: > Hi Lucas, > > I guess my question can be rephrased to "do we expect user to ever change > the controller request queue capacity"? If we agree that 20 is already a > very generous default number and we do not expect user to change it, is it > still necessary to expose this as a config? > > Thanks, > > Jiangjie (Becket) Qin > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote: > >> @Becket >> 1. Thanks for the comment. You are right that normally there should be >> just >> one controller request because of muting, >> and I had NOT intended to say there would be many enqueued controller >> requests. >> I went through the KIP again, and I'm not sure which part conveys that >> info. >> I'd be happy to revise if you point it out the section. >> >> 2. Though it should not happen in normal conditions, the current design >> does not preclude multiple controllers running >> at the same time, hence if we don't have the controller queue capacity >> config and simply make its capacity to be 1, >> network threads handling requests from different controllers will be >> blocked during those troublesome times, >> which is probably not what we want. On the other hand, adding the extra >> config with a default value, say 20, guards us from issues in those >> troublesome times, and IMO there isn't much downside of adding the extra >> config. >> >> @Mayuresh >> Good catch, this sentence is an obsolete statement based on a previous >> design. I've revised the wording in the KIP. >> >> Thanks, >> Lucas >> >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat < >> gharatmayures...@gmail.com> wrote: >> >> > Hi Lucas, >> > >> > Thanks for the KIP. >> > I am trying to understand why you think "The memory consumption can rise >> > given the total number of queued requests can go up to 2x" in the impact >> > section. 
Normally the requests from controller to a Broker are not high >> > volume, right ? >> > >> > >> > Thanks, >> > >> > Mayuresh >> > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> >> wrote: >> > >> > > Thanks for the KIP, Lucas. Separating the control plane from the data >> > plane >> > > makes a lot of sense. >> > > >> > > In the KIP you mentioned that the controller request queue may have >> many >> > > requests in it. Will this be a common case? The controller requests >> still >> > > goes through the SocketServer. The SocketServer will mute the channel >> > once >> > > a request is read and put into the request channel. So assuming there >> is >> > > only one connection between controller and each broker, on the broker >> > side, >> > > there should be only one controller request in the controller request >> > queue >> > > at any given time. If that is the case, do we need a separate >> controller >> > > request queue capacity config? The default value 20 means that we >> expect >> > > there are 20 controller switches to happen in a short period of time. >> I >> > am >> > > not sure whether someone should increase the controller request queue >> > > capacity to handle such case, as it seems indicating something very >> wrong >> > > has happened. >> > > >> > > Thanks, >> > > >> > > Jiangjie (Becket) Qin >> > > >> > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> >> wrote: >> > > >> > > > Thanks for the update Lucas. >> > > > >> > > > I think the motivation section is intuitive. It will be good to >> learn >> > > more >> > > > about the comments from other reviewers. >> > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> >> > > wrote: >> > > > >> > > > > Hi Dong, >> > > > > >> > > > > I've updated the motivation section of the KIP by explaining the >> > cases >> > > > that >> > > > > would have user impacts. >> > > > > Please take a look at let me know your comments. 
>> > > > > >> > > > > Thanks, >> > > > > Lucas >> > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com >> > >> > > > wrote: >> > > > > >> > > > > > Hi Dong, >> > > > > > >> > > > > > The simulation of disk being slow is merely for me to easily >> > > construct >> > > > a >> > > > > > testing scenario >> > > > > > with a backlog of produce requests. In production, other than >> the >> > > disk >> > > > > > being slow, a backlog of >> > > > > > produce requests may also be caused by high produce QPS. >> > > > > > In that case, we may not want to kill the broker and that's when >> > this >> > > > KIP >> > > > > > can be useful, both for JBOD >> > > > > > and non-JBOD setup. >> > > > > > >> > > > > > Going back to your previous question about each ProduceRequest >> > > covering >> > > > > 20 >> > > > > > partitions that are randomly >> > > > > > distributed, let's say a LeaderAndIsr request is enqueued that >> > tries >> > > to >> > > > > > switch the current broker, say broker0, from leader to follower >> > > > > > *for one of the partitions*, say *test-0*. For the sake of >> > argument, >> > > > > > let's also assume the other brokers, say broker1, have *stopped* >> > > > fetching >> > > > > > from >> > > > > > the current broker, i.e. broker0. >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL) >> > > > > > 1.1 without this KIP, the ProduceRequests ahead of >> LeaderAndISR >> > > will >> > > > be >> > > > > > put into the purgatory, >> > > > > > and since they'll never be replicated to other brokers >> > > (because >> > > > > of >> > > > > > the assumption made above), they will >> > > > > > be completed either when the LeaderAndISR request is >> > > processed >> > > > or >> > > > > > when the timeout happens. 
>> > > > > > 1.2 With this KIP, broker0 will immediately transition the >> > > partition >> > > > > > test-0 to become a follower, >> > > > > > after the current broker sees the replication of the >> > > remaining >> > > > 19 >> > > > > > partitions, it can send a response indicating that >> > > > > > it's no longer the leader for the "test-0". >> > > > > > To see the latency difference between 1.1 and 1.2, let's say >> > there >> > > > are >> > > > > > 24K produce requests ahead of the LeaderAndISR, and there are 8 >> io >> > > > > threads, >> > > > > > so each io thread will process approximately 3000 produce >> > requests. >> > > > Now >> > > > > > let's investigate the io thread that finally processed the >> > > > LeaderAndISR. >> > > > > > For the 3000 produce requests, if we model the time when their >> > > > > remaining >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR >> > > > request >> > > > > is >> > > > > > processed at time t3000. >> > > > > > Without this KIP, the 1st produce request would have waited an >> > > extra >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra time of >> t3000 - >> > > t1, >> > > > > etc. >> > > > > > Roughly speaking, the latency difference is bigger for the >> > earlier >> > > > > > produce requests than for the later ones. For the same reason, >> the >> > > more >> > > > > > ProduceRequests queued >> > > > > > before the LeaderAndISR, the bigger benefit we get (capped by >> the >> > > > > > produce timeout). >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1 >> > > > > > There will be no latency differences in this case, but >> > > > > > 2.1 without this KIP, the records of partition test-0 in the >> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to >> the >> > > local >> > > > > log, >> > > > > > and eventually be truncated after processing the >> > > LeaderAndISR. 
>> > > > > > This is what's referred to as >> > > > > > "some unofficial definition of data loss in terms of >> > messages >> > > > > > beyond the high watermark". >> > > > > > 2.2 with this KIP, we can mitigate the effect since if the >> > > > LeaderAndISR >> > > > > > is immediately processed, the response to producers will have >> > > > > > the NotLeaderForPartition error, causing producers to >> retry >> > > > > > >> > > > > > This explanation above is the benefit for reducing the latency >> of a >> > > > > broker >> > > > > > becoming the follower, >> > > > > > closely related is reducing the latency of a broker becoming the >> > > > leader. >> > > > > > In this case, the benefit is even more obvious, if other brokers >> > have >> > > > > > resigned leadership, and the >> > > > > > current broker should take leadership. Any delay in processing >> the >> > > > > > LeaderAndISR will be perceived >> > > > > > by clients as unavailability. In extreme cases, this can cause >> > failed >> > > > > > produce requests if the retries are >> > > > > > exhausted. >> > > > > > >> > > > > > Another two types of controller requests are UpdateMetadata and >> > > > > > StopReplica, which I'll briefly discuss as follows: >> > > > > > For UpdateMetadata requests, delayed processing means clients >> > > receiving >> > > > > > stale metadata, e.g. with the wrong leadership info >> > > > > > for certain partitions, and the effect is more retries or even >> > fatal >> > > > > > failure if the retries are exhausted. >> > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the >> > > > performance >> > > > > > of topic deletion. >> > > > > > >> > > > > > Regarding your last question of the delay for >> > DescribeLogDirsRequest, >> > > > you >> > > > > > are right >> > > > > > that this KIP cannot help with the latency in getting the log >> dirs >> > > > info, >> > > > > > and it's only relevant >> > > > > > when controller requests are involved. 
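The acks=-1 purgatory argument above can be put in a tiny model (illustrative numbers only, not measurements from the KIP's experiments): without the KIP, a produce request whose remaining replicas would catch up at time t_i still cannot complete until the LeaderAndIsr request is processed at t3000, so it waits an extra (t3000 - t_i), and the earliest requests lose the most:

```java
// Minimal model of the acks=-1 purgatory wait described in the thread
// (illustrative numbers, not measurements). Request i would complete at
// time t_i with the KIP, but only at tLeaderAndIsr without it, so its
// extra wait is (tLeaderAndIsr - t_i); earlier requests benefit the most.
public class PurgatoryWaitModel {
    static double extraWaitMs(double tCatchUp, double tLeaderAndIsr) {
        // Requests that would anyway complete after the LeaderAndIsr
        // request see no extra wait.
        return Math.max(0, tLeaderAndIsr - tCatchUp);
    }

    public static void main(String[] args) {
        double tLeaderAndIsr = 3000; // LeaderAndIsr processed after the backlog
        System.out.println(extraWaitMs(1, tLeaderAndIsr));    // 2999.0
        System.out.println(extraWaitMs(2999, tLeaderAndIsr)); // 1.0
    }
}
```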
>> > > > > > >> > > > > > Regards, >> > > > > > Lucas >> > > > > > >> > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> >> > > wrote: >> > > > > > >> > > > > >> Hey Jun, >> > > > > >> >> > > > > >> Thanks much for the comments. It is good point. So the feature >> may >> > > be >> > > > > >> useful for JBOD use-case. I have one question below. >> > > > > >> >> > > > > >> Hey Lucas, >> > > > > >> >> > > > > >> Do you think this feature is also useful for non-JBOD setup or >> it >> > is >> > > > > only >> > > > > >> useful for the JBOD setup? It may be useful to understand this. >> > > > > >> >> > > > > >> When the broker is setup using JBOD, in order to move leaders >> on >> > the >> > > > > >> failed >> > > > > >> disk to other disks, the system operator first needs to get the >> > list >> > > > of >> > > > > >> partitions on the failed disk. This is currently achieved using >> > > > > >> AdminClient.describeLogDirs(), which sends >> DescribeLogDirsRequest >> > to >> > > > the >> > > > > >> broker. If we only prioritize the controller requests, then the >> > > > > >> DescribeLogDirsRequest >> > > > > >> may still take a long time to be processed by the broker. So >> the >> > > > overall >> > > > > >> time to move leaders away from the failed disk may still be >> long >> > > even >> > > > > with >> > > > > >> this KIP. What do you think? >> > > > > >> >> > > > > >> Thanks, >> > > > > >> Dong >> > > > > >> >> > > > > >> >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang < >> lucasatu...@gmail.com >> > > >> > > > > wrote: >> > > > > >> >> > > > > >> > Thanks for the insightful comment, Jun. >> > > > > >> > >> > > > > >> > @Dong, >> > > > > >> > Since both of the two comments in your previous email are >> about >> > > the >> > > > > >> > benefits of this KIP and whether it's useful, >> > > > > >> > in light of Jun's last comment, do you agree that this KIP >> can >> > be >> > > > > >> > beneficial in the case mentioned by Jun? 
>> > > > > >> > Please let me know, thanks! >> > > > > >> > >> > > > > >> > Regards, >> > > > > >> > Lucas >> > > > > >> > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> >> > wrote: >> > > > > >> > >> > > > > >> > > Hi, Lucas, Dong, >> > > > > >> > > >> > > > > >> > > If all disks on a broker are slow, one probably should just >> > kill >> > > > the >> > > > > >> > > broker. In that case, this KIP may not help. If only one of >> > the >> > > > > disks >> > > > > >> on >> > > > > >> > a >> > > > > >> > > broker is slow, one may want to fail that disk and move the >> > > > leaders >> > > > > on >> > > > > >> > that >> > > > > >> > > disk to other brokers. In that case, being able to process >> the >> > > > > >> > LeaderAndIsr >> > > > > >> > > requests faster will potentially help the producers recover >> > > > quicker. >> > > > > >> > > >> > > > > >> > > Thanks, >> > > > > >> > > >> > > > > >> > > Jun >> > > > > >> > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin < >> lindon...@gmail.com >> > > >> > > > > wrote: >> > > > > >> > > >> > > > > >> > > > Hey Lucas, >> > > > > >> > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions below. >> > > > > >> > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions >> > that >> > > > are >> > > > > >> > > randomly >> > > > > >> > > > distributed across all partitions, then each >> ProduceRequest >> > > will >> > > > > >> likely >> > > > > >> > > > cover some partitions for which the broker is still >> leader >> > > after >> > > > > it >> > > > > >> > > quickly >> > > > > >> > > > processes the >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be slow in >> > > > processing >> > > > > >> these >> > > > > >> > > > ProduceRequest and request will still be very high with >> this >> > > > KIP. >> > > > > It >> > > > > >> > > seems >> > > > > >> > > > that most ProduceRequest will still timeout after 30 >> > seconds. 
>> > > Is >> > > > > >> this >> > > > > >> > > > understanding correct? >> > > > > >> > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still timeout >> after >> > > 30 >> > > > > >> > seconds, >> > > > > >> > > > then it is less clear how this KIP reduces average >> produce >> > > > > latency. >> > > > > >> Can >> > > > > >> > > you >> > > > > >> > > > clarify what metrics can be improved by this KIP? >> > > > > >> > > > >> > > > > >> > > > Not sure why system operator directly cares number of >> > > truncated >> > > > > >> > messages. >> > > > > >> > > > Do you mean this KIP can improve average throughput or >> > reduce >> > > > > >> message >> > > > > >> > > > duplication? It will be good to understand this. >> > > > > >> > > > >> > > > > >> > > > Thanks, >> > > > > >> > > > Dong >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang < >> > > lucasatu...@gmail.com >> > > > > >> > > > > >> > wrote: >> > > > > >> > > > >> > > > > >> > > > > Hi Dong, >> > > > > >> > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply >> > > below. >> > > > > >> > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's >> > > consider >> > > > a >> > > > > >> more >> > > > > >> > > > common >> > > > > >> > > > > scenario >> > > > > >> > > > > where broker0 is the leader of many partitions. And >> let's >> > > say >> > > > > for >> > > > > >> > some >> > > > > >> > > > > reason its IO becomes slow. 
>> > > > > >> > > > > The number of leader partitions on broker0 is so large, >> > say >> > > > 10K, >> > > > > >> that >> > > > > >> > > the >> > > > > >> > > > > cluster is skewed, >> > > > > >> > > > > and the operator would like to shift the leadership >> for a >> > > lot >> > > > of >> > > > > >> > > > > partitions, say 9K, to other brokers, >> > > > > >> > > > > either manually or through some service like cruise >> > control. >> > > > > >> > > > > With this KIP, not only will the leadership transitions >> > > finish >> > > > > >> more >> > > > > >> > > > > quickly, helping the cluster itself becoming more >> > balanced, >> > > > > >> > > > > but all existing producers corresponding to the 9K >> > > partitions >> > > > > will >> > > > > >> > get >> > > > > >> > > > the >> > > > > >> > > > > errors relatively quickly >> > > > > >> > > > > rather than relying on their timeout, thanks to the >> > batched >> > > > > async >> > > > > >> ZK >> > > > > >> > > > > operations. >> > > > > >> > > > > To me it's a useful feature to have during such >> > troublesome >> > > > > times. >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that >> with >> > > this >> > > > > KIP >> > > > > >> > many >> > > > > >> > > > > producers >> > > > > >> > > > > receive an explicit error NotLeaderForPartition, based >> on >> > > > which >> > > > > >> they >> > > > > >> > > > retry >> > > > > >> > > > > immediately. >> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry) for >> their >> > > > single >> > > > > >> > > message >> > > > > >> > > > is >> > > > > >> > > > > much smaller >> > > > > >> > > > > compared with the case of timing out without the KIP >> (30 >> > > > seconds >> > > > > >> for >> > > > > >> > > > timing >> > > > > >> > > > > out + quick retry). 
>> > > > > >> > > > > One might argue that reducing the timing out on the >> > producer >> > > > > side >> > > > > >> can >> > > > > >> > > > > achieve the same result, >> > > > > >> > > > > yet reducing the timeout has its own drawbacks[1]. >> > > > > >> > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the number of >> > > truncated >> > > > > >> > messages >> > > > > >> > > on >> > > > > >> > > > > brokers, >> > > > > >> > > > > with the experiments done in the Google Doc, it should >> be >> > > easy >> > > > > to >> > > > > >> see >> > > > > >> > > > that >> > > > > >> > > > > a lot fewer messages need >> > > > > >> > > > > to be truncated on broker0 since the up-to-date >> metadata >> > > > avoids >> > > > > >> > > appending >> > > > > >> > > > > of messages >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a system >> > > > operator >> > > > > >> and >> > > > > >> > ask >> > > > > >> > > > > whether >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the >> > answer >> > > > is >> > > > > >> yes. >> > > > > >> > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be >> helpful to >> > > > > >> construct >> > > > > >> > > some >> > > > > >> > > > > formulas. >> > > > > >> > > > > To simplify the modeling, I'm going back to the case >> where >> > > > there >> > > > > >> is >> > > > > >> > > only >> > > > > >> > > > > ONE partition involved. >> > > > > >> > > > > Following the experiments in the Google Doc, let's say >> > > broker0 >> > > > > >> > becomes >> > > > > >> > > > the >> > > > > >> > > > > follower at time t0, >> > > > > >> > > > > and after t0 there were still N produce requests in its >> > > > request >> > > > > >> > queue. 
>> > > > > >> > > > > With the up-to-date metadata brought by this KIP, >> broker0 >> > > can >> > > > > >> reply >> > > > > >> > > with >> > > > > >> > > > an >> > > > > >> > > > > NotLeaderForPartition exception, >> > > > > >> > > > > let's use M1 to denote the average processing time of >> > > replying >> > > > > >> with >> > > > > >> > > such >> > > > > >> > > > an >> > > > > >> > > > > error message. >> > > > > >> > > > > Without this KIP, the broker will need to append >> messages >> > to >> > > > > >> > segments, >> > > > > >> > > > > which may trigger a flush to disk, >> > > > > >> > > > > let's use M2 to denote the average processing time for >> > such >> > > > > logic. >> > > > > >> > > > > Then the average extra latency incurred without this >> KIP >> > is >> > > N >> > > > * >> > > > > >> (M2 - >> > > > > >> > > > M1) / >> > > > > >> > > > > 2. >> > > > > >> > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which >> > means >> > > > as >> > > > > >> long >> > > > > >> > > as N >> > > > > >> > > > > is positive, >> > > > > >> > > > > we would see improvements on the average latency. >> > > > > >> > > > > There does not need to be significant backlog of >> requests >> > in >> > > > the >> > > > > >> > > request >> > > > > >> > > > > queue, >> > > > > >> > > > > or severe degradation of disk performance to have the >> > > > > improvement. >> > > > > >> > > > > >> > > > > >> > > > > Regards, >> > > > > >> > > > > Lucas >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the producer >> > side >> > > > can >> > > > > >> > trigger >> > > > > >> > > > > unnecessary duplicate requests >> > > > > >> > > > > when the corresponding leader broker is overloaded, >> > > > exacerbating >> > > > > >> the >> > > > > >> > > > > situation. 
>> > > > > >> > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin < >> > > lindon...@gmail.com >> > > > > >> > > > > >> > wrote: >> > > > > >> > > > > >> > > > > >> > > > > > Hey Lucas, >> > > > > >> > > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of the >> > > > experiment. >> > > > > >> > > > > > >> > > > > >> > > > > > Initially I also think having a separate queue for >> > > > controller >> > > > > >> > > requests >> > > > > >> > > > is >> > > > > >> > > > > > useful because, as you mentioned in the summary >> section >> > of >> > > > the >> > > > > >> > Google >> > > > > >> > > > > doc, >> > > > > >> > > > > > controller requests are generally more important than >> > data >> > > > > >> requests >> > > > > >> > > and >> > > > > >> > > > > we >> > > > > >> > > > > > probably want controller requests to be processed >> > sooner. >> > > > But >> > > > > >> then >> > > > > >> > > Eno >> > > > > >> > > > > has >> > > > > >> > > > > > two very good questions which I am not sure the >> Google >> > doc >> > > > has >> > > > > >> > > answered >> > > > > >> > > > > > explicitly. Could you help with the following >> questions? >> > > > > >> > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of >> > > > KIP-291 >> > > > > to >> > > > > >> > > users. >> > > > > >> > > > > The >> > > > > >> > > > > > experiment setup in the Google doc simulates the >> > scenario >> > > > that >> > > > > >> > broker >> > > > > >> > > > is >> > > > > >> > > > > > very slow handling ProduceRequest due to e.g. slow >> disk. >> > > It >> > > > > >> > currently >> > > > > >> > > > > > assumes that there is only 1 partition. 
But in the >> > common >> > > > > >> scenario, >> > > > > >> > > it >> > > > > >> > > > is >> > > > > >> > > > > > probably reasonable to assume that there are many >> other >> > > > > >> partitions >> > > > > >> > > that >> > > > > >> > > > > are >> > > > > >> > > > > > also actively produced to and ProduceRequest to these >> > > > > partition >> > > > > >> > also >> > > > > >> > > > > takes >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 >> can >> > > > become >> > > > > >> > > follower >> > > > > >> > > > > for >> > > > > >> > > > > > the partition 0 soon, it probably still needs to >> process >> > > the >> > > > > >> > > > > ProduceRequest >> > > > > >> > > > > > slowly t in the queue because these ProduceRequests >> > cover >> > > > > other >> > > > > >> > > > > partitions. >> > > > > >> > > > > > Thus most ProduceRequest will still timeout after 30 >> > > seconds >> > > > > and >> > > > > >> > most >> > > > > >> > > > > > clients will still likely timeout after 30 seconds. >> Then >> > > it >> > > > is >> > > > > >> not >> > > > > >> > > > > > obviously what is the benefit to client since client >> > will >> > > > > >> timeout >> > > > > >> > > after >> > > > > >> > > > > 30 >> > > > > >> > > > > > seconds before possibly re-connecting to broker1, >> with >> > or >> > > > > >> without >> > > > > >> > > > > KIP-291. >> > > > > >> > > > > > Did I miss something here? >> > > > > >> > > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the specific benefits >> of >> > > this >> > > > > >> KIP to >> > > > > >> > > > user >> > > > > >> > > > > or >> > > > > >> > > > > > system administrator, e.g. whether this KIP decreases >> > > > average >> > > > > >> > > latency, >> > > > > >> > > > > > 999th percentile latency, probably of exception >> exposed >> > to >> > > > > >> client >> > > > > >> > > etc. >> > > > > >> > > > It >> > > > > >> > > > > > is probably useful to clarify this. 
>> > > > > >> > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience only >> when >> > > > there >> > > > > is >> > > > > >> > > issue >> > > > > >> > > > > with >> > > > > >> > > > > > broker, e.g. significant backlog in the request queue >> > due >> > > to >> > > > > >> slow >> > > > > >> > > disk >> > > > > >> > > > as >> > > > > >> > > > > > described in the Google doc? Or is this KIP also >> useful >> > > when >> > > > > >> there >> > > > > >> > is >> > > > > >> > > > no >> > > > > >> > > > > > ongoing issue in the cluster? It might be helpful to >> > > clarify >> > > > > >> this >> > > > > >> > to >> > > > > >> > > > > > understand the benefit of this KIP. >> > > > > >> > > > > > >> > > > > >> > > > > > >> > > > > >> > > > > > Thanks much, >> > > > > >> > > > > > Dong >> > > > > >> > > > > > >> > > > > >> > > > > > >> > > > > >> > > > > > >> > > > > >> > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang < >> > > > > >> lucasatu...@gmail.com >> > > > > >> > > >> > > > > >> > > > > wrote: >> > > > > >> > > > > > >> > > > > >> > > > > > > Hi Eno, >> > > > > >> > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment >> results. >> > > > > >> > > > > > > Here is a link to the positive impact achieved by >> > > > > implementing >> > > > > >> > the >> > > > > >> > > > > > proposed >> > > > > >> > > > > > > change: >> > > > > >> > > > > > > https://docs.google.com/document/d/ >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing >> > > > > >> > > > > > > Please take a look when you have time and let me >> know >> > > your >> > > > > >> > > feedback. 
>> > > > > >> > > > > > > >> > > > > >> > > > > > > Regards, >> > > > > >> > > > > > > Lucas >> > > > > >> > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha < >> > > ka...@harsha.io> >> > > > > >> wrote: >> > > > > >> > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look might >> suit >> > > our >> > > > > >> > > > requirements >> > > > > >> > > > > > > > better. >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > Thanks, >> > > > > >> > > > > > > > Harsha >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang < >> > > > > >> > > > lucasatu...@gmail.com >> > > > > >> > > > > > >> > > > > >> > > > > > > > wrote: >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > Hi Harsha, >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the replication >> quota >> > > > > mechanism >> > > > > >> > > > proposed >> > > > > >> > > > > > in >> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario. >> > > > > >> > > > > > > > > Have you tried it out? >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > Thanks, >> > > > > >> > > > > > > > > Lucas >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < >> > > > > ka...@harsha.io >> > > > > >> > >> > > > > >> > > > wrote: >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > Hi Lucas, >> > > > > >> > > > > > > > > > One more question, any thoughts on making >> this >> > > > > >> configurable >> > > > > >> > > > > > > > > > and also allowing subset of data requests to >> be >> > > > > >> > prioritized. 
>> > > > > >> > > > For >> > > > > >> > > > > > > > example >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we take out a >> > > broker >> > > > > and >> > > > > >> > bring >> > > > > >> > > > new >> > > > > >> > > > > > one >> > > > > >> > > > > > > > it >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > will try to become follower and have lot of >> > fetch >> > > > > >> requests >> > > > > >> > to >> > > > > >> > > > > other >> > > > > >> > > > > > > > > leaders >> > > > > >> > > > > > > > > > in clusters. This will negatively effect the >> > > > > >> > > application/client >> > > > > >> > > > > > > > > requests. >> > > > > >> > > > > > > > > > We are also exploring the similar solution to >> > > > > >> de-prioritize >> > > > > >> > > if >> > > > > >> > > > a >> > > > > >> > > > > > new >> > > > > >> > > > > > > > > > replica comes in for fetch requests, we are >> ok >> > > with >> > > > > the >> > > > > >> > > replica >> > > > > >> > > > > to >> > > > > >> > > > > > be >> > > > > >> > > > > > > > > > taking time but the leaders should prioritize >> > the >> > > > > client >> > > > > >> > > > > requests. >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > Thanks, >> > > > > >> > > > > > > > > > Harsha >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang >> > > wrote: >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > Hi Eno, >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response. >> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, >> so no >> > > > > >> > experimental >> > > > > >> > > > > > results >> > > > > >> > > > > > > > so >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > > far. 
>> > > > > >> > > > > > > > > > > And I plan to test in out in the following >> > days. >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that the >> priority >> > > queue >> > > > > >> does >> > > > > >> > not >> > > > > >> > > > > > > > completely >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > > prevent >> > > > > >> > > > > > > > > > > data requests being processed ahead of >> > > controller >> > > > > >> > requests. >> > > > > >> > > > > > > > > > > That being said, I expect it to greatly >> > mitigate >> > > > the >> > > > > >> > effect >> > > > > >> > > > of >> > > > > >> > > > > > > stable >> > > > > >> > > > > > > > > > > metadata. >> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the >> > > results >> > > > > >> when I >> > > > > >> > > have >> > > > > >> > > > > it. >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > Regards, >> > > > > >> > > > > > > > > > > Lucas >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno >> Thereska >> > < >> > > > > >> > > > > > > > eno.there...@gmail.com >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > > wrote: >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas, >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at >> > this. >> > > A >> > > > > >> couple >> > > > > >> > of >> > > > > >> > > > > > > > questions: >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive change >> after >> > > > > >> implementing >> > > > > >> > > > this >> > > > > >> > > > > > KIP? 
>> > > > > >> > > > > > > > > I'm >> > > > > >> > > > > > > > > > > > wondering if you have any experimental >> > results >> > > > > that >> > > > > >> > show >> > > > > >> > > > the >> > > > > >> > > > > > > > benefit >> > > > > >> > > > > > > > > of >> > > > > >> > > > > > > > > > > the >> > > > > >> > > > > > > > > > > > two queues. >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in >> > > > addressing >> > > > > >> the >> > > > > >> > > > > problem >> > > > > >> > > > > > > the >> > > > > >> > > > > > > > > KIP >> > > > > >> > > > > > > > > > > > identifies. Even with priority queues, >> you >> > > will >> > > > > >> > sometimes >> > > > > >> > > > > > > (often?) >> > > > > >> > > > > > > > > have >> > > > > >> > > > > > > > > > > the >> > > > > >> > > > > > > > > > > > case that data plane requests will be >> ahead >> > of >> > > > the >> > > > > >> > > control >> > > > > >> > > > > > plane >> > > > > >> > > > > > > > > > > requests. >> > > > > >> > > > > > > > > > > > This happens because the system might >> have >> > > > already >> > > > > >> > > started >> > > > > >> > > > > > > > > processing >> > > > > >> > > > > > > > > > > the >> > > > > >> > > > > > > > > > > > data plane requests before the control >> plane >> > > > ones >> > > > > >> > > arrived. >> > > > > >> > > > So >> > > > > >> > > > > > it >> > > > > >> > > > > > > > > would >> > > > > >> > > > > > > > > > > be >> > > > > >> > > > > > > > > > > > good to know what % of the problem this >> KIP >> > > > > >> addresses. 
Thanks,
Eno

On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Change looks good.

Thanks

On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Ted,

Thanks for the suggestion. I've updated the KIP. Please take another look.

Lucas

On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Currently in KafkaConfig.scala:

    val QueuedMaxRequests = 500

It would be good if you can include the default value for this new config
in the KIP.

Thanks

On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Ted, Dong,

I've updated the KIP by adding a new config, instead of reusing the
existing one. Please take another look when you have time. Thanks a lot!

Lucas

On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:

bq. that's a waste of resource if control request rate is low

I don't know if control request rate can get to 100,000; likely not. Then
using the same bound as that for data requests seems high.

On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Ted,

Thanks for taking a look at this KIP.
Let's say today the setting of "queued.max.requests" in cluster A is
1000, while the setting in cluster B is 100,000. The 100x difference
might have indicated that machines in cluster B have larger memory.

By reusing "queued.max.requests", the controlRequestQueue in cluster B
automatically gets a 100x capacity without explicitly bothering the
operators.

I understand the counter argument can be that maybe that's a waste of
resource if control request rate is low, and operators may want to fine
tune the capacity of the controlRequestQueue.

I'm ok with either approach, and can change it if you or anyone else
feels strongly about adding the extra config.
Thanks,
Lucas

On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Lucas:
Under Rejected Alternatives, #2, can you elaborate a bit more on why the
separate config has a bigger impact?

Thanks

On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Lucas,

Thanks for the KIP. Looks good overall. Some comments below:

- We usually specify the full mbean for the new metrics in the KIP. Can
you specify it in the Public Interfaces section similar to KIP-237
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?

- Maybe we could follow the same pattern as KIP-153
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
where we keep the existing sensor name "BytesInPerSec" and add a new
sensor "ReplicationBytesInPerSec", rather than replacing the sensor name
"BytesInPerSec" with e.g. "ClientBytesInPerSec".

- It seems that the KIP changes the semantics of the broker config
"queued.max.requests", because the total number of requests queued in
the broker will no longer be bounded by "queued.max.requests". This
probably needs to be specified in the Public Interfaces section for
discussion.
Thanks,
Dong

On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Kafka experts,

I created KIP-291 to add a separate queue for controller requests:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests

Can you please take a look and let me know your feedback?

Thanks a lot for your time!

Regards,
Lucas

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
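For readers following the design discussion above, here is a minimal standalone sketch (hypothetical class name and capacities, not the actual KIP-291 implementation) of the separate-queue idea: handler threads prefer the bounded controller queue over the bounded data queue, and the two bounds add up, which is Dong's point about "queued.max.requests" no longer bounding the total.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the two-queue design discussed in the thread.
// Capacities mirror the values mentioned above: queued.max.requests = 500
// for data requests, 20 for the controller request queue. Note the total
// number of queued requests is bounded by 500 + 20, not 500 alone.
public class TwoQueueSketch {
    final BlockingQueue<String> controlQueue = new ArrayBlockingQueue<>(20);
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    // Prefer a queued control request; otherwise take the next data request.
    String nextRequest() throws InterruptedException {
        String control = controlQueue.poll();
        if (control != null) {
            return control;
        }
        // Poll the data queue with a timeout so a control request arriving
        // while we wait is picked up on the next iteration, not starved.
        String data = dataQueue.poll(100, TimeUnit.MILLISECONDS);
        return data != null ? data : nextRequest();
    }

    public static void main(String[] args) throws InterruptedException {
        TwoQueueSketch broker = new TwoQueueSketch();
        broker.dataQueue.put("produce-1");
        broker.dataQueue.put("fetch-1");
        broker.controlQueue.put("updateMetadata");

        // The control request jumps ahead of the earlier data requests.
        System.out.println(broker.nextRequest()); // updateMetadata
        System.out.println(broker.nextRequest()); // produce-1
        System.out.println(broker.nextRequest()); // fetch-1
    }
}
```

As Eno noted earlier in the thread, this ordering only applies to requests still waiting in the queues; a data request already taken by a handler is processed to completion regardless.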