Actually nvm, correlationId is reset in case of connection loss, I think.

Thanks,
Mayuresh

On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

I agree with Dong that out-of-order processing can happen with 2 separate queues as well, and it can even happen today.
Can we use the correlationId in the request from the controller to the broker to handle ordering?

Thanks,

Mayuresh

On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket....@gmail.com> wrote:

Good point, Joel. I agree that a dedicated controller request handling thread would be a better isolation. It also solves the reordering issue.

On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

Good example. I think this scenario can occur in the current code as well, but with even lower probability given that there are other non-controller requests interleaved. It is still sketchy though, and I think a safer approach would be separate queues and pinning controller request handling to one handler thread.

On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Becket,

I think you are right that there may be out-of-order processing. However, it seems that out-of-order processing may also happen even if we use a separate queue.

Here is the example:

- Controller sends R1 and gets disconnected before receiving the response. Then it reconnects and sends R2. Both requests now stay in the controller request queue in the order they were sent.
- thread1 takes R1_a from the request queue and then thread2 takes R2 from the request queue almost at the same time.
- So R1_a and R2 are processed in parallel. There is a chance that R2's processing is completed before R1.

If out-of-order processing can happen for both approaches with very low probability, it may not be worthwhile to add the extra queue.
What do you think?

Thanks,
Dong

On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Mayuresh/Joel,

Using the request channel as a deque was brought up some time ago when we were initially thinking of prioritizing the requests. The concern was that the controller requests are supposed to be processed in order. If we can ensure that there is only one controller request in the request channel, the order is not a concern. But in cases where more than one controller request is inserted into the queue, the controller request order may change and cause problems. For example, think about the following sequence:
1. Controller successfully sends a request R1 to the broker.
2. Broker receives R1 and puts the request at the head of the request queue.
3. The controller-to-broker connection fails and the controller reconnects to the broker.
4. Controller sends a request R2 to the broker.
5. Broker receives R2 and adds it to the head of the request queue.
Now on the broker side, R2 will be processed before R1, which may cause problems.

Thanks,

Jiangjie (Becket) Qin

On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

@Mayuresh - I like your idea. It appears to be a simpler, less invasive alternative and it should work. Jun/Becket/others, do you see any pitfalls with this approach?

On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

@Mayuresh,
That's a very interesting idea that I hadn't thought of before.
It seems to solve our problem at hand pretty well, and also avoids the need to have a new size metric and capacity config for the controller request queue. In fact, if we were to adopt this design, there is no public interface change, and we probably don't need a KIP.
Also, implementation-wise, it seems the Java class LinkedBlockingDeque can readily satisfy the requirement by supporting a capacity and also allowing insertion at both ends.

My only concern is that this design is tied to the coincidence that we have two request priorities and there are two ends to a deque. Hence by using the proposed design, it seems the network layer is more tightly coupled with upper layer logic, e.g. if we were to add an extra priority level in the future for some reason, we would probably need to go back to the design of separate queues, one for each priority level.

In summary, I'm ok with both designs and lean toward your suggested approach.
Let's hear what others think.

@Becket,
In light of Mayuresh's suggested new design, I'm answering your question only in the context of the current KIP design: I think your suggestion makes sense, and I'm ok with removing the capacity config and just relying on the default value of 20 being sufficient.
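
For illustration, the deque idea discussed in this thread can be sketched in a few lines of Java. This is a minimal sketch under stated assumptions, not Kafka's actual request channel: the class name PrioritizedRequestQueue, its methods, and the use of plain strings as requests are all hypothetical. It uses java.util.concurrent.LinkedBlockingDeque, which supports both a capacity bound and insertion at either end.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical illustration only; this is not Kafka's RequestChannel.
final class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque;

    PrioritizedRequestQueue(int capacity) {
        // A single bounded deque replaces the two separate queues.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Controller requests go to the head, all other requests to the tail.
    // Returns false instead of blocking when the deque is full.
    boolean enqueue(String request, boolean fromController) {
        return fromController ? deque.offerFirst(request) : deque.offerLast(request);
    }

    // Handler threads always read from the head; returns null when empty.
    String poll() {
        return deque.pollFirst();
    }

    public static void main(String[] args) {
        PrioritizedRequestQueue queue = new PrioritizedRequestQueue(8);
        queue.enqueue("produce-1", false);
        queue.enqueue("produce-2", false);
        queue.enqueue("controller-R1", true);
        // If a second controller request arrives before R1 is handled
        // (for example after a reconnect), it also jumps to the head.
        queue.enqueue("controller-R2", true);

        String next;
        while ((next = queue.poll()) != null) {
            System.out.println(next);
        }
    }
}
```

Running the sketch shows both properties raised in the thread: controller requests are served ahead of the queued produce requests, and when two controller requests are queued at once, the later one (R2) is dequeued before the earlier one (R1), which is the reordering case the sequence with R1 and R2 describes.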
Thanks,
Lucas

On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

Hi Lucas,

Seems like the main intent here is to prioritize the controller request over any other requests.
In that case, we can change the request queue to a deque, where you always insert the normal requests (produce, consume, etc.) at the end of the deque, but if it's a controller request, you insert it at the head of the queue. This ensures that the controller request will be given higher priority over other requests.

Also, since we only read one request from the socket and mute it, and only unmute it after handling the request, this would ensure that we don't handle controller requests out of order.

With this approach we can avoid the second queue and the additional config for the size of the queue.

What do you think?

Thanks,

Mayuresh

On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket....@gmail.com> wrote:

Hey Joel,

Thanks for the detailed explanation. I agree the current design makes sense. My confusion is about whether the new config for the controller queue capacity is necessary.
I cannot think of a case in which users would change it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Lucas,

I guess my question can be rephrased to "do we expect users to ever change the controller request queue capacity"? If we agree that 20 is already a very generous default number and we do not expect users to change it, is it still necessary to expose this as a config?

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote:

@Becket
1. Thanks for the comment. You are right that normally there should be just one controller request because of muting, and I had NOT intended to say there would be many enqueued controller requests.
I went through the KIP again, and I'm not sure which part conveys that info.
I'd be happy to revise if you point out the section.

2.
Though it should not happen in normal conditions, the current design does not preclude multiple controllers running at the same time. Hence, if we don't have the controller queue capacity config and simply set its capacity to 1, network threads handling requests from different controllers will be blocked during those troublesome times, which is probably not what we want. On the other hand, adding the extra config with a default value, say 20, guards us from issues in those troublesome times, and IMO there isn't much downside to adding the extra config.

@Mayuresh
Good catch, this sentence is an obsolete statement based on a previous design. I've revised the wording in the KIP.

Thanks,
Lucas

On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

Hi Lucas,

Thanks for the KIP.
I am trying to understand why you think "The memory consumption can rise given the total number of queued requests can go up to 2x" in the impact section. Normally the requests from the controller to a broker are not high volume, right?

Thanks,

Mayuresh

On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> wrote:

Thanks for the KIP, Lucas. Separating the control plane from the data plane makes a lot of sense.

In the KIP you mentioned that the controller request queue may have many requests in it. Will this be a common case? The controller requests still go through the SocketServer. The SocketServer will mute the channel once a request is read and put into the request channel. So assuming there is only one connection between the controller and each broker, on the broker side there should be only one controller request in the controller request queue at any given time. If that is the case, do we need a separate controller request queue capacity config? The default value of 20 means that we expect 20 controller switches to happen in a short period of time.
I am not sure whether someone should increase the controller request queue capacity to handle such a case, as it seems to indicate that something very wrong has happened.

Thanks,

Jiangjie (Becket) Qin

On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:

Thanks for the update, Lucas.

I think the motivation section is intuitive. It will be good to learn more about the comments from other reviewers.

On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Dong,

I've updated the motivation section of the KIP by explaining the cases that would have user impacts.
Please take a look and let me know your comments.

Thanks,
Lucas

On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Dong,

The simulation of a slow disk is merely for me to easily construct a testing scenario with a backlog of produce requests. In production, other than the disk being slow, a backlog of produce requests may also be caused by high produce QPS.
In that case, we may not want to kill the broker, and that's when this KIP can be useful, both for JBOD and non-JBOD setups.

Going back to your previous question about each ProduceRequest covering 20 partitions that are randomly distributed, let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*.
For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.
1. If the enqueued produce requests have acks = -1 (ALL)
  1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
  1.2 With this KIP, broker0 will immediately transition the partition test-0 to become a follower. After the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for "test-0".
  To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's investigate the io thread that finally processed the LeaderAndISR.
  For the 3000 produce requests, suppose we model the times when their remaining 19 partitions catch up as t0, t1, ..., t2999, and the LeaderAndISR request is processed at time t3000.
  Without this KIP, the 1st produce request would have waited an extra t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1, etc.
  Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones.
  For the same reason, the more ProduceRequests queued before the LeaderAndISR, the bigger the benefit we get (capped by the produce timeout).
2. If the enqueued produce requests have acks=0 or acks=1
  There will be no latency differences in this case, but
  2.1 Without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
  2.2 With this KIP, we can mitigate the effect, since if the LeaderAndISR is immediately processed, the response to producers will have the NotLeaderForPartition error, causing producers to retry.

The explanation above covers the benefit of reducing the latency of a broker becoming a follower. Closely related is reducing the latency of a broker becoming the leader.
In this case, the benefit is even more obvious: if other brokers have resigned leadership, the current broker should take leadership. Any delay in processing the LeaderAndISR will be perceived by clients as unavailability. In extreme cases, this can cause failed produce requests if the retries are exhausted.

Another two types of controller requests are UpdateMetadata and StopReplica, which I'll briefly discuss as follows:
For UpdateMetadata requests, delayed processing means clients receiving stale metadata, e.g.
with the wrong leadership info for certain partitions, and the effect is more retries or even fatal failure if the retries are exhausted.

For StopReplica requests, a long queuing time may degrade the performance of topic deletion.

Regarding your last question about the delay for DescribeLogDirsRequest, you are right that this KIP cannot help with the latency in getting the log dirs info, and it's only relevant when controller requests are involved.

Regards,
Lucas

On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Jun,

Thanks much for the comments. It is a good point. So the feature may be useful for the JBOD use case. I have one question below.

Hey Lucas,

Do you think this feature is also useful for a non-JBOD setup, or is it only useful for the JBOD setup? It may be useful to understand this.

When the broker is set up using JBOD, in order to move leaders on the failed disk to other disks, the system operator first needs to get the list of partitions on the failed disk. This is currently achieved using AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the broker. If we only prioritize the controller requests, then the DescribeLogDirsRequest may still take a long time to be processed by the broker. So the overall time to move leaders away from the failed disk may still be long even with this KIP. What do you think?

Thanks,
Dong

On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Thanks for the insightful comment, Jun.

@Dong,
Since both of the comments in your previous email are about the benefits of this KIP and whether it's useful, in light of Jun's last comment, do you agree that this KIP can be beneficial in the case mentioned by Jun?
Please let me know, thanks!

Regards,
Lucas

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the broker. In that case, this KIP may not help.
If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers. In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover quicker.

Thanks,

Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Lucas,

Thanks for the reply. Some follow-up questions below.

Regarding 1, if each ProduceRequest covers 20 partitions that are randomly distributed across all partitions, then each ProduceRequest will likely cover some partitions for which the broker is still leader after it quickly processes the LeaderAndIsrRequest. Then the broker will still be slow in processing these ProduceRequests, and request latency will still be very high with this KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this understanding correct?

Regarding 2, if most ProduceRequests will still time out after 30 seconds, then it is less clear how this KIP reduces average produce latency.
Can you clarify what metrics can be improved by this KIP?

Not sure why the system operator directly cares about the number of truncated messages. Do you mean this KIP can improve average throughput or reduce message duplication? It will be good to understand this.

Thanks,
Dong

On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Dong,

Thanks for your valuable comments. Please see my reply below.

1. The Google doc showed only 1 partition. Now let's consider a more common scenario where broker0 is the leader of many partitions.
And let's say for some reason its IO becomes slow. The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like Cruise Control. With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will get the errors relatively quickly rather than relying on their timeout, thanks to the batched async ZK operations.
To me it's a useful feature to have during such troublesome times.

2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately. Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry). One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks[1].

Also *IF* there were a metric to show the number of truncated messages on brokers, with the experiments done in the Google Doc, it should be easy to see that a lot fewer messages need to be truncated on broker0 since the up-to-date metadata avoids appending of messages in subsequent PRODUCE requests. If we talk to a system operator and ask whether they prefer fewer wasteful IOs, I bet most likely the answer is yes.

3. To answer your question, I think it might be helpful to construct some formulas.
To simplify the modeling, I'm going back to the case where there is only ONE partition involved. Following the experiments in the Google Doc, let's say broker0 becomes the follower at time t0, and after t0 there were still N produce requests in its request queue. With the up-to-date metadata brought by this KIP, broker0 can reply with a NotLeaderForPartition exception; let's use M1 to denote the average processing time of replying with such an error message. Without this KIP, the broker will need to append messages to segments, which may trigger a flush to disk; let's use M2 to denote the average processing time for such logic.
Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.

In practice, M2 should always be larger than M1, which means as long as N is positive, we would see improvements on the average latency. There does not need to be a significant backlog of requests in the request queue, or severe degradation of disk performance, to have the improvement.

Regards,
Lucas

[1] For instance, reducing the timeout on the producer side can trigger unnecessary duplicate requests when the corresponding leader broker is overloaded, exacerbating the situation.
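
As a rough check of the N * (M2 - M1) / 2 estimate in point 3, here is a minimal Python sketch. It assumes (this is not stated in the thread) that the N queued produce requests are drained one at a time in FIFO order by a single handler, so the i-th request completes after i processing intervals. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def average_extra_latency(n, m1, m2):
    """Average extra completion latency per queued request without the KIP.

    Assumption: requests are processed strictly one after another, so the
    i-th request (1-based) finishes at i * m2 without the KIP and at
    i * m1 with it; the difference is i * (m2 - m1).
    """
    if n <= 0:
        return 0.0
    extra = [i * (m2 - m1) for i in range(1, n + 1)]
    return sum(extra) / n


def approx_extra_latency(n, m1, m2):
    """The N * (M2 - M1) / 2 approximation quoted in the thread."""
    return n * (m2 - m1) / 2


if __name__ == "__main__":
    # Hypothetical values: 100 queued requests, 1 ms to reply with an
    # error (M1), 3 ms to append and possibly flush (M2).
    print(average_extra_latency(100, 1, 3))
    print(approx_extra_latency(100, 1, 3))
```

Under this model the exact average is (N + 1) * (M2 - M1) / 2, which approaches the quoted N * (M2 - M1) / 2 as N grows; either way the gain is positive whenever M2 > M1 and N > 0.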

On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Lucas,

Thanks much for the detailed documentation of the experiment.

Initially I also thought having a separate queue for controller requests is useful because, as you mentioned in the summary section of the Google doc, controller requests are generally more important than data requests and we probably want controller requests to be processed sooner. But then Eno has two very good questions which I am not sure the Google doc has answered explicitly.
Could you help with the following questions?

1) It is not very clear what the actual benefit of KIP-291 is to users. The experiment setup in the Google doc simulates the scenario in which the broker is very slow handling ProduceRequests due to e.g. a slow disk. It currently assumes that there is only 1 partition. But in the common scenario, it is probably reasonable to assume that there are many other partitions that are also actively produced to, and ProduceRequests to these partitions also take e.g. 2 seconds to be processed.
So even if broker0 can become follower for partition 0 soon, it probably still needs to slowly process the ProduceRequests in the queue because these ProduceRequests cover other partitions. Thus most ProduceRequests will still time out after 30 seconds and most clients will still likely time out after 30 seconds. Then it is not obvious what the benefit to the client is, since the client will time out after 30 seconds before possibly re-connecting to broker1, with or without KIP-291. Did I miss something here?

2) I guess Eno is asking for the specific benefits of this KIP to the user or system administrator, e.g. whether this KIP decreases average latency, 999th percentile latency, the probability of exceptions exposed to the client, etc. It is probably useful to clarify this.

3) Does this KIP help improve user experience only when there is an issue with the broker, e.g. significant backlog in the request queue due to slow disk as described in the Google doc? Or is this KIP also useful when there is no ongoing issue in the cluster?
It might be helpful to clarify this to understand the benefit of this KIP.

Thanks much,
Dong

On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Eno,

Sorry for the delay in getting the experiment results. Here is a link to the positive impact achieved by implementing the proposed change:
https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
Please take a look when you have time and let me know your feedback.

Regards,
Lucas

On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:

Thanks for the pointer. Will take a look; it might suit our requirements better.

Thanks,
Harsha

On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Harsha,

If I understand correctly, the replication quota mechanism proposed in KIP-73 can be helpful in that scenario. Have you tried it out?

Thanks,
Lucas

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125