@Dong,
Great example and explanation, thanks!

@All,
Regarding the example given by Dong, it seems that even if we use a separate queue and a dedicated controller request handling thread, the same result can still happen: R1_a will be sent on one connection, while R1_b and R2 will be sent on a different connection, and the broker does not order requests across different connections. I was discussing this with Mayuresh offline, and it seems the correlation id within the same NetworkClient object is monotonically increasing and never reset, hence a broker could leverage it to reject obsolete requests. Thoughts?
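The idea of rejecting obsolete requests can be sketched as a per-sender high-water mark on request ids. This is a minimal, hypothetical sketch: the class `StaleRequestGuard` and its `accept` method are illustrative names, not Kafka code, and it assumes the id is monotonically increasing per sender and never reset. Mayuresh's reply that follows suggests the correlationId is in fact reset on connection loss, which would break that assumption. Note that the guard only drops requests older than one already accepted; it does not by itself serialize processing.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical broker-side filter: remembers the highest request id accepted
 * from each sender and rejects anything that is not newer.
 * ASSUMPTION: ids increase monotonically per sender and are never reset.
 */
class StaleRequestGuard {
    private final Map<String, Long> highestAccepted = new HashMap<>();

    /** Returns true if the request should be processed, false if it is obsolete. */
    synchronized boolean accept(String senderId, long requestId) {
        Long highest = highestAccepted.get(senderId);
        if (highest != null && requestId <= highest) {
            return false; // a newer request from this sender was already accepted
        }
        highestAccepted.put(senderId, requestId);
        return true;
    }

    public static void main(String[] args) {
        StaleRequestGuard guard = new StaleRequestGuard();
        // Dong's race: R2 (id 2) happens to be handled before R1 (id 1).
        System.out.println(guard.accept("controller", 2)); // true
        System.out.println(guard.accept("controller", 1)); // false, R1 is obsolete
    }
}
```

If ids were reset after a reconnect, a genuinely new request could carry a smaller id than an old one and be wrongly dropped, which is why this only holds under the stated assumption.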
Thanks,
Lucas

On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
Actually nvm, correlationId is reset in case of connection loss, I think.

Thanks,
Mayuresh

On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
I agree with Dong that out-of-order processing can happen with 2 separate queues as well, and it can even happen today.
Can we use the correlationId in the request from the controller to the broker to handle ordering?

Thanks,
Mayuresh

On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket....@gmail.com> wrote:
Good point, Joel. I agree that a dedicated controller request handling thread would provide better isolation. It also solves the reordering issue.

On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
Good example. I think this scenario can occur in the current code as well, but with even lower probability given that there are other non-controller requests interleaved. It is still sketchy though, and I think a safer approach would be separate queues and pinning controller request handling to one handler thread.

On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindon...@gmail.com> wrote:
Hey Becket,

I think you are right that there may be out-of-order processing. However, it seems that out-of-order processing may also happen even if we use a separate queue.

Here is the example:

- Controller sends R1 and gets disconnected before receiving the response. Then it reconnects and sends R2. Both requests now stay in the controller request queue in the order they were sent.
- thread1 takes R1_a from the request queue, and then thread2 takes R2 from the request queue almost at the same time.
- So R1_a and R2 are processed in parallel. There is a chance that R2's processing completes before R1's.

If out-of-order processing can happen for both approaches with very low probability, it may not be worthwhile to add the extra queue. What do you think?

Thanks,
Dong

On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket....@gmail.com> wrote:
Hi Mayuresh/Joel,

Using the request channel as a deque was brought up some time ago when we were initially thinking of prioritizing requests. The concern was that controller requests are supposed to be processed in order. If we can ensure that there is only one controller request in the request channel, the order is not a concern. But if more than one controller request is inserted into the queue, the controller request order may change and cause problems. For example, think about the following sequence:
1. Controller successfully sends a request R1 to the broker.
2. Broker receives R1 and puts the request at the head of the request queue.
3. The controller-to-broker connection fails and the controller reconnects to the broker.
4. Controller sends a request R2 to the broker.
5. Broker receives R2 and adds it to the head of the request queue.
Now on the broker side, R2 will be processed before R1, which may cause problems.
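Becket's five-step sequence can be reproduced with `java.util.concurrent.LinkedBlockingDeque`. The toy model below (class and request names are hypothetical, not Kafka code) contrasts head insertion into one shared deque with a separate FIFO controller queue drained by a single thread, the option Joel and Becket favour above. A single loop stands in for the one handler thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy comparison of the two designs; request names are plain strings. */
class ControllerRequestOrdering {

    /** Single shared deque: controller requests are inserted at the head. */
    static List<String> sharedDequeOrder() {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>();
        requestQueue.offerLast("Produce-1"); // data request appended at the tail
        requestQueue.offerFirst("R1");       // step 2: R1 goes to the head
        // steps 3-4: connection fails, controller reconnects and sends R2
        requestQueue.offerFirst("R2");       // step 5: R2 also goes to the head
        return drain(requestQueue);
    }

    /** Separate FIFO queue for controller requests, drained by one thread. */
    static List<String> dedicatedQueueOrder() {
        LinkedBlockingQueue<String> controllerQueue = new LinkedBlockingQueue<>();
        controllerQueue.offer("R1");
        controllerQueue.offer("R2");
        return drain(controllerQueue);
    }

    /** Models a single handler thread taking requests from the head one at a time. */
    private static List<String> drain(Queue<String> queue) {
        List<String> processed = new ArrayList<>();
        String request;
        while ((request = queue.poll()) != null) {
            processed.add(request);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(sharedDequeOrder());    // [R2, R1, Produce-1]
        System.out.println(dedicatedQueueOrder()); // [R1, R2]
    }
}
```

The FIFO variant only preserves the order in which requests were enqueued. With two handler threads polling the same queue, R1 and R2 could still complete in either order (Dong's point), and if they arrive on different connections the broker may enqueue them in the wrong order to begin with (Lucas's point at the top of the thread).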
Thanks,
Jiangjie (Becket) Qin

On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
@Mayuresh - I like your idea. It appears to be a simpler, less invasive alternative and it should work. Jun/Becket/others, do you see any pitfalls with this approach?

On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
@Mayuresh,
That's a very interesting idea that I hadn't thought of before. It seems to solve our problem at hand pretty well, and also avoids the need for a new size metric and capacity config for the controller request queue. In fact, if we were to adopt this design, there would be no public interface change, and we probably wouldn't need a KIP. Also, implementation-wise, it seems the Java class LinkedBlockingDeque can readily satisfy the requirement by supporting a capacity and allowing insertion at both ends.

My only concern is that this design is tied to the coincidence that we have two request priorities and a deque has two ends. Hence, with the proposed design, the network layer seems more tightly coupled with upper-layer logic: e.g. if we were to add an extra priority level in the future for some reason, we would probably need to go back to the design of separate queues, one for each priority level.

In summary, I'm ok with both designs and lean toward your suggested approach.
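The implementation note can be made concrete with `java.util.concurrent.LinkedBlockingDeque`, which takes a capacity in its constructor and supports `offerFirst`/`offerLast`. A minimal sketch, assuming requests are represented as strings; `TwoPriorityRequestChannel` and its methods are hypothetical names, not Kafka's `RequestChannel` API.

```java
import java.util.concurrent.LinkedBlockingDeque;

/**
 * Hypothetical two-priority request channel following Mayuresh's proposal:
 * one bounded deque, controller requests at the head, data requests at the tail.
 */
class TwoPriorityRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    TwoPriorityRequestChannel(int capacity) {
        // The capacity bounds controller and data requests together.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    /** Returns false when the deque is full (putFirst would block instead). */
    boolean sendControllerRequest(String request) {
        return deque.offerFirst(request);
    }

    /** Returns false when the deque is full (putLast would block instead). */
    boolean sendDataRequest(String request) {
        return deque.offerLast(request);
    }

    /** Request handler side: always takes from the head; null if empty. */
    String receive() {
        return deque.pollFirst();
    }

    public static void main(String[] args) {
        TwoPriorityRequestChannel channel = new TwoPriorityRequestChannel(3);
        channel.sendDataRequest("Produce-1");
        channel.sendDataRequest("Produce-2");
        channel.sendControllerRequest("LeaderAndIsr");
        System.out.println(channel.sendDataRequest("Produce-3")); // false: full
        System.out.println(channel.receive()); // LeaderAndIsr
        System.out.println(channel.receive()); // Produce-1
    }
}
```

Because the capacity is shared in this sketch, a deque already full of data requests would also refuse (or, with `putFirst`, block) a controller request. A third priority level has no natural place in a two-ended deque, which is the coupling concern raised above.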
Let's hear what others think.

@Becket,
In light of Mayuresh's suggested new design, I'm answering your question only in the context of the current KIP design: I think your suggestion makes sense, and I'm ok with removing the capacity config and just relying on the default value of 20 being sufficient.

Thanks,
Lucas

On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
Hi Lucas,

It seems the main intent here is to prioritize controller requests over any other requests. In that case, we can change the request queue to a deque, where you always insert the normal requests (produce, consume, etc.) at the end of the deque, but if it's a controller request, you insert it at the head. This ensures that the controller request will be given higher priority over other requests.

Also, since we only read one request from the socket, mute it, and only unmute it after handling the request, this would ensure that we don't handle controller requests out of order.

With this approach we can avoid the second queue and the additional config for the size of the queue.
What do you think?

Thanks,
Mayuresh

On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket....@gmail.com> wrote:
Hey Joel,

Thanks for the detailed explanation. I agree the current design makes sense. My confusion is about whether the new config for the controller queue capacity is necessary. I cannot think of a case in which users would change it.

Thanks,
Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote:
Hi Lucas,

I guess my question can be rephrased as "do we expect users to ever change the controller request queue capacity?" If we agree that 20 is already a very generous default and we do not expect users to change it, is it still necessary to expose this as a config?

Thanks,
Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote:
@Becket
1. Thanks for the comment.
You are right that normally there should be just one controller request because of muting, and I had NOT intended to say there would be many enqueued controller requests. I went through the KIP again, and I'm not sure which part conveys that. I'd be happy to revise it if you point out the section.

2. Though it should not happen in normal conditions, the current design does not preclude multiple controllers running at the same time. Hence, if we don't have the controller queue capacity config and simply make its capacity 1, network threads handling requests from different controllers will be blocked during those troublesome times, which is probably not what we want. On the other hand, adding the extra config with a default value, say 20, guards us against issues in those troublesome times, and IMO there isn't much downside to adding it.

@Mayuresh
Good catch, this sentence is an obsolete statement based on a previous design. I've revised the wording in the KIP.
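Point 2 can be illustrated with a bounded `java.util.concurrent.ArrayBlockingQueue`. This sketch assumes each live controller has at most one request in flight (because of muting) and that no handler dequeues anything in between; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Toy illustration of the capacity discussion: during a controller switch,
 * more than one controller may each have one request in flight.
 */
class ControllerQueueCapacity {

    /** Counts how many requests (one per live controller) fit without blocking. */
    static int acceptedWithoutBlocking(int capacity, int liveControllers) {
        BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < liveControllers; i++) {
            // offer() reports a full queue by returning false; a network thread
            // calling put() here would instead block until a handler dequeued a request.
            if (controllerQueue.offer("LeaderAndIsr-from-controller-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedWithoutBlocking(1, 2));  // 1: second network thread would block
        System.out.println(acceptedWithoutBlocking(20, 2)); // 2
    }
}
```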
Thanks,
Lucas

On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
Hi Lucas,

Thanks for the KIP.
I am trying to understand why you think "The memory consumption can rise given the total number of queued requests can go up to 2x" in the impact section. Normally the requests from the controller to a broker are not high volume, right?

Thanks,
Mayuresh

On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> wrote:
Thanks for the KIP, Lucas. Separating the control plane from the data plane makes a lot of sense.

In the KIP you mentioned that the controller request queue may have many requests in it. Will this be a common case? The controller requests still go through the SocketServer. The SocketServer will mute the channel once a request is read and put into the request channel.
So, assuming there is only one connection between the controller and each broker, on the broker side there should be only one controller request in the controller request queue at any given time. If that is the case, do we need a separate controller request queue capacity config? The default value of 20 means we expect 20 controller switches to happen in a short period of time. I am not sure whether someone should increase the controller request queue capacity to handle such a case, as it seems to indicate that something has gone very wrong.

Thanks,
Jiangjie (Becket) Qin

On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
Thanks for the update, Lucas.

I think the motivation section is intuitive. It will be good to learn more about the comments from other reviewers.
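The muting argument above can be checked with a toy model. This is a hypothetical sketch, not Kafka's SocketServer: it assumes every request is identified by a unique string and models one pass of a network thread's poll loop as `pollOnce()`. With one connection the queue never holds more than one request; with two connections (for example the old and new connection in Dong's reconnect scenario, or two controllers) it can hold two.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/**
 * Toy model of per-connection muting: after one request is read from a
 * connection, the connection is muted until that request is completed.
 * ASSUMPTION: request names are unique (they key the owner map).
 */
class MutedChannelModel {

    /** A connection with requests waiting in its socket buffer. */
    static final class Connection {
        final Queue<String> pending = new ArrayDeque<>();
        boolean muted;
    }

    final List<Connection> connections = new ArrayList<>();
    final Queue<String> requestQueue = new ArrayDeque<>();
    private final Map<String, Connection> owners = new HashMap<>();

    /** One network-thread pass: read at most one request per unmuted connection. */
    void pollOnce() {
        for (Connection connection : connections) {
            if (!connection.muted && !connection.pending.isEmpty()) {
                String request = connection.pending.poll();
                requestQueue.add(request);
                owners.put(request, connection);
                connection.muted = true; // stop reading until the response is sent
            }
        }
    }

    /** A handler completes the oldest request and its connection is unmuted. */
    String completeOne() {
        String request = requestQueue.poll();
        if (request != null) {
            owners.remove(request).muted = false;
        }
        return request;
    }

    public static void main(String[] args) {
        MutedChannelModel model = new MutedChannelModel();
        Connection controller = new Connection();
        controller.pending.add("R1");
        controller.pending.add("R2");
        model.connections.add(controller);

        model.pollOnce();
        model.pollOnce(); // muted: R2 is not read yet
        System.out.println(model.requestQueue); // [R1]
        model.completeOne();
        model.pollOnce();
        System.out.println(model.requestQueue); // [R2]
    }
}
```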
On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
Hi Dong,

I've updated the motivation section of the KIP by explaining the cases that would have user impact. Please take a look and let me know your comments.

Thanks,
Lucas

On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
Hi Dong,

Simulating a slow disk was merely a way for me to easily construct a test scenario with a backlog of produce requests. In production, other than a slow disk, a backlog of produce requests may also be caused by high produce QPS. In that case, we may not want to kill the broker, and that's when this KIP can be useful, for both JBOD and non-JBOD setups.
Going back to your previous question about each ProduceRequest covering 20 randomly distributed partitions, let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*. For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.
1. If the enqueued produce requests have acks = -1 (ALL):
   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
   1.2 With this KIP, broker0 will immediately transition partition test-0 to a follower; once the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for "test-0".
   To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's look at the io thread that finally processes the LeaderAndISR. For its 3000 produce requests, let's denote the times at which their remaining 19 partitions catch up as t0, t1, ..., t2999, and say the LeaderAndISR request is processed at time t3000.
   Without this KIP, the 1st produce request would have waited an extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc. Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones. For the same reason, the more ProduceRequests are queued before the LeaderAndISR, the bigger the benefit we get (capped by the produce timeout).
2. If the enqueued produce requests have acks=0 or acks=1, there will be no latency difference, but:
   2.1 Without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
   2.2 With this KIP, we can mitigate the effect: if the LeaderAndISR is processed immediately, the response to producers will carry the NotLeaderForPartition error, causing producers to retry.

The explanation above covers the benefit of reducing the latency of a broker becoming a follower. Closely related is reducing the latency of a broker becoming the leader. In this case, the benefit is even more obvious: if other brokers have resigned leadership and the current broker should take over, any delay in processing the LeaderAndISR will be perceived by clients as unavailability. In extreme cases, this can cause failed produce requests if the retries are exhausted.
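The latency comparison in case 1 can be written out explicitly. Using the notation above (t_i is the time at which the remaining 19 partitions of the i-th produce request on that io thread catch up, and t_3000 is the time the LeaderAndISR is processed), and assuming the t_i are non-decreasing and no later than t_3000, the extra purgatory wait without the KIP is:

```latex
\[
  \frac{24{,}000\ \text{produce requests}}{8\ \text{io threads}} \approx 3000\ \text{requests per io thread}
\]
\[
  \Delta_i = t_{3000} - t_i, \qquad i = 0, 1, \dots, 2999
\]
\[
  t_0 \le t_1 \le \dots \le t_{2999} \le t_{3000}
  \;\Longrightarrow\;
  \Delta_0 \ge \Delta_1 \ge \dots \ge \Delta_{2999} \ge 0
\]
\[
  \sum_{i=0}^{2999} \Delta_i = 3000\, t_{3000} - \sum_{i=0}^{2999} t_i
\]
```

In practice each Delta_i is also capped by the produce request timeout, which is the cap mentioned above.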
Two other types of controller requests are UpdateMetadata and StopReplica, which I'll briefly discuss as follows:
For UpdateMetadata requests, delayed processing means clients receive stale metadata, e.g. wrong leadership info for certain partitions, and the effect is more retries, or even fatal failure if the retries are exhausted.

For StopReplica requests, a long queuing time may degrade the performance of topic deletion.

Regarding your last question about the delay for DescribeLogDirsRequest, you are right that this KIP cannot help with the latency of getting the log dirs info; it's only relevant when controller requests are involved.
Regards,
Lucas

On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> wrote:
Hey Jun,

Thanks much for the comments. It is a good point. So the feature may be useful for the JBOD use case. I have one question below.

Hey Lucas,

Do you think this feature is also useful for a non-JBOD setup, or is it only useful for the JBOD setup? It may be useful to understand this.

When the broker is set up using JBOD, in order to move leaders on the failed disk to other disks, the system operator first needs to get the list of partitions on the failed disk.
This is currently achieved using AdminClient.describeLogDirs(), which sends a DescribeLogDirsRequest to the broker. If we only prioritize the controller requests, then the DescribeLogDirsRequest may still take a long time to be processed by the broker. So the overall time to move leaders away from the failed disk may still be long even with this KIP. What do you think?

Thanks,
Dong

On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
Thanks for the insightful comment, Jun.
@Dong,
Since both of the comments in your previous email are about the benefits of this KIP and whether it's useful, in light of Jun's last comment, do you agree that this KIP can be beneficial in the case mentioned by Jun? Please let me know, thanks!

Regards,
Lucas

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote:
Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the broker. In that case, this KIP may not help. If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers.
In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover more quickly.

Thanks,
Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com> wrote:
Hey Lucas,

Thanks for the reply. Some follow-up questions below.

Regarding 1, if each ProduceRequest covers 20 partitions that are randomly distributed across all partitions, then each ProduceRequest will likely cover some partitions for which the broker is still the leader after it quickly processes the LeaderAndIsrRequest.
Then the broker will still be slow in processing these ProduceRequests, and request latency will still be very high with this KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this understanding correct?

Regarding 2, if most ProduceRequests will still time out after 30 seconds, then it is less clear how this KIP reduces average produce latency. Can you clarify what metrics can be improved by this KIP?

Not sure why a system operator would directly care about the number of truncated messages. Do you mean this KIP can improve average throughput or reduce message duplication? It will be good to understand this.
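Dong's point under 1 can be made concrete with a little arithmetic. The sketch below is an illustration, not part of the original thread: it combines the numbers from Lucas's scenario (broker0 leads 10K partitions, and leadership of 9K of them is moved away) with Dong's assumption that each ProduceRequest covers 20 partitions chosen uniformly at random, and computes the chance that a request still touches at least one partition broker0 leads. The helper name `prob_touches_remaining` is hypothetical.

```python
from math import comb


def prob_touches_remaining(total: int, moved: int, per_request: int) -> float:
    """Chance that a request naming `per_request` distinct partitions, drawn
    uniformly at random from `total`, includes at least one partition the
    broker still leads after leadership of `moved` partitions was shifted away.

    Hypothetical helper for illustration; the uniform-random model is an assumption.
    """
    # Ways to pick every partition from the moved set, divided by the
    # ways to pick any `per_request` partitions at all.
    all_moved = comb(moved, per_request) / comb(total, per_request)
    return 1.0 - all_moved


if __name__ == "__main__":
    p = prob_touches_remaining(total=10_000, moved=9_000, per_request=20)
    print(f"{p:.3f}")
```

Under these assumptions the result is roughly 0.88, so most ProduceRequests would still include a partition that broker0 leads and would still need a slow append, which is the situation Dong describes. Real producers rarely spread partitions uniformly across requests, so treat this only as a rough estimate.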
Thanks,
Dong

On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Dong,

Thanks for your valuable comments. Please see my reply below.

1. The Google doc showed only 1 partition. Now let's consider a more common scenario where broker0 is the leader of many partitions. And let's say for some reason its IO becomes slow.
The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like Cruise Control. With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will get the errors relatively quickly rather than relying on their timeout, thanks to the batched async ZK operations. To me it's a useful feature to have during such troublesome times.

2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately. Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry). One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks [1].

Also *IF* there were a metric to show the number of truncated messages on brokers, with the experiments done in the Google Doc, it should be easy to see that far fewer messages need to be truncated on broker0, since the up-to-date metadata avoids appending messages in subsequent PRODUCE requests. If we talk to a system operator and ask whether they prefer fewer wasteful IOs, I bet most likely the answer is yes.

3. To answer your question, I think it might be helpful to construct some formulas.
To simplify the modeling, I'm going back to the case where there is only ONE partition involved. Following the experiments in the Google Doc, let's say broker0 becomes the follower at time t0, and after t0 there were still N produce requests in its request queue. With the up-to-date metadata brought by this KIP, broker0 can reply with a NotLeaderForPartition exception; let's use M1 to denote the average processing time of replying with such an error message.
Without this KIP, the broker will need to append messages to segments, which may trigger a flush to disk; let's use M2 to denote the average processing time for such logic. Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.

In practice, M2 should always be larger than M1, which means that as long as N is positive, we would see improvements in the average latency. There does not need to be a significant backlog of requests in the request queue, or severe degradation of disk performance, to have the improvement.
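Lucas's formula can be checked with a small model. The sketch below assumes, for illustration only, a single request handler draining a FIFO queue of N produce requests that are all present at t0, where each request takes M1 (reply with NotLeaderForPartition) or M2 (append to segments), and a request's latency is its completion time after t0. Under that assumption the exact average extra latency is (N + 1) * (M2 - M1) / 2, which agrees with N * (M2 - M1) / 2 once N is large. The function names and the example timings are hypothetical.

```python
def average_extra_latency(n: int, m1: float, m2: float) -> float:
    """Average extra latency per request when every queued request takes m2
    instead of m1, for n requests drained in FIFO order by one handler.

    The k-th request (1-indexed) completes k * m after t0, so its extra
    latency is k * (m2 - m1).
    """
    if n <= 0:
        return 0.0
    extra = [k * m2 - k * m1 for k in range(1, n + 1)]
    return sum(extra) / n


def lucas_estimate(n: int, m1: float, m2: float) -> float:
    """The approximation from the thread: N * (M2 - M1) / 2."""
    return n * (m2 - m1) / 2


if __name__ == "__main__":
    # Hypothetical timings: 1 ms to return an error, 500 ms to append on a slow disk.
    n, m1, m2 = 100, 0.001, 0.5
    print(average_extra_latency(n, m1, m2))  # equals (n + 1) * (m2 - m1) / 2
    print(lucas_estimate(n, m1, m2))
```

Because M2 > M1 makes every term positive, the average improves for any N of at least 1, which is Lucas's point. The model ignores multiple handler threads (`num.io.threads`) and requests that arrive after t0.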

Regards,
Lucas

[1] For instance, reducing the timeout on the producer side can trigger unnecessary duplicate requests when the corresponding leader broker is overloaded, exacerbating the situation.

On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Lucas,

Thanks much for the detailed documentation of the experiment.

Initially I also thought having a separate queue for controller requests is useful because, as you mentioned in the summary section of the Google doc, controller requests are generally more important than data requests and we probably want controller requests to be processed sooner. But then Eno has two very good questions which I am not sure the Google doc has answered explicitly. Could you help with the following questions?

1) It is not very clear what the actual benefit of KIP-291 to users is.
The experiment setup in the Google doc simulates the scenario in which the broker is very slow handling ProduceRequests due to e.g. a slow disk. It currently assumes that there is only 1 partition. But in the common scenario, it is probably reasonable to assume that there are many other partitions that are also actively produced to, and ProduceRequests to these partitions also take e.g. 2 seconds to be processed.
So even if broker0 can become a follower for partition 0 soon, it probably still needs to process the ProduceRequests in the queue slowly, because these ProduceRequests cover other partitions. Thus most ProduceRequests will still time out after 30 seconds, and most clients will still likely time out after 30 seconds. Then it is not obvious what the benefit to the client is, since the client will time out after 30 seconds before possibly re-connecting to broker1, with or without KIP-291. Did I miss something here?

2) I guess Eno is asking for the specific benefits of this KIP to users or system administrators, e.g. whether this KIP decreases average latency, 99.9th percentile latency, the probability of exceptions exposed to clients, etc. It is probably useful to clarify this.

3) Does this KIP help improve user experience only when there is an issue with the broker, e.g. a significant backlog in the request queue due to a slow disk as described in the Google doc? Or is this KIP also useful when there is no ongoing issue in the cluster?
It might be helpful to clarify this to understand the benefit of this KIP.

Thanks much,
Dong

On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Eno,

Sorry for the delay in getting the experiment results.
Here is a link to the positive impact achieved by implementing the proposed change:
https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
Please take a look when you have time and let me know your feedback.
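The separate controller-request queue discussed in this thread (controller requests such as LeaderAndIsr handled ahead of data requests) can be illustrated with a toy simulation. This is not Kafka's request channel: the `Request` type, the `completion_times` function and the timings are hypothetical, all requests are assumed to be queued at t0, and a single handler thread always drains the controller queue first. It shows both sides of the discussion: the LeaderAndIsr request no longer waits behind the produce backlog, while produce requests for partitions that did not move are not sped up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    is_controller: bool
    cost_s: float  # hypothetical processing time in seconds


def completion_times(requests, separate_controller_queue):
    """Return {request name: completion time after t0} for one handler thread.

    All requests are assumed to be queued at t0. With a separate controller
    queue, the handler drains controller requests before any data request;
    otherwise everything shares one FIFO queue in arrival order.
    """
    if separate_controller_queue:
        controller = deque(r for r in requests if r.is_controller)
        data = deque(r for r in requests if not r.is_controller)
    else:
        controller = deque()
        data = deque(requests)
    clock = 0.0
    done = {}
    while controller or data:
        req = controller.popleft() if controller else data.popleft()
        clock += req.cost_s
        done[req.name] = clock
    return done


if __name__ == "__main__":
    # Ten slow produce requests (2 s each, as in Dong's example) arrive
    # before a cheap LeaderAndIsr request.
    backlog = [Request(f"produce-{i}", False, 2.0) for i in range(10)]
    backlog.append(Request("leader-and-isr", True, 0.01))
    for separate in (False, True):
        t = completion_times(backlog, separate)
        print(separate, round(t["leader-and-isr"], 2), round(t["produce-9"], 2))
```

Here the LeaderAndIsr request completes after about 20 s with the shared queue but almost immediately with the separate one, while the last produce request finishes at about 20 s either way. With 2 s per request and a 30 s timeout, a produce request queued behind roughly 15 or more others would still time out under either layout, which is the concern Dong raises in question 1; the model does not capture the faster NotLeaderForPartition replies that Lucas describes for partitions whose leadership moved.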

Regards,
Lucas

On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:

Thanks for the pointer. Will take a look; it might suit our requirements better.

Thanks,
Harsha

On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Harsha,

If I understand correctly, the replication quota mechanism proposed in KIP-73 can be helpful in that scenario.
Have you tried it out?

Thanks,
Lucas

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125