Yea, the correlationId is only set to 0 in the NetworkClient constructor. Since we reuse the same NetworkClient between Controller and the broker, a disconnection should not cause it to reset to 0, in which case it can be used to reject obsolete requests.
Thanks, Mayuresh On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatu...@gmail.com> wrote: > @Dong, > Great example and explanation, thanks! > > @All > Regarding the example given by Dong, it seems even if we use a queue, and a > dedicated controller request handling thread, > the same result can still happen because R1_a will be sent on one > connection, and R1_b & R2 will be sent on a different connection, > and there is no ordering between different connections on the broker side. > I was discussing with Mayuresh offline, and it seems correlation id within > the same NetworkClient object is monotonically increasing and never reset, > hence a broker can leverage that to properly reject obsolete requests. > Thoughts? > > Thanks, > Lucas > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat < > gharatmayures...@gmail.com> wrote: > > > Actually nvm, correlationId is reset in case of connection loss, I think. > > > > Thanks, > > > > Mayuresh > > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat < > > gharatmayures...@gmail.com> > > wrote: > > > > > I agree with Dong that out-of-order processing can happen with having 2 > > > separate queues as well and it can even happen today. > > > Can we use the correlationId in the request from the controller to the > > > broker to handle ordering ? > > > > > > Thanks, > > > > > > Mayuresh > > > > > > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket....@gmail.com> > wrote: > > > > > >> Good point, Joel. I agree that a dedicated controller request handling > > >> thread would be a better isolation. It also solves the reordering > issue. > > >> > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkosh...@gmail.com> > > wrote: > > >> > > >> > Good example. I think this scenario can occur in the current code as > > >> well > > >> > but with even lower probability given that there are other > > >> non-controller > > >> > requests interleaved. It is still sketchy though and I think a safer > > >> > approach would be separate queues and pinning controller request > > >> handling > > >> > to one handler thread. > > >> > > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindon...@gmail.com> > > wrote: > > >> > > > >> > > Hey Becket, > > >> > > > > >> > > I think you are right that there may be out-of-order processing. > > >> However, > > >> > > it seems that out-of-order processing may also happen even if we > > use a > > >> > > separate queue. > > >> > > > > >> > > Here is the example: > > >> > > > > >> > > - Controller sends R1 and got disconnected before receiving > > response. > > >> > Then > > >> > > it reconnects and sends R2. Both requests now stay in the > controller > > >> > > request queue in the order they are sent. > > >> > > - thread1 takes R1_a from the request queue and then thread2 takes > > R2 > > >> > from > > >> > > the request queue almost at the same time. > > >> > > - So R1_a and R2 are processed in parallel. There is chance that > > R2's > > >> > > processing is completed before R1. > > >> > > > > >> > > If out-of-order processing can happen for both approaches with > very > > >> low > > >> > > probability, it may not be worthwhile to add the extra queue. What > > do > > >> you > > >> > > think? > > >> > > > > >> > > Thanks, > > >> > > Dong > > >> > > > > >> > > > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket....@gmail.com > > > > >> > wrote: > > >> > > > > >> > > > Hi Mayuresh/Joel, > > >> > > > > > >> > > > Using the request channel as a dequeue was bright up some time > ago > > >> when > > >> > > we > > >> > > > initially thinking of prioritizing the request. The concern was > > that > > >> > the > > >> > > > controller requests are supposed to be processed in order. If we > > can > > >> > > ensure > > >> > > > that there is one controller request in the request channel, the > > >> order > > >> > is > > >> > > > not a concern. But in cases that there are more than one > > controller > > >> > > request > > >> > > > inserted into the queue, the controller request order may change > > and > > >> > > cause > > >> > > > problem. For example, think about the following sequence: > > >> > > > 1. Controller successfully sent a request R1 to broker > > >> > > > 2. Broker receives R1 and put the request to the head of the > > request > > >> > > queue. > > >> > > > 3. Controller to broker connection failed and the controller > > >> > reconnected > > >> > > to > > >> > > > the broker. > > >> > > > 4. Controller sends a request R2 to the broker > > >> > > > 5. Broker receives R2 and add it to the head of the request > queue. > > >> > > > Now on the broker side, R2 will be processed before R1 is > > processed, > > >> > > which > > >> > > > may cause problem. > > >> > > > > > >> > > > Thanks, > > >> > > > > > >> > > > Jiangjie (Becket) Qin > > >> > > > > > >> > > > > > >> > > > > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy < > jjkosh...@gmail.com> > > >> > wrote: > > >> > > > > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler less > > >> > invasive > > >> > > > > alternative and it should work. Jun/Becket/others, do you see > > any > > >> > > > pitfalls > > >> > > > > with this approach? > > >> > > > > > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang < > > >> lucasatu...@gmail.com> > > >> > > > > wrote: > > >> > > > > > > >> > > > > > @Mayuresh, > > >> > > > > > That's a very interesting idea that I haven't thought > before. > > >> > > > > > It seems to solve our problem at hand pretty well, and also > > >> > > > > > avoids the need to have a new size metric and capacity > config > > >> > > > > > for the controller request queue. In fact, if we were to > adopt > > >> > > > > > this design, there is no public interface change, and we > > >> > > > > > probably don't need a KIP. > > >> > > > > > Also implementation wise, it seems > > >> > > > > > the java class LinkedBlockingQueue can readily satisfy the > > >> > > requirement > > >> > > > > > by supporting a capacity, and also allowing inserting at > both > > >> ends. > > >> > > > > > > > >> > > > > > My only concern is that this design is tied to the > coincidence > > >> that > > >> > > > > > we have two request priorities and there are two ends to a > > >> deque. > > >> > > > > > Hence by using the proposed design, it seems the network > layer > > >> is > > >> > > > > > more tightly coupled with upper layer logic, e.g. if we were > > to > > >> add > > >> > > > > > an extra priority level in the future for some reason, we > > would > > >> > > > probably > > >> > > > > > need to go back to the design of separate queues, one for > each > > >> > > priority > > >> > > > > > level. > > >> > > > > > > > >> > > > > > In summary, I'm ok with both designs and lean toward your > > >> suggested > > >> > > > > > approach. > > >> > > > > > Let's hear what others think. > > >> > > > > > > > >> > > > > > @Becket, > > >> > > > > > In light of Mayuresh's suggested new design, I'm answering > > your > > >> > > > question > > >> > > > > > only in the context > > >> > > > > > of the current KIP design: I think your suggestion makes > > sense, > > >> and > > >> > > I'm > > >> > > > > ok > > >> > > > > > with removing the capacity config and > > >> > > > > > just relying on the default value of 20 being sufficient > > enough. > > >> > > > > > > > >> > > > > > Thanks, > > >> > > > > > Lucas > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat < > > >> > > > > > gharatmayures...@gmail.com > > >> > > > > > > wrote: > > >> > > > > > > > >> > > > > > > Hi Lucas, > > >> > > > > > > > > >> > > > > > > Seems like the main intent here is to prioritize the > > >> controller > > >> > > > request > > >> > > > > > > over any other requests. > > >> > > > > > > In that case, we can change the request queue to a > dequeue, > > >> where > > >> > > you > > >> > > > > > > always insert the normal requests (produce, consume,..etc) > > to > > >> the > > >> > > end > > >> > > > > of > > >> > > > > > > the dequeue, but if its a controller request, you insert > it > > to > > >> > the > > >> > > > head > > >> > > > > > of > > >> > > > > > > the queue. This ensures that the controller request will > be > > >> given > > >> > > > > higher > > >> > > > > > > priority over other requests. > > >> > > > > > > > > >> > > > > > > Also since we only read one request from the socket and > mute > > >> it > > >> > and > > >> > > > > only > > >> > > > > > > unmute it after handling the request, this would ensure > that > > >> we > > >> > > don't > > >> > > > > > > handle controller requests out of order. > > >> > > > > > > > > >> > > > > > > With this approach we can avoid the second queue and the > > >> > additional > > >> > > > > > config > > >> > > > > > > for the size of the queue. > > >> > > > > > > > > >> > > > > > > What do you think ? > > >> > > > > > > > > >> > > > > > > Thanks, > > >> > > > > > > > > >> > > > > > > Mayuresh > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin < > > >> becket....@gmail.com > > >> > > > > >> > > > > wrote: > > >> > > > > > > > > >> > > > > > > > Hey Joel, > > >> > > > > > > > > > >> > > > > > > > Thank for the detail explanation. I agree the current > > design > > >> > > makes > > >> > > > > > sense. > > >> > > > > > > > My confusion is about whether the new config for the > > >> controller > > >> > > > queue > > >> > > > > > > > capacity is necessary. I cannot think of a case in which > > >> users > > >> > > > would > > >> > > > > > > change > > >> > > > > > > > it. > > >> > > > > > > > > > >> > > > > > > > Thanks, > > >> > > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin > > >> > > > > > > > > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin < > > >> > > becket....@gmail.com> > > >> > > > > > > wrote: > > >> > > > > > > > > > >> > > > > > > > > Hi Lucas, > > >> > > > > > > > > > > >> > > > > > > > > I guess my question can be rephrased to "do we expect > > >> user to > > >> > > > ever > > >> > > > > > > change > > >> > > > > > > > > the controller request queue capacity"? If we agree > that > > >> 20 > > >> > is > > >> > > > > > already > > >> > > > > > > a > > >> > > > > > > > > very generous default number and we do not expect user > > to > > >> > > change > > >> > > > > it, > > >> > > > > > is > > >> > > > > > > > it > > >> > > > > > > > > still necessary to expose this as a config? > > >> > > > > > > > > > > >> > > > > > > > > Thanks, > > >> > > > > > > > > > > >> > > > > > > > > Jiangjie (Becket) Qin > > >> > > > > > > > > > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang < > > >> > > > lucasatu...@gmail.com > > >> > > > > > > > >> > > > > > > > wrote: > > >> > > > > > > > > > > >> > > > > > > > >> @Becket > > >> > > > > > > > >> 1. Thanks for the comment. You are right that > normally > > >> there > > >> > > > > should > > >> > > > > > be > > >> > > > > > > > >> just > > >> > > > > > > > >> one controller request because of muting, > > >> > > > > > > > >> and I had NOT intended to say there would be many > > >> enqueued > > >> > > > > > controller > > >> > > > > > > > >> requests. > > >> > > > > > > > >> I went through the KIP again, and I'm not sure which > > part > > >> > > > conveys > > >> > > > > > that > > >> > > > > > > > >> info. > > >> > > > > > > > >> I'd be happy to revise if you point it out the > section. > > >> > > > > > > > >> > > >> > > > > > > > >> 2. Though it should not happen in normal conditions, > > the > > >> > > current > > >> > > > > > > design > > >> > > > > > > > >> does not preclude multiple controllers running > > >> > > > > > > > >> at the same time, hence if we don't have the > controller > > >> > queue > > >> > > > > > capacity > > >> > > > > > > > >> config and simply make its capacity to be 1, > > >> > > > > > > > >> network threads handling requests from different > > >> controllers > > >> > > > will > > >> > > > > be > > >> > > > > > > > >> blocked during those troublesome times, > > >> > > > > > > > >> which is probably not what we want. On the other > hand, > > >> > adding > > >> > > > the > > >> > > > > > > extra > > >> > > > > > > > >> config with a default value, say 20, guards us from > > >> issues > > >> > in > > >> > > > > those > > >> > > > > > > > >> troublesome times, and IMO there isn't much downside > of > > >> > adding > > >> > > > the > > >> > > > > > > extra > > >> > > > > > > > >> config. > > >> > > > > > > > >> > > >> > > > > > > > >> @Mayuresh > > >> > > > > > > > >> Good catch, this sentence is an obsolete statement > > based > > >> on > > >> > a > > >> > > > > > previous > > >> > > > > > > > >> design. I've revised the wording in the KIP. > > >> > > > > > > > >> > > >> > > > > > > > >> Thanks, > > >> > > > > > > > >> Lucas > > >> > > > > > > > >> > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat < > > >> > > > > > > > >> gharatmayures...@gmail.com> wrote: > > >> > > > > > > > >> > > >> > > > > > > > >> > Hi Lucas, > > >> > > > > > > > >> > > > >> > > > > > > > >> > Thanks for the KIP. > > >> > > > > > > > >> > I am trying to understand why you think "The memory > > >> > > > consumption > > >> > > > > > can > > >> > > > > > > > rise > > >> > > > > > > > >> > given the total number of queued requests can go up > > to > > >> 2x" > > >> > > in > > >> > > > > the > > >> > > > > > > > impact > > >> > > > > > > > >> > section. Normally the requests from controller to a > > >> Broker > > >> > > are > > >> > > > > not > > >> > > > > > > > high > > >> > > > > > > > >> > volume, right ? > > >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > Thanks, > > >> > > > > > > > >> > > > >> > > > > > > > >> > Mayuresh > > >> > > > > > > > >> > > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin < > > >> > > > > becket....@gmail.com> > > >> > > > > > > > >> wrote: > > >> > > > > > > > >> > > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control > > >> plane > > >> > > from > > >> > > > > the > > >> > > > > > > > data > > >> > > > > > > > >> > plane > > >> > > > > > > > >> > > makes a lot of sense. > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > In the KIP you mentioned that the controller > > request > > >> > queue > > >> > > > may > > >> > > > > > > have > > >> > > > > > > > >> many > > >> > > > > > > > >> > > requests in it. Will this be a common case? The > > >> > controller > > >> > > > > > > requests > > >> > > > > > > > >> still > > >> > > > > > > > >> > > goes through the SocketServer. The SocketServer > > will > > >> > mute > > >> > > > the > > >> > > > > > > > channel > > >> > > > > > > > >> > once > > >> > > > > > > > >> > > a request is read and put into the request > channel. > > >> So > > >> > > > > assuming > > >> > > > > > > > there > > >> > > > > > > > >> is > > >> > > > > > > > >> > > only one connection between controller and each > > >> broker, > > >> > on > > >> > > > the > > >> > > > > > > > broker > > >> > > > > > > > >> > side, > > >> > > > > > > > >> > > there should be only one controller request in > the > > >> > > > controller > > >> > > > > > > > request > > >> > > > > > > > >> > queue > > >> > > > > > > > >> > > at any given time. If that is the case, do we > need > > a > > >> > > > separate > > >> > > > > > > > >> controller > > >> > > > > > > > >> > > request queue capacity config? The default value > 20 > > >> > means > > >> > > > that > > >> > > > > > we > > >> > > > > > > > >> expect > > >> > > > > > > > >> > > there are 20 controller switches to happen in a > > short > > >> > > period > > >> > > > > of > > >> > > > > > > > time. > > >> > > > > > > > >> I > > >> > > > > > > > >> > am > > >> > > > > > > > >> > > not sure whether someone should increase the > > >> controller > > >> > > > > request > > >> > > > > > > > queue > > >> > > > > > > > >> > > capacity to handle such case, as it seems > > indicating > > >> > > > something > > >> > > > > > > very > > >> > > > > > > > >> wrong > > >> > > > > > > > >> > > has happened. > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > Thanks, > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin < > > >> > > > > lindon...@gmail.com> > > >> > > > > > > > >> wrote: > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > Thanks for the update Lucas. > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > I think the motivation section is intuitive. It > > >> will > > >> > be > > >> > > > good > > >> > > > > > to > > >> > > > > > > > >> learn > > >> > > > > > > > >> > > more > > >> > > > > > > > >> > > > about the comments from other reviewers. > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang < > > >> > > > > > > > lucasatu...@gmail.com> > > >> > > > > > > > >> > > wrote: > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > Hi Dong, > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > I've updated the motivation section of the > KIP > > by > > >> > > > > explaining > > >> > > > > > > the > > >> > > > > > > > >> > cases > > >> > > > > > > > >> > > > that > > >> > > > > > > > >> > > > > would have user impacts. > > >> > > > > > > > >> > > > > Please take a look at let me know your > > comments. > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > Thanks, > > >> > > > > > > > >> > > > > Lucas > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang < > > >> > > > > > > > lucasatu...@gmail.com > > >> > > > > > > > >> > > > >> > > > > > > > >> > > > wrote: > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > Hi Dong, > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > The simulation of disk being slow is merely > > >> for me > > >> > > to > > >> > > > > > easily > > >> > > > > > > > >> > > construct > > >> > > > > > > > >> > > > a > > >> > > > > > > > >> > > > > > testing scenario > > >> > > > > > > > >> > > > > > with a backlog of produce requests. In > > >> production, > > >> > > > other > > >> > > > > > > than > > >> > > > > > > > >> the > > >> > > > > > > > >> > > disk > > >> > > > > > > > >> > > > > > being slow, a backlog of > > >> > > > > > > > >> > > > > > produce requests may also be caused by high > > >> > produce > > >> > > > QPS. > > >> > > > > > > > >> > > > > > In that case, we may not want to kill the > > >> broker > > >> > and > > >> > > > > > that's > > >> > > > > > > > when > > >> > > > > > > > >> > this > > >> > > > > > > > >> > > > KIP > > >> > > > > > > > >> > > > > > can be useful, both for JBOD > > >> > > > > > > > >> > > > > > and non-JBOD setup. > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > Going back to your previous question about > > each > > >> > > > > > > ProduceRequest > > >> > > > > > > > >> > > covering > > >> > > > > > > > >> > > > > 20 > > >> > > > > > > > >> > > > > > partitions that are randomly > > >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr > request > > >> is > > >> > > > > enqueued > > >> > > > > > > that > > >> > > > > > > > >> > tries > > >> > > > > > > > >> > > to > > >> > > > > > > > >> > > > > > switch the current broker, say broker0, > from > > >> > leader > > >> > > to > > >> > > > > > > > follower > > >> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. > > For > > >> the > > >> > > > sake > > >> > > > > of > > >> > > > > > > > >> > argument, > > >> > > > > > > > >> > > > > > let's also assume the other brokers, say > > >> broker1, > > >> > > have > > >> > > > > > > > *stopped* > > >> > > > > > > > >> > > > fetching > > >> > > > > > > > >> > > > > > from > > >> > > > > > > > >> > > > > > the current broker, i.e. broker0. > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have > > acks = > > >> > -1 > > >> > > > > (ALL) > > >> > > > > > > > >> > > > > > 1.1 without this KIP, the ProduceRequests > > >> ahead > > >> > of > > >> > > > > > > > >> LeaderAndISR > > >> > > > > > > > >> > > will > > >> > > > > > > > >> > > > be > > >> > > > > > > > >> > > > > > put into the purgatory, > > >> > > > > > > > >> > > > > > and since they'll never be > replicated > > >> to > > >> > > other > > >> > > > > > > brokers > > >> > > > > > > > >> > > (because > > >> > > > > > > > >> > > > > of > > >> > > > > > > > >> > > > > > the assumption made above), they will > > >> > > > > > > > >> > > > > > be completed either when the > > >> LeaderAndISR > > >> > > > > request > > >> > > > > > is > > >> > > > > > > > >> > > processed > > >> > > > > > > > >> > > > or > > >> > > > > > > > >> > > > > > when the timeout happens. > > >> > > > > > > > >> > > > > > 1.2 With this KIP, broker0 will > immediately > > >> > > > transition > > >> > > > > > the > > >> > > > > > > > >> > > partition > > >> > > > > > > > >> > > > > > test-0 to become a follower, > > >> > > > > > > > >> > > > > > after the current broker sees the > > >> > > replication > > >> > > > of > > >> > > > > > the > > >> > > > > > > > >> > > remaining > > >> > > > > > > > >> > > > 19 > > >> > > > > > > > >> > > > > > partitions, it can send a response > indicating > > >> that > > >> > > > > > > > >> > > > > > it's no longer the leader for the > > >> > "test-0". > > >> > > > > > > > >> > > > > > To see the latency difference between 1.1 > > and > > >> > 1.2, > > >> > > > > let's > > >> > > > > > > say > > >> > > > > > > > >> > there > > >> > > > > > > > >> > > > are > > >> > > > > > > > >> > > > > > 24K produce requests ahead of the > > LeaderAndISR, > > >> > and > > >> > > > > there > > >> > > > > > > are > > >> > > > > > > > 8 > > >> > > > > > > > >> io > > >> > > > > > > > >> > > > > threads, > > >> > > > > > > > >> > > > > > so each io thread will process > > approximately > > >> > 3000 > > >> > > > > > produce > > >> > > > > > > > >> > requests. > > >> > > > > > > > >> > > > Now > > >> > > > > > > > >> > > > > > let's investigate the io thread that > finally > > >> > > processed > > >> > > > > the > > >> > > > > > > > >> > > > LeaderAndISR. > > >> > > > > > > > >> > > > > > For the 3000 produce requests, if we > model > > >> the > > >> > > time > > >> > > > > when > > >> > > > > > > > their > > >> > > > > > > > >> > > > > remaining > > >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, > > and > > >> > the > > >> > > > > > > > LeaderAndISR > > >> > > > > > > > >> > > > request > > >> > > > > > > > >> > > > > is > > >> > > > > > > > >> > > > > > processed at time t3000. > > >> > > > > > > > >> > > > > > Without this KIP, the 1st produce request > > >> would > > >> > > have > > >> > > > > > > waited > > >> > > > > > > > an > > >> > > > > > > > >> > > extra > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd > an > > >> extra > > >> > > > time > > >> > > > > of > > >> > > > > > > > >> t3000 - > > >> > > > > > > > >> > > t1, > > >> > > > > > > > >> > > > > etc. > > >> > > > > > > > >> > > > > > Roughly speaking, the latency difference > is > > >> > bigger > > >> > > > for > > >> > > > > > the > > >> > > > > > > > >> > earlier > > >> > > > > > > > >> > > > > > produce requests than for the later ones. > For > > >> the > > >> > > same > > >> > > > > > > reason, > > >> > > > > > > > >> the > > >> > > > > > > > >> > > more > > >> > > > > > > > >> > > > > > ProduceRequests queued > > >> > > > > > > > >> > > > > > before the LeaderAndISR, the bigger > benefit > > >> we > > >> > get > > >> > > > > > (capped > > >> > > > > > > > by > > >> > > > > > > > >> the > > >> > > > > > > > >> > > > > > produce timeout). > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have > > >> acks=0 or > > >> > > > > acks=1 > > >> > > > > > > > >> > > > > > There will be no latency differences in > > this > > >> > case, > > >> > > > but > > >> > > > > > > > >> > > > > > 2.1 without this KIP, the records of > > >> partition > > >> > > > test-0 > > >> > > > > in > > >> > > > > > > the > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR > > will > > >> be > > >> > > > > appended > > >> > > > > > > to > > >> > > > > > > > >> the > > >> > > > > > > > >> > > local > > >> > > > > > > > >> > > > > log, > > >> > > > > > > > >> > > > > > and eventually be truncated after > > >> > processing > > >> > > > the > > >> > > > > > > > >> > > LeaderAndISR. > > >> > > > > > > > >> > > > > > This is what's referred to as > > >> > > > > > > > >> > > > > > "some unofficial definition of data > > >> loss > > >> > in > > >> > > > > terms > > >> > > > > > of > > >> > > > > > > > >> > messages > > >> > > > > > > > >> > > > > > beyond the high watermark". > > >> > > > > > > > >> > > > > > 2.2 with this KIP, we can mitigate the > > effect > > >> > > since > > >> > > > if > > >> > > > > > the > > >> > > > > > > > >> > > > LeaderAndISR > > >> > > > > > > > >> > > > > > is immediately processed, the response to > > >> > producers > > >> > > > will > > >> > > > > > > have > > >> > > > > > > > >> > > > > > the NotLeaderForPartition error, > > >> causing > > >> > > > > producers > > >> > > > > > > to > > >> > > > > > > > >> retry > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > This explanation above is the benefit for > > >> reducing > > >> > > the > > >> > > > > > > latency > > >> > > > > > > > >> of a > > >> > > > > > > > >> > > > > broker > > >> > > > > > > > >> > > > > > becoming the follower, > > >> > > > > > > > >> > > > > > closely related is reducing the latency of > a > > >> > broker > > >> > > > > > becoming > > >> > > > > > > > the > > >> > > > > > > > >> > > > leader. > > >> > > > > > > > >> > > > > > In this case, the benefit is even more > > >> obvious, if > > >> > > > other > > >> > > > > > > > brokers > > >> > > > > > > > >> > have > > >> > > > > > > > >> > > > > > resigned leadership, and the > > >> > > > > > > > >> > > > > > current broker should take leadership. Any > > >> delay > > >> > in > > >> > > > > > > processing > > >> > > > > > > > >> the > > >> > > > > > > > >> > > > > > LeaderAndISR will be perceived > > >> > > > > > > > >> > > > > > by clients as unavailability. In extreme > > cases, > > >> > this > > >> > > > can > > >> > > > > > > cause > > >> > > > > > > > >> > failed > > >> > > > > > > > >> > > > > > produce requests if the retries are > > >> > > > > > > > >> > > > > > exhausted. > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > Another two types of controller requests > are > > >> > > > > > UpdateMetadata > > >> > > > > > > > and > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as > > >> > follows: > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed > > processing > > >> > > means > > >> > > > > > > clients > > >> > > > > > > > >> > > receiving > > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong > > leadership > > >> > info > > >> > > > > > > > >> > > > > > for certain partitions, and the effect is > > more > > >> > > retries > > >> > > > > or > > >> > > > > > > even > > >> > > > > > > > >> > fatal > > >> > > > > > > > >> > > > > > failure if the retries are exhausted. > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing > time > > >> may > > >> > > > > degrade > > >> > > > > > > the > > >> > > > > > > > >> > > > performance > > >> > > > > > > > >> > > > > > of topic deletion. > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > Regarding your last question of the delay > for > > >> > > > > > > > >> > DescribeLogDirsRequest, > > >> > > > > > > > >> > > > you > > >> > > > > > > > >> > > > > > are right > > >> > > > > > > > >> > > > > > that this KIP cannot help with the latency > in > > >> > > getting > > >> > > > > the > > >> > > > > > > log > > >> > > > > > > > >> dirs > > >> > > > > > > > >> > > > info, > > >> > > > > > > > >> > > > > > and it's only relevant > > >> > > > > > > > >> > > > > > when controller requests are involved. > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > Regards, > > >> > > > > > > > >> > > > > > Lucas > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin < > > >> > > > > > > lindon...@gmail.com > > >> > > > > > > > > > > >> > > > > > > > >> > > wrote: > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > >> Hey Jun, > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> Thanks much for the comments. It is good > > >> point. > > >> > So > > >> > > > the > > >> > > > > > > > feature > > >> > > > > > > > >> may > > >> > > > > > > > >> > > be > > >> > > > > > > > >> > > > > >> useful for JBOD use-case. I have one > > question > > >> > > below. > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> Hey Lucas, > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> Do you think this feature is also useful > for > > >> > > non-JBOD > > >> > > > > > setup > > >> > > > > > > > or > > >> > > > > > > > >> it > > >> > > > > > > > >> > is > > >> > > > > > > > >> > > > > only > > >> > > > > > > > >> > > > > >> useful for the JBOD setup? It may be > useful > > to > > >> > > > > understand > > >> > > > > > > > this. > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> When the broker is setup using JBOD, in > > order > > >> to > > >> > > move > > >> > > > > > > leaders > > >> > > > > > > > >> on > > >> > > > > > > > >> > the > > >> > > > > > > > >> > > > > >> failed > > >> > > > > > > > >> > > > > >> disk to other disks, the system operator > > first > > >> > > needs > > >> > > > to > > >> > > > > > get > > >> > > > > > > > the > > >> > > > > > > > >> > list > > >> > > > > > > > >> > > > of > > >> > > > > > > > >> > > > > >> partitions on the failed disk. This is > > >> currently > > >> > > > > achieved > > >> > > > > > > > using > > >> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends > > >> > > > > > > > >> DescribeLogDirsRequest > > >> > > > > > > > >> > to > > >> > > > > > > > >> > > > the > > >> > > > > > > > >> > > > > >> broker. If we only prioritize the > controller > > >> > > > requests, > > >> > > > > > then > > >> > > > > > > > the > > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest > > >> > > > > > > > >> > > > > >> may still take a long time to be processed > > by > > >> the > > >> > > > > broker. > > >> > > > > > > So > > >> > > > > > > > >> the > > >> > > > > > > > >> > > > overall > > >> > > > > > > > >> > > > > >> time to move leaders away from the failed > > disk > > >> > may > > >> > > > > still > > >> > > > > > be > > >> > > > > > > > >> long > > >> > > > > > > > >> > > even > > >> > > > > > > > >> > > > > with > > >> > > > > > > > >> > > > > >> this KIP. What do you think? > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> Thanks, > > >> > > > > > > > >> > > > > >> Dong > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas > Wang < > > >> > > > > > > > >> lucasatu...@gmail.com > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > wrote: > > >> > > > > > > > >> > > > > >> > > >> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun. > > >> > > > > > > > >> > > > > >> > > > >> > > > > > > > >> > > > > >> > @Dong, > > >> > > > > > > > >> > > > > >> > Since both of the two comments in your > > >> previous > > >> > > > email > > >> > > > > > are > > >> > > > > > > > >> about > > >> > > > > > > > >> > > the > > >> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's > > >> useful, > > >> > > > > > > > >> > > > > >> > in light of Jun's last comment, do you > > agree > > >> > that > > >> > > > > this > > >> > > > > > > KIP > > >> > > > > > > > >> can > > >> > > > > > > > >> > be > > >> > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun? > > >> > > > > > > > >> > > > > >> > Please let me know, thanks! > > >> > > > > > > > >> > > > > >> > > > >> > > > > > > > >> > > > > >> > Regards, > > >> > > > > > > > >> > > > > >> > Lucas > > >> > > > > > > > >> > > > > >> > > > >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao > < > > >> > > > > > > j...@confluent.io> > > >> > > > > > > > >> > wrote: > > >> > > > > > > > >> > > > > >> > > > >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong, > > >> > > > > > > > >> > > > > >> > > > > >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one > > >> > probably > > >> > > > > > should > > >> > > > > > > > just > > >> > > > > > > > >> > kill > > >> > > > > > > > >> > > > the > > >> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not > > >> help. > > >> > If > > >> > > > > only > > >> > > > > > > one > > >> > > > > > > > of > > >> > > > > > > > >> > the > > >> > > > > > > > >> > > > > disks > > >> > > > > > > > >> > > > > >> on > > >> > > > > > > > >> > > > > >> > a > > >> > > > > > > > >> > > > > >> > > broker is slow, one may want to fail > > that > > >> > disk > > >> > > > and > > >> > > > > > move > > >> > > > > > > > the > > >> > > > > > > > >> > > > leaders > > >> > > > > > > > >> > > > > on > > >> > > > > > > > >> > > > > >> > that > > >> > > > > > > > >> > > > > >> > > disk to other brokers. In that case, > > being > > >> > able > > >> > > > to > > >> > > > > > > > process > > >> > > > > > > > >> the > > >> > > > > > > > >> > > > > >> > LeaderAndIsr > > >> > > > > > > > >> > > > > >> > > requests faster will potentially help > > the > > >> > > > producers > > >> > > > > > > > recover > > >> > > > > > > > >> > > > quicker. > > >> > > > > > > > >> > > > > >> > > > > >> > > > > > > > >> > > > > >> > > Thanks, > > >> > > > > > > > >> > > > > >> > > > > >> > > > > > > > >> > > > > >> > > Jun > > >> > > > > > > > >> > > > > >> > > > > >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong > > Lin < > > >> > > > > > > > >> lindon...@gmail.com > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > wrote: > > >> > > > > > > > >> > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > Hey Lucas, > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up > > >> > > questions > > >> > > > > > below. > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest > > >> covers > > >> > 20 > > >> > > > > > > > partitions > > >> > > > > > > > >> > that > > >> > > > > > > > >> > > > are > > >> > > > > > > > >> > > > > >> > > randomly > > >> > > > > > > > >> > > > > >> > > > distributed across all partitions, > > then > > >> > each > > >> > > > > > > > >> ProduceRequest > > >> > > > > > > > >> > > will > > >> > > > > > > > >> > > > > >> likely > > >> > > > > > > > >> > > > > >> > > > cover some partitions for which the > > >> broker > > >> > is > > >> > > > > still > > >> > > > > > > > >> leader > > >> > > > > > > > >> > > after > > >> > > > > > > > >> > > > > it > > >> > > > > > > > >> > > > > >> > > quickly > > >> > > > > > > > >> > > > > >> > > > processes the > > >> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker > will > > >> still > > >> > > be > > >> > > > > slow > > >> > > > > > > in > > >> > > > > > > > >> > > > processing > > >> > > > > > > > >> > > > > >> these > > >> > > > > > > > >> > > > > >> > > > ProduceRequest and request will > still > > be > > >> > very > > >> > > > > high > > >> > > > > > > with > > >> > > > > > > > >> this > > >> > > > > > > > >> > > > KIP. > > >> > > > > > > > >> > > > > It > > >> > > > > > > > >> > > > > >> > > seems > > >> > > > > > > > >> > > > > >> > > > that most ProduceRequest will still > > >> timeout > > >> > > > after > > >> > > > > > 30 > > >> > > > > > > > >> > seconds. > > >> > > > > > > > >> > > Is > > >> > > > > > > > >> > > > > >> this > > >> > > > > > > > >> > > > > >> > > > understanding correct? > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest > > will > > >> > > still > > >> > > > > > > timeout > > >> > > > > > > > >> after > > >> > > > > > > > >> > > 30 > > >> > > > > > > > >> > > > > >> > seconds, > > >> > > > > > > > >> > > > > >> > > > then it is less clear how this KIP > > >> reduces > > >> > > > > average > > >> > > > > > > > >> produce > > >> > > > > > > > >> > > > > latency. > > >> > > > > > > > >> > > > > >> Can > > >> > > > > > > > >> > > > > >> > > you > > >> > > > > > > > >> > > > > >> > > > clarify what metrics can be improved > > by > > >> > this > > >> > > > KIP? > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > Not sure why system operator > directly > > >> cares > > >> > > > > number > > >> > > > > > of > > >> > > > > > > > >> > > truncated > > >> > > > > > > > >> > > > > >> > messages. > > >> > > > > > > > >> > > > > >> > > > Do you mean this KIP can improve > > average > > >> > > > > throughput > > >> > > > > > > or > > >> > > > > > > > >> > reduce > > >> > > > > > > > >> > > > > >> message > > >> > > > > > > > >> > > > > >> > > > duplication? It will be good to > > >> understand > > >> > > > this. > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > Thanks, > > >> > > > > > > > >> > > > > >> > > > Dong > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas > > >> Wang < > > >> > > > > > > > >> > > lucasatu...@gmail.com > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > wrote: > > >> > > > > > > > >> > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > Hi Dong, > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. > > >> Please > > >> > > see > > >> > > > > my > > >> > > > > > > > reply > > >> > > > > > > > >> > > below. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 > > >> > partition. > > >> > > > Now > > >> > > > > > > let's > > >> > > > > > > > >> > > consider > > >> > > > > > > > >> > > > a > > >> > > > > > > > >> > > > > >> more > > >> > > > > > > > >> > > > > >> > > > common > > >> > > > > > > > >> > > > > >> > > > > scenario > > >> > > > > > > > >> > > > > >> > > > > where broker0 is the leader of > many > > >> > > > partitions. > > >> > > > > > And > > >> > > > > > > > >> let's > > >> > > > > > > > >> > > say > > >> > > > > > > > >> > > > > for > > >> > > > > > > > >> > > > > >> > some > > >> > > > > > > > >> > > > > >> > > > > reason its IO becomes slow. > > >> > > > > > > > >> > > > > >> > > > > The number of leader partitions on > > >> > broker0 > > >> > > is > > >> > > > > so > > >> > > > > > > > large, > > >> > > > > > > > >> > say > > >> > > > > > > > >> > > > 10K, > > >> > > > > > > > >> > > > > >> that > > >> > > > > > > > >> > > > > >> > > the > > >> > > > > > > > >> > > > > >> > > > > cluster is skewed, > > >> > > > > > > > >> > > > > >> > > > > and the operator would like to > shift > > >> the > > >> > > > > > leadership > > >> > > > > > > > >> for a > > >> > > > > > > > >> > > lot > > >> > > > > > > > >> > > > of > > >> > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other > > brokers, > > >> > > > > > > > >> > > > > >> > > > > either manually or through some > > >> service > > >> > > like > > >> > > > > > cruise > > >> > > > > > > > >> > control. > > >> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the > > >> > leadership > > >> > > > > > > > transitions > > >> > > > > > > > >> > > finish > > >> > > > > > > > >> > > > > >> more > > >> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster > itself > > >> > > becoming > > >> > > > > more > > >> > > > > > > > >> > balanced, > > >> > > > > > > > >> > > > > >> > > > > but all existing producers > > >> corresponding > > >> > to > > >> > > > the > > >> > > > > > 9K > > >> > > > > > > > >> > > partitions > > >> > > > > > > > >> > > > > will > > >> > > > > > > > >> > > > > >> > get > > >> > > > > > > > >> > > > > >> > > > the > > >> > > > > > > > >> > > > > >> > > > > errors relatively quickly > > >> > > > > > > > >> > > > > >> > > > > rather than relying on their > > timeout, > > >> > > thanks > > >> > > > to > > >> > > > > > the > > >> > > > > > > > >> > batched > > >> > > > > > > > >> > > > > async > > >> > > > > > > > >> > > > > >> ZK > > >> > > > > > > > >> > > > > >> > > > > operations. > > >> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to > have > > >> > during > > >> > > > such > > >> > > > > > > > >> > troublesome > > >> > > > > > > > >> > > > > times. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google > Doc > > >> have > > >> > > > shown > > >> > > > > > > that > > >> > > > > > > > >> with > > >> > > > > > > > >> > > this > > >> > > > > > > > >> > > > > KIP > > >> > > > > > > > >> > > > > >> > many > > >> > > > > > > > >> > > > > >> > > > > producers > > >> > > > > > > > >> > > > > >> > > > > receive an explicit error > > >> > > > > NotLeaderForPartition, > > >> > > > > > > > based > > >> > > > > > > > >> on > > >> > > > > > > > >> > > > which > > >> > > > > > > > >> > > > > >> they > > >> > > > > > > > >> > > > > >> > > > retry > > >> > > > > > > > >> > > > > >> > > > > immediately. > > >> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14 > > >> seconds+quick > > >> > > > retry) > > >> > > > > > for > > >> > > > > > > > >> their > > >> > > > > > > > >> > > > single > > >> > > > > > > > >> > > > > >> > > message > > >> > > > > > > > >> > > > > >> > > > is > > >> > > > > > > > >> > > > > >> > > > > much smaller > > >> > > > > > > > >> > > > > >> > > > > compared with the case of timing > out > > >> > > without > > >> > > > > the > > >> > > > > > > KIP > > >> > > > > > > > >> (30 > > >> > > > > > > > >> > > > seconds > > >> > > > > > > > >> > > > > >> for > > >> > > > > > > > >> > > > > >> > > > timing > > >> > > > > > > > >> > > > > >> > > > > out + quick retry). > > >> > > > > > > > >> > > > > >> > > > > One might argue that reducing the > > >> timing > > >> > > out > > >> > > > on > > >> > > > > > the > > >> > > > > > > > >> > producer > > >> > > > > > > > >> > > > > side > > >> > > > > > > > >> > > > > >> can > > >> > > > > > > > >> > > > > >> > > > > achieve the same result, > > >> > > > > > > > >> > > > > >> > > > > yet reducing the timeout has its > own > > >> > > > > > drawbacks[1]. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to > > show > > >> the > > >> > > > > number > > >> > > > > > of > > >> > > > > > > > >> > > truncated > > >> > > > > > > > >> > > > > >> > messages > > >> > > > > > > > >> > > > > >> > > on > > >> > > > > > > > >> > > > > >> > > > > brokers, > > >> > > > > > > > >> > > > > >> > > > > with the experiments done in the > > >> Google > > >> > > Doc, > > >> > > > it > > >> > > > > > > > should > > >> > > > > > > > >> be > > >> > > > > > > > >> > > easy > > >> > > > > > > > >> > > > > to > > >> > > > > > > > >> > > > > >> see > > >> > > > > > > > >> > > > > >> > > > that > > >> > > > > > > > >> > > > > >> > > > > a lot fewer messages need > > >> > > > > > > > >> > > > > >> > > > > to be truncated on broker0 since > the > > >> > > > up-to-date > > >> > > > > > > > >> metadata > > >> > > > > > > > >> > > > avoids > > >> > > > > > > > >> > > > > >> > > appending > > >> > > > > > > > >> > > > > >> > > > > of messages > > >> > > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If > > we > > >> > talk > > >> > > > to a > > >> > > > > > > > system > > >> > > > > > > > >> > > > operator > > >> > > > > > > > >> > > > > >> and > > >> > > > > > > > >> > > > > >> > ask > > >> > > > > > > > >> > > > > >> > > > > whether > > >> > > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I > > bet > > >> > most > > >> > > > > likely > > >> > > > > > > the > > >> > > > > > > > >> > answer > > >> > > > > > > > >> > > > is > > >> > > > > > > > >> > > > > >> yes. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I > think > > it > > >> > > might > > >> > > > be > > >> > > > > > > > >> helpful to > > >> > > > > > > > >> > > > > >> construct > > >> > > > > > > > >> > > > > >> > > some > > >> > > > > > > > >> > > > > >> > > > > formulas. > > >> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm > going > > >> back > > >> > to > > >> > > > the > > >> > > > > > > case > > >> > > > > > > > >> where > > >> > > > > > > > >> > > > there > > >> > > > > > > > >> > > > > >> is > > >> > > > > > > > >> > > > > >> > > only > > >> > > > > > > > >> > > > > >> > > > > ONE partition involved. > > >> > > > > > > > >> > > > > >> > > > > Following the experiments in the > > >> Google > > >> > > Doc, > > >> > > > > > let's > > >> > > > > > > > say > > >> > > > > > > > >> > > broker0 > > >> > > > > > > > >> > > > > >> > becomes > > >> > > > > > > > >> > > > > >> > > > the > > >> > > > > > > > >> > > > > >> > > > > follower at time t0, > > >> > > > > > > > >> > > > > >> > > > > and after t0 there were still N > > >> produce > > >> > > > > requests > > >> > > > > > in > > >> > > > > > > > its > > >> > > > > > > > >> > > > request > > >> > > > > > > > >> > > > > >> > queue. > > >> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata > brought > > >> by > > >> > > this > > >> > > > > KIP, > > >> > > > > > > > >> broker0 > > >> > > > > > > > >> > > can > > >> > > > > > > > >> > > > > >> reply > > >> > > > > > > > >> > > > > >> > > with > > >> > > > > > > > >> > > > > >> > > > an > > >> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception, > > >> > > > > > > > >> > > > > >> > > > > let's use M1 to denote the average > > >> > > processing > > >> > > > > > time > > >> > > > > > > of > > >> > > > > > > > >> > > replying > > >> > > > > > > > >> > > > > >> with > > >> > > > > > > > >> > > > > >> > > such > > >> > > > > > > > >> > > > > >> > > > an > > >> > > > > > > > >> > > > > >> > > > > error message. > > >> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will > > >> need to > > >> > > > > append > > >> > > > > > > > >> messages > > >> > > > > > > > >> > to > > >> > > > > > > > >> > > > > >> > segments, > > >> > > > > > > > >> > > > > >> > > > > which may trigger a flush to disk, > > >> > > > > > > > >> > > > > >> > > > > let's use M2 to denote the average > > >> > > processing > > >> > > > > > time > > >> > > > > > > > for > > >> > > > > > > > >> > such > > >> > > > > > > > >> > > > > logic. > > >> > > > > > > > >> > > > > >> > > > > Then the average extra latency > > >> incurred > > >> > > > without > > >> > > > > > > this > > >> > > > > > > > >> KIP > > >> > > > > > > > >> > is > > >> > > > > > > > >> > > N > > >> > > > > > > > >> > > > * > > >> > > > > > > > >> > > > > >> (M2 - > > >> > > > > > > > >> > > > > >> > > > M1) / > > >> > > > > > > > >> > > > > >> > > > > 2. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be > > >> larger > > >> > > than > > >> > > > > M1, > > >> > > > > > > > which > > >> > > > > > > > >> > means > > >> > > > > > > > >> > > > as > > >> > > > > > > > >> > > > > >> long > > >> > > > > > > > >> > > > > >> > > as N > > >> > > > > > > > >> > > > > >> > > > > is positive, > > >> > > > > > > > >> > > > > >> > > > > we would see improvements on the > > >> average > > >> > > > > latency. > > >> > > > > > > > >> > > > > >> > > > > There does not need to be > > significant > > >> > > backlog > > >> > > > > of > > >> > > > > > > > >> requests > > >> > > > > > > > >> > in > > >> > > > > > > > >> > > > the > > >> > > > > > > > >> > > > > >> > > request > > >> > > > > > > > >> > > > > >> > > > > queue, > > >> > > > > > > > >> > > > > >> > > > > or severe degradation of disk > > >> performance > > >> > > to > > >> > > > > have > > >> > > > > > > the > > >> > > > > > > > >> > > > > improvement. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > Regards, > > >> > > > > > > > >> > > > > >> > > > > Lucas > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the > > >> timeout on > > >> > > the > > >> > > > > > > > producer > > >> > > > > > > > >> > side > > >> > > > > > > > >> > > > can > > >> > > > > > > > >> > > > > >> > trigger > > >> > > > > > > > >> > > > > >> > > > > unnecessary duplicate requests > > >> > > > > > > > >> > > > > >> > > > > when the corresponding leader > broker > > >> is > > >> > > > > > overloaded, > > >> > > > > > > > >> > > > exacerbating > > >> > > > > > > > >> > > > > >> the > > >> > > > > > > > >> > > > > >> > > > > situation. > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, > Dong > > >> Lin > > >> > < > > >> > > > > > > > >> > > lindon...@gmail.com > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > wrote: > > >> > > > > > > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas, > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed > > >> > > documentation > > >> > > > of > > >> > > > > > the > > >> > > > > > > > >> > > > experiment. > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > > > > >> > > > > >> > > > > > Initially I also think having a > > >> > separate > > >> > > > > queue > > >> > > > > > > for > > >> > > > > > > > >> > > > controller > > >> > > > > > > > >> > > > > >> > > requests > > >> > > > > > > > >> > > > > >> > > > is > > >> > > > > > > > >> > > > > >> > > > > > useful because, as you mentioned > > in > > >> the > > >> > > > > summary > > >> > > > > > > > >> section > > >> > > > > > > > >> > of > > >> > > > > > > > >> > > > the > > >> > > > > > > > >> > > > > >> > Google > > >> > > > > > > > >> > > > > >> > > > > doc, > > >> > > > > > > > >> > > > > >> > > > > > controller requests are > generally > > >> more > > >> > > -- -Regards, Mayuresh R. Gharat (862) 250-7125