Personally I am not fond of the dequeue approach simply because it is against the basic idea of isolating the controller plane and data plane. With a single dequeue, theoretically speaking the controller requests can starve the clients requests. I would prefer the approach with a separate controller request queue and a dedicated controller request handler thread.
Thanks, Jiangjie (Becket) Qin On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lucasatu...@gmail.com> wrote: > Sure, I can summarize the usage of correlation id. But before I do that, it > seems > the same out-of-order processing can also happen to Produce requests sent > by producers, > following the same example you described earlier. > If that's the case, I think this probably deserves a separate doc and > design independent of this KIP. > > Lucas > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <lindon...@gmail.com> wrote: > > > Hey Lucas, > > > > Could you update the KIP if you are confident with the approach which > uses > > correlation id? The idea around correlation id is kind of scattered > across > > multiple emails. It will be useful if other reviews can read the KIP to > > understand the latest proposal. > > > > Thanks, > > Dong > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat < > > gharatmayures...@gmail.com> wrote: > > > > > I like the idea of the dequeue implementation by Lucas. This will help > us > > > avoid additional queue for controller and additional configs in Kafka. > > > > > > Thanks, > > > > > > Mayuresh > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <becket....@gmail.com> > wrote: > > > > > > > Hi Jun, > > > > > > > > The usage of correlation ID might still be useful to address the > cases > > > > that the controller epoch and leader epoch check are not sufficient > to > > > > guarantee correct behavior. For example, if the controller sends a > > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker > > > > processes it in the reverse order, the replica may still be wrongly > > > > recreated, right? > > > > > > > > Thanks, > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <j...@confluent.io> wrote: > > > > > > > > > > Hmm, since we already use controller epoch and leader epoch for > > > properly > > > > > caching the latest partition state, do we really need correlation > id > > > for > > > > > ordering the controller requests? > > > > > > > > > > Thanks, > > > > > > > > > > Jun > > > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket....@gmail.com> > > > > wrote: > > > > > > > > > >> Lucas and Mayuresh, > > > > >> > > > > >> Good idea. The correlation id should work. > > > > >> > > > > >> In the ControllerChannelManager, a request will be resent until a > > > > response > > > > >> is received. So if the controller to broker connection disconnects > > > after > > > > >> controller sends R1_a, but before the response of R1_a is > received, > > a > > > > >> disconnection may cause the controller to resend R1_b. i.e. until > R1 > > > is > > > > >> acked, R2 won't be sent by the controller. > > > > >> This gives two guarantees: > > > > >> 1. Correlation id wise: R1_a < R1_b < R2. > > > > >> 2. On the broker side, when R2 is seen, R1 must have been > processed > > at > > > > >> least once. > > > > >> > > > > >> So on the broker side, with a single thread controller request > > > handler, > > > > the > > > > >> logic should be: > > > > >> 1. Process what ever request seen in the controller request queue > > > > >> 2. For the given epoch, drop request if its correlation id is > > smaller > > > > than > > > > >> that of the last processed request. > > > > >> > > > > >> Thanks, > > > > >> > > > > >> Jiangjie (Becket) Qin > > > > >> > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <j...@confluent.io> > wrote: > > > > >> > > > > >>> I agree that there is no strong ordering when there are more than > > one > > > > >>> socket connections. Currently, we rely on controllerEpoch and > > > > leaderEpoch > > > > >>> to ensure that the receiving broker picks up the latest state for > > > each > > > > >>> partition. > > > > >>> > > > > >>> One potential issue with the dequeue approach is that if the > queue > > is > > > > >> full, > > > > >>> there is no guarantee that the controller requests will be > enqueued > > > > >>> quickly. > > > > >>> > > > > >>> Thanks, > > > > >>> > > > > >>> Jun > > > > >>> > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat < > > > > >>> gharatmayures...@gmail.com > > > > >>>> wrote: > > > > >>> > > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient > > > > >> constructor. > > > > >>>> Since we reuse the same NetworkClient between Controller and the > > > > >> broker, > > > > >>> a > > > > >>>> disconnection should not cause it to reset to 0, in which case > it > > > can > > > > >> be > > > > >>>> used to reject obsolete requests. > > > > >>>> > > > > >>>> Thanks, > > > > >>>> > > > > >>>> Mayuresh > > > > >>>> > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang < > lucasatu...@gmail.com > > > > > > > >>> wrote: > > > > >>>> > > > > >>>>> @Dong, > > > > >>>>> Great example and explanation, thanks! > > > > >>>>> > > > > >>>>> @All > > > > >>>>> Regarding the example given by Dong, it seems even if we use a > > > queue, > > > > >>>> and a > > > > >>>>> dedicated controller request handling thread, > > > > >>>>> the same result can still happen because R1_a will be sent on > one > > > > >>>>> connection, and R1_b & R2 will be sent on a different > connection, > > > > >>>>> and there is no ordering between different connections on the > > > broker > > > > >>>> side. > > > > >>>>> I was discussing with Mayuresh offline, and it seems > correlation > > id > > > > >>>> within > > > > >>>>> the same NetworkClient object is monotonically increasing and > > never > > > > >>>> reset, > > > > >>>>> hence a broker can leverage that to properly reject obsolete > > > > >> requests. > > > > >>>>> Thoughts? > > > > >>>>> > > > > >>>>> Thanks, > > > > >>>>> Lucas > > > > >>>>> > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat < > > > > >>>>> gharatmayures...@gmail.com> wrote: > > > > >>>>> > > > > >>>>>> Actually nvm, correlationId is reset in case of connection > > loss, I > > > > >>>> think. > > > > >>>>>> > > > > >>>>>> Thanks, > > > > >>>>>> > > > > >>>>>> Mayuresh > > > > >>>>>> > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat < > > > > >>>>>> gharatmayures...@gmail.com> > > > > >>>>>> wrote: > > > > >>>>>> > > > > >>>>>>> I agree with Dong that out-of-order processing can happen > with > > > > >>>> having 2 > > > > >>>>>>> separate queues as well and it can even happen today. > > > > >>>>>>> Can we use the correlationId in the request from the > controller > > > > >> to > > > > >>>> the > > > > >>>>>>> broker to handle ordering ? > > > > >>>>>>> > > > > >>>>>>> Thanks, > > > > >>>>>>> > > > > >>>>>>> Mayuresh > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin < > > becket....@gmail.com > > > > >>> > > > > >>>>> wrote: > > > > >>>>>>> > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller > request > > > > >>>> handling > > > > >>>>>>>> thread would be a better isolation. It also solves the > > > > >> reordering > > > > >>>>> issue. > > > > >>>>>>>> > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy < > > > > >> jjkosh...@gmail.com> > > > > >>>>>> wrote: > > > > >>>>>>>> > > > > >>>>>>>>> Good example. I think this scenario can occur in the > current > > > > >>> code > > > > >>>> as > > > > >>>>>>>> well > > > > >>>>>>>>> but with even lower probability given that there are other > > > > >>>>>>>> non-controller > > > > >>>>>>>>> requests interleaved. It is still sketchy though and I > think > > a > > > > >>>> safer > > > > >>>>>>>>> approach would be separate queues and pinning controller > > > > >> request > > > > >>>>>>>> handling > > > > >>>>>>>>> to one handler thread. > > > > >>>>>>>>> > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin < > > > > >> lindon...@gmail.com > > > > >>>> > > > > >>>>>> wrote: > > > > >>>>>>>>> > > > > >>>>>>>>>> Hey Becket, > > > > >>>>>>>>>> > > > > >>>>>>>>>> I think you are right that there may be out-of-order > > > > >>> processing. > > > > >>>>>>>> However, > > > > >>>>>>>>>> it seems that out-of-order processing may also happen even > > > > >> if > > > > >>> we > > > > >>>>>> use a > > > > >>>>>>>>>> separate queue. > > > > >>>>>>>>>> > > > > >>>>>>>>>> Here is the example: > > > > >>>>>>>>>> > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before > receiving > > > > >>>>>> response. > > > > >>>>>>>>> Then > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now stay in the > > > > >>>>> controller > > > > >>>>>>>>>> request queue in the order they are sent. > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then > thread2 > > > > >>>> takes > > > > >>>>>> R2 > > > > >>>>>>>>> from > > > > >>>>>>>>>> the request queue almost at the same time. > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is > chance > > > > >>> that > > > > >>>>>> R2's > > > > >>>>>>>>>> processing is completed before R1. > > > > >>>>>>>>>> > > > > >>>>>>>>>> If out-of-order processing can happen for both approaches > > > > >> with > > > > >>>>> very > > > > >>>>>>>> low > > > > >>>>>>>>>> probability, it may not be worthwhile to add the extra > > > > >> queue. > > > > >>>> What > > > > >>>>>> do > > > > >>>>>>>> you > > > > >>>>>>>>>> think? > > > > >>>>>>>>>> > > > > >>>>>>>>>> Thanks, > > > > >>>>>>>>>> Dong > > > > >>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin < > > > > >>>> becket....@gmail.com > > > > >>>>>> > > > > >>>>>>>>> wrote: > > > > >>>>>>>>>> > > > > >>>>>>>>>>> Hi Mayuresh/Joel, > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Using the request channel as a dequeue was bright up some > > > > >>> time > > > > >>>>> ago > > > > >>>>>>>> when > > > > >>>>>>>>>> we > > > > >>>>>>>>>>> initially thinking of prioritizing the request. The > > > > >> concern > > > > >>>> was > > > > >>>>>> that > > > > >>>>>>>>> the > > > > >>>>>>>>>>> controller requests are supposed to be processed in > order. > > > > >>> If > > > > >>>> we > > > > >>>>>> can > > > > >>>>>>>>>> ensure > > > > >>>>>>>>>>> that there is one controller request in the request > > > > >> channel, > > > > >>>> the > > > > >>>>>>>> order > > > > >>>>>>>>> is > > > > >>>>>>>>>>> not a concern. But in cases that there are more than one > > > > >>>>>> controller > > > > >>>>>>>>>> request > > > > >>>>>>>>>>> inserted into the queue, the controller request order may > > > > >>>> change > > > > >>>>>> and > > > > >>>>>>>>>> cause > > > > >>>>>>>>>>> problem. For example, think about the following sequence: > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to broker > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to the head of > > > > >> the > > > > >>>>>> request > > > > >>>>>>>>>> queue. > > > > >>>>>>>>>>> 3. Controller to broker connection failed and the > > > > >> controller > > > > >>>>>>>>> reconnected > > > > >>>>>>>>>> to > > > > >>>>>>>>>>> the broker. > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head of the > > > > >> request > > > > >>>>> queue. > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is > > > > >>>>>> processed, > > > > >>>>>>>>>> which > > > > >>>>>>>>>>> may cause problem. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Thanks, > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy < > > > > >>>>> jjkosh...@gmail.com> > > > > >>>>>>>>> wrote: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler > > > > >>>> less > > > > >>>>>>>>> invasive > > > > >>>>>>>>>>>> alternative and it should work. Jun/Becket/others, do > > > > >> you > > > > >>>> see > > > > >>>>>> any > > > > >>>>>>>>>>> pitfalls > > > > >>>>>>>>>>>> with this approach? > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang < > > > > >>>>>>>> lucasatu...@gmail.com> > > > > >>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>>> @Mayuresh, > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't thought > > > > >>>>> before. > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and > > > > >>>> also > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and capacity > > > > >>>>> config > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if we were > > > > >> to > > > > >>>>> adopt > > > > >>>>>>>>>>>>> this design, there is no public interface change, and > > > > >> we > > > > >>>>>>>>>>>>> probably don't need a KIP. > > > > >>>>>>>>>>>>> Also implementation wise, it seems > > > > >>>>>>>>>>>>> the java class LinkedBlockingQueue can readily satisfy > > > > >>> the > > > > >>>>>>>>>> requirement > > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing inserting > > > > >> at > > > > >>>>> both > > > > >>>>>>>> ends. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the > > > > >>>>> coincidence > > > > >>>>>>>> that > > > > >>>>>>>>>>>>> we have two request priorities and there are two ends > > > > >>> to a > > > > >>>>>>>> deque. > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems the > > > > >> network > > > > >>>>> layer > > > > >>>>>>>> is > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g. if > > > > >> we > > > > >>>> were > > > > >>>>>> to > > > > >>>>>>>> add > > > > >>>>>>>>>>>>> an extra priority level in the future for some reason, > > > > >>> we > > > > >>>>>> would > > > > >>>>>>>>>>> probably > > > > >>>>>>>>>>>>> need to go back to the design of separate queues, one > > > > >>> for > > > > >>>>> each > > > > >>>>>>>>>> priority > > > > >>>>>>>>>>>>> level. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward > > > > >>> your > > > > >>>>>>>> suggested > > > > >>>>>>>>>>>>> approach. > > > > >>>>>>>>>>>>> Let's hear what others think. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> @Becket, > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm > > > > >>> answering > > > > >>>>>> your > > > > >>>>>>>>>>> question > > > > >>>>>>>>>>>>> only in the context > > > > >>>>>>>>>>>>> of the current KIP design: I think your suggestion > > > > >> makes > > > > >>>>>> sense, > > > > >>>>>>>> and > > > > >>>>>>>>>> I'm > > > > >>>>>>>>>>>> ok > > > > >>>>>>>>>>>>> with removing the capacity config and > > > > >>>>>>>>>>>>> just relying on the default value of 20 being > > > > >> sufficient > > > > >>>>>> enough. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>> Lucas > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat < > > > > >>>>>>>>>>>>> gharatmayures...@gmail.com > > > > >>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Hi Lucas, > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the > > > > >>>>>>>> controller > > > > >>>>>>>>>>> request > > > > >>>>>>>>>>>>>> over any other requests. > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a > > > > >>>>> dequeue, > > > > >>>>>>>> where > > > > >>>>>>>>>> you > > > > >>>>>>>>>>>>>> always insert the normal requests (produce, > > > > >>>> consume,..etc) > > > > >>>>>> to > > > > >>>>>>>> the > > > > >>>>>>>>>> end > > > > >>>>>>>>>>>> of > > > > >>>>>>>>>>>>>> the dequeue, but if its a controller request, you > > > > >>> insert > > > > >>>>> it > > > > >>>>>> to > > > > >>>>>>>>> the > > > > >>>>>>>>>>> head > > > > >>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>> the queue. This ensures that the controller request > > > > >>> will > > > > >>>>> be > > > > >>>>>>>> given > > > > >>>>>>>>>>>> higher > > > > >>>>>>>>>>>>>> priority over other requests. > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Also since we only read one request from the socket > > > > >>> and > > > > >>>>> mute > > > > >>>>>>>> it > > > > >>>>>>>>> and > > > > >>>>>>>>>>>> only > > > > >>>>>>>>>>>>>> unmute it after handling the request, this would > > > > >>> ensure > > > > >>>>> that > > > > >>>>>>>> we > > > > >>>>>>>>>> don't > > > > >>>>>>>>>>>>>> handle controller requests out of order. > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and > > > > >>> the > > > > >>>>>>>>> additional > > > > >>>>>>>>>>>>> config > > > > >>>>>>>>>>>>>> for the size of the queue. > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> What do you think ? > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Mayuresh > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin < > > > > >>>>>>>> becket....@gmail.com > > > > >>>>>>>>>> > > > > >>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Hey Joel, > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Thank for the detail explanation. I agree the > > > > >>> current > > > > >>>>>> design > > > > >>>>>>>>>> makes > > > > >>>>>>>>>>>>> sense. > > > > >>>>>>>>>>>>>>> My confusion is about whether the new config for > > > > >> the > > > > >>>>>>>> controller > > > > >>>>>>>>>>> queue > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case in > > > > >>>> which > > > > >>>>>>>> users > > > > >>>>>>>>>>> would > > > > >>>>>>>>>>>>>> change > > > > >>>>>>>>>>>>>>> it. > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin < > > > > >>>>>>>>>> becket....@gmail.com> > > > > >>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> Hi Lucas, > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we > > > > >>>> expect > > > > >>>>>>>> user to > > > > >>>>>>>>>>> ever > > > > >>>>>>>>>>>>>> change > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we > > > > >>> agree > > > > >>>>> that > > > > >>>>>>>> 20 > > > > >>>>>>>>> is > > > > >>>>>>>>>>>>> already > > > > >>>>>>>>>>>>>> a > > > > >>>>>>>>>>>>>>>> very generous default number and we do not > > > > >> expect > > > > >>>> user > > > > >>>>>> to > > > > >>>>>>>>>> change > > > > >>>>>>>>>>>> it, > > > > >>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>> it > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config? > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang < > > > > >>>>>>>>>>> lucasatu...@gmail.com > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> @Becket > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that > > > > >>>>> normally > > > > >>>>>>>> there > > > > >>>>>>>>>>>> should > > > > >>>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>> just > > > > >>>>>>>>>>>>>>>>> one controller request because of muting, > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be > > > > >> many > > > > >>>>>>>> enqueued > > > > >>>>>>>>>>>>> controller > > > > >>>>>>>>>>>>>>>>> requests. > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure > > > > >>>> which > > > > >>>>>> part > > > > >>>>>>>>>>> conveys > > > > >>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>> info. > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it out the > > > > >>>>> section. > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal > > > > >>>> conditions, > > > > >>>>>> the > > > > >>>>>>>>>> current > > > > >>>>>>>>>>>>>> design > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers running > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the > > > > >>>>> controller > > > > >>>>>>>>> queue > > > > >>>>>>>>>>>>> capacity > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be 1, > > > > >>>>>>>>>>>>>>>>> network threads handling requests from > > > > >> different > > > > >>>>>>>> controllers > > > > >>>>>>>>>>> will > > > > >>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times, > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the > > > > >> other > > > > >>>>> hand, > > > > >>>>>>>>> adding > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>> extra > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards us > > > > >>> from > > > > >>>>>>>> issues > > > > >>>>>>>>> in > > > > >>>>>>>>>>>> those > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much > > > > >>>> downside > > > > >>>>> of > > > > >>>>>>>>> adding > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>> extra > > > > >>>>>>>>>>>>>>>>> config. > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> @Mayuresh > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete > > > > >>> statement > > > > >>>>>> based > > > > >>>>>>>> on > > > > >>>>>>>>> a > > > > >>>>>>>>>>>>> previous > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP. > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>> Lucas > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh > > > > >>> Gharat < > > > > >>>>>>>>>>>>>>>>> gharatmayures...@gmail.com> wrote: > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Hi Lucas, > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP. > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The > > > > >>>> memory > > > > >>>>>>>>>>> consumption > > > > >>>>>>>>>>>>> can > > > > >>>>>>>>>>>>>>> rise > > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests can > > > > >>> go > > > > >>>> up > > > > >>>>>> to > > > > >>>>>>>> 2x" > > > > >>>>>>>>>> in > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> impact > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from > > > > >> controller > > > > >>>> to a > > > > >>>>>>>> Broker > > > > >>>>>>>>>> are > > > > >>>>>>>>>>>> not > > > > >>>>>>>>>>>>>>> high > > > > >>>>>>>>>>>>>>>>>> volume, right ? > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Mayuresh > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin < > > > > >>>>>>>>>>>> becket....@gmail.com> > > > > >>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the > > > > >>>> control > > > > >>>>>>>> plane > > > > >>>>>>>>>> from > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> data > > > > >>>>>>>>>>>>>>>>>> plane > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense. > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the > > > > >> controller > > > > >>>>>> request > > > > >>>>>>>>> queue > > > > >>>>>>>>>>> may > > > > >>>>>>>>>>>>>> have > > > > >>>>>>>>>>>>>>>>> many > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case? > > > > >>> The > > > > >>>>>>>>> controller > > > > >>>>>>>>>>>>>> requests > > > > >>>>>>>>>>>>>>>>> still > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The > > > > >>> SocketServer > > > > >>>>>> will > > > > >>>>>>>>> mute > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> channel > > > > >>>>>>>>>>>>>>>>>> once > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the request > > > > >>>>> channel. > > > > >>>>>>>> So > > > > >>>>>>>>>>>> assuming > > > > >>>>>>>>>>>>>>> there > > > > >>>>>>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller and > > > > >>> each > > > > >>>>>>>> broker, > > > > >>>>>>>>> on > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> broker > > > > >>>>>>>>>>>>>>>>>> side, > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request > > > > >>> in > > > > >>>>> the > > > > >>>>>>>>>>> controller > > > > >>>>>>>>>>>>>>> request > > > > >>>>>>>>>>>>>>>>>> queue > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do > > > > >> we > > > > >>>>> need > > > > >>>>>> a > > > > >>>>>>>>>>> separate > > > > >>>>>>>>>>>>>>>>> controller > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default > > > > >>>> value > > > > >>>>> 20 > > > > >>>>>>>>> means > > > > >>>>>>>>>>> that > > > > >>>>>>>>>>>>> we > > > > >>>>>>>>>>>>>>>>> expect > > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen > > > > >>> in a > > > > >>>>>> short > > > > >>>>>>>>>> period > > > > >>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>> time. > > > > >>>>>>>>>>>>>>>>> I > > > > >>>>>>>>>>>>>>>>>> am > > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase > > > > >> the > > > > >>>>>>>> controller > > > > >>>>>>>>>>>> request > > > > >>>>>>>>>>>>>>> queue > > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems > > > > >>>>>> indicating > > > > >>>>>>>>>>> something > > > > >>>>>>>>>>>>>> very > > > > >>>>>>>>>>>>>>>>> wrong > > > > >>>>>>>>>>>>>>>>>>> has happened. > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin < > > > > >>>>>>>>>>>> lindon...@gmail.com> > > > > >>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas. > > > > >>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is > > > > >>> intuitive. > > > > >>>> It > > > > >>>>>>>> will > > > > >>>>>>>>> be > > > > >>>>>>>>>>> good > > > > >>>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>>>> learn > > > > >>>>>>>>>>>>>>>>>>> more > > > > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers. > > > > >>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas > > > > >>> Wang < > > > > >>>>>>>>>>>>>>> lucasatu...@gmail.com> > > > > >>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong, > > > > >>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of > > > > >>> the > > > > >>>>> KIP > > > > >>>>>> by > > > > >>>>>>>>>>>> explaining > > > > >>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>> cases > > > > >>>>>>>>>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts. > > > > >>>>>>>>>>>>>>>>>>>>> Please take a look at let me know your > > > > >>>>>> comments. > > > > >>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>>>>> Lucas > > > > >>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas > > > > >>> Wang > > > > >>>> < > > > > >>>>>>>>>>>>>>> lucasatu...@gmail.com > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong, > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is > > > > >>>> merely > > > > >>>>>>>> for me > > > > >>>>>>>>>> to > > > > >>>>>>>>>>>>> easily > > > > >>>>>>>>>>>>>>>>>>> construct > > > > >>>>>>>>>>>>>>>>>>>> a > > > > >>>>>>>>>>>>>>>>>>>>>> testing scenario > > > > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests. > > > > >> In > > > > >>>>>>>> production, > > > > >>>>>>>>>>> other > > > > >>>>>>>>>>>>>> than > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> disk > > > > >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused > > > > >> by > > > > >>>> high > > > > >>>>>>>>> produce > > > > >>>>>>>>>>> QPS. > > > > >>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill > > > > >>> the > > > > >>>>>>>> broker > > > > >>>>>>>>> and > > > > >>>>>>>>>>>>> that's > > > > >>>>>>>>>>>>>>> when > > > > >>>>>>>>>>>>>>>>>> this > > > > >>>>>>>>>>>>>>>>>>>> KIP > > > > >>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD > > > > >>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup. > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question > > > > >>>> about > > > > >>>>>> each > > > > >>>>>>>>>>>>>> ProduceRequest > > > > >>>>>>>>>>>>>>>>>>> covering > > > > >>>>>>>>>>>>>>>>>>>>> 20 > > > > >>>>>>>>>>>>>>>>>>>>>> partitions that are randomly > > > > >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr > > > > >>>>> request > > > > >>>>>>>> is > > > > >>>>>>>>>>>> enqueued > > > > >>>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>>> tries > > > > >>>>>>>>>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>>>>>>>>> switch the current broker, say > > > > >> broker0, > > > > >>>>> from > > > > >>>>>>>>> leader > > > > >>>>>>>>>> to > > > > >>>>>>>>>>>>>>> follower > > > > >>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say > > > > >>>> *test-0*. > > > > >>>>>> For > > > > >>>>>>>> the > > > > >>>>>>>>>>> sake > > > > >>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>> argument, > > > > >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers, > > > > >>> say > > > > >>>>>>>> broker1, > > > > >>>>>>>>>> have > > > > >>>>>>>>>>>>>>> *stopped* > > > > >>>>>>>>>>>>>>>>>>>> fetching > > > > >>>>>>>>>>>>>>>>>>>>>> from > > > > >>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0. > > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests > > > > >>> have > > > > >>>>>> acks = > > > > >>>>>>>>> -1 > > > > >>>>>>>>>>>> (ALL) > > > > >>>>>>>>>>>>>>>>>>>>>> 1.1 without this KIP, the > > > > >>>> ProduceRequests > > > > >>>>>>>> ahead > > > > >>>>>>>>> of > > > > >>>>>>>>>>>>>>>>> LeaderAndISR > > > > >>>>>>>>>>>>>>>>>>> will > > > > >>>>>>>>>>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>>>>>>> put into the purgatory, > > > > >>>>>>>>>>>>>>>>>>>>>> and since they'll never be > > > > >>>>> replicated > > > > >>>>>>>> to > > > > >>>>>>>>>> other > > > > >>>>>>>>>>>>>> brokers > > > > >>>>>>>>>>>>>>>>>>> (because > > > > >>>>>>>>>>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>>>>>> the assumption made above), they will > > > > >>>>>>>>>>>>>>>>>>>>>> be completed either when the > > > > >>>>>>>> LeaderAndISR > > > > >>>>>>>>>>>> request > > > > >>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>>>>>> processed > > > > >>>>>>>>>>>>>>>>>>>> or > > > > >>>>>>>>>>>>>>>>>>>>>> when the timeout happens. > > > > >>>>>>>>>>>>>>>>>>>>>> 1.2 With this KIP, broker0 will > > > > >>>>> immediately > > > > >>>>>>>>>>> transition > > > > >>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> partition > > > > >>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower, > > > > >>>>>>>>>>>>>>>>>>>>>> after the current broker sees > > > > >>> the > > > > >>>>>>>>>> replication > > > > >>>>>>>>>>> of > > > > >>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> remaining > > > > >>>>>>>>>>>>>>>>>>>> 19 > > > > >>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response > > > > >>>>> indicating > > > > >>>>>>>> that > > > > >>>>>>>>>>>>>>>>>>>>>> it's no longer the leader for > > > > >>> the > > > > >>>>>>>>> "test-0". > > > > >>>>>>>>>>>>>>>>>>>>>> To see the latency difference > > > > >> between > > > > >>>> 1.1 > > > > >>>>>> and > > > > >>>>>>>>> 1.2, > > > > >>>>>>>>>>>> let's > > > > >>>>>>>>>>>>>> say > > > > >>>>>>>>>>>>>>>>>> there > > > > >>>>>>>>>>>>>>>>>>>> are > > > > >>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the > > > > >>>>>> LeaderAndISR, > > > > >>>>>>>>> and > > > > >>>>>>>>>>>> there > > > > >>>>>>>>>>>>>> are > > > > >>>>>>>>>>>>>>> 8 > > > > >>>>>>>>>>>>>>>>> io > > > > >>>>>>>>>>>>>>>>>>>>> threads, > > > > >>>>>>>>>>>>>>>>>>>>>> so each io thread will process > > > > >>>>>> approximately > > > > >>>>>>>>> 3000 > > > > >>>>>>>>>>>>> produce > > > > >>>>>>>>>>>>>>>>>> requests. > > > > >>>>>>>>>>>>>>>>>>>> Now > > > > >>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that > > > > >>>>> finally > > > > >>>>>>>>>> processed > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR. > > > > >>>>>>>>>>>>>>>>>>>>>> For the 3000 produce requests, if > > > > >> we > > > > >>>>> model > > > > >>>>>>>> the > > > > >>>>>>>>>> time > > > > >>>>>>>>>>>> when > > > > >>>>>>>>>>>>>>> their > > > > >>>>>>>>>>>>>>>>>>>>> remaining > > > > >>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1, > > > > >>>> ...t2999, > > > > >>>>>> and > > > > >>>>>>>>> the > > > > >>>>>>>>>>>>>>> LeaderAndISR > > > > >>>>>>>>>>>>>>>>>>>> request > > > > >>>>>>>>>>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>>>>>>>>> processed at time t3000. > > > > >>>>>>>>>>>>>>>>>>>>>> Without this KIP, the 1st produce > > > > >>>> request > > > > >>>>>>>> would > > > > >>>>>>>>>> have > > > > >>>>>>>>>>>>>> waited > > > > >>>>>>>>>>>>>>> an > > > > >>>>>>>>>>>>>>>>>>> extra > > > > >>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the > > > > >>> 2nd > > > > >>>>> an > > > > >>>>>>>> extra > > > > >>>>>>>>>>> time > > > > >>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>> t3000 - > > > > >>>>>>>>>>>>>>>>>>> t1, > > > > >>>>>>>>>>>>>>>>>>>>> etc. > > > > >>>>>>>>>>>>>>>>>>>>>> Roughly speaking, the latency > > > > >>>> difference > > > > >>>>> is > > > > >>>>>>>>> bigger > > > > >>>>>>>>>>> for > > > > >>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>> earlier > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests than for the later > > > > >>> ones. > > > > >>>>> For > > > > >>>>>>>> the > > > > >>>>>>>>>> same > > > > >>>>>>>>>>>>>> reason, > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> more > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued > > > > >>>>>>>>>>>>>>>>>>>>>> before the LeaderAndISR, the bigger > > > > >>>>> benefit > > > > >>>>>>>> we > > > > >>>>>>>>> get > > > > >>>>>>>>>>>>> (capped > > > > >>>>>>>>>>>>>>> by > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>> produce timeout). > > > > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests > > > > >>> have > > > > >>>>>>>> acks=0 or > > > > >>>>>>>>>>>> acks=1 > > > > >>>>>>>>>>>>>>>>>>>>>> There will be no latency > > > > >> differences > > > > >>> in > > > > >>>>>> this > > > > >>>>>>>>> case, > > > > >>>>>>>>>>> but > > > > >>>>>>>>>>>>>>>>>>>>>> 2.1 without this KIP, the records > > > > >> of > > > > >>>>>>>> partition > > > > >>>>>>>>>>> test-0 > > > > >>>>>>>>>>>> in > > > > >>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the > > > > >>> LeaderAndISR > > > > >>>>>> will > > > > >>>>>>>> be > > > > >>>>>>>>>>>> appended > > > > >>>>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> local > > > > >>>>>>>>>>>>>>>>>>>>> log, > > > > >>>>>>>>>>>>>>>>>>>>>> and eventually be truncated > > > > >>> after > > > > >>>>>>>>> processing > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>> LeaderAndISR. > > > > >>>>>>>>>>>>>>>>>>>>>> This is what's referred to as > > > > >>>>>>>>>>>>>>>>>>>>>> "some unofficial definition > > > > >> of > > > > >>>> data > > > > >>>>>>>> loss > > > > >>>>>>>>> in > > > > >>>>>>>>>>>> terms > > > > >>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>> messages > > > > >>>>>>>>>>>>>>>>>>>>>> beyond the high watermark". > > > > >>>>>>>>>>>>>>>>>>>>>> 2.2 with this KIP, we can mitigate > > > > >>> the > > > > >>>>>> effect > > > > >>>>>>>>>> since > > > > >>>>>>>>>>> if > > > > >>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR > > > > >>>>>>>>>>>>>>>>>>>>>> is immediately processed, the > > > > >> response > > > > >>> to > > > > >>>>>>>>> producers > > > > >>>>>>>>>>> will > > > > >>>>>>>>>>>>>> have > > > > >>>>>>>>>>>>>>>>>>>>>> the NotLeaderForPartition > > > > >>> error, > > > > >>>>>>>> causing > > > > >>>>>>>>>>>> producers > > > > >>>>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>>>> retry > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit > > > > >>> for > > > > >>>>>>>> reducing > > > > >>>>>>>>>> the > > > > >>>>>>>>>>>>>> latency > > > > >>>>>>>>>>>>>>>>> of a > > > > >>>>>>>>>>>>>>>>>>>>> broker > > > > >>>>>>>>>>>>>>>>>>>>>> becoming the follower, > > > > >>>>>>>>>>>>>>>>>>>>>> closely related is reducing the > > > > >> latency > > > > >>>> of > > > > >>>>> a > > > > >>>>>>>>> broker > > > > >>>>>>>>>>>>> becoming > > > > >>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> leader. > > > > >>>>>>>>>>>>>>>>>>>>>> In this case, the benefit is even > > > > >> more > > > > >>>>>>>> obvious, if > > > > >>>>>>>>>>> other > > > > >>>>>>>>>>>>>>> brokers > > > > >>>>>>>>>>>>>>>>>> have > > > > >>>>>>>>>>>>>>>>>>>>>> resigned leadership, and the > > > > >>>>>>>>>>>>>>>>>>>>>> current broker should take > > > > >> leadership. > > > > >>>> Any > > > > >>>>>>>> delay > > > > >>>>>>>>> in > > > > >>>>>>>>>>>>>> processing > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be perceived > > > > >>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In > > > > >>> extreme > > > > >>>>>> cases, > > > > >>>>>>>>> this > > > > >>>>>>>>>>> can > > > > >>>>>>>>>>>>>> cause > > > > >>>>>>>>>>>>>>>>>> failed > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests if the retries are > > > > >>>>>>>>>>>>>>>>>>>>>> exhausted. > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> Another two types of controller > > > > >>> requests > > > > >>>>> are > > > > >>>>>>>>>>>>> UpdateMetadata > > > > >>>>>>>>>>>>>>> and > > > > >>>>>>>>>>>>>>>>>>>>>> StopReplica, which I'll briefly > > > > >> discuss > > > > >>>> as > > > > >>>>>>>>> follows: > > > > >>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed > > > > >>>>>> processing > > > > >>>>>>>>>> means > > > > >>>>>>>>>>>>>> clients > > > > >>>>>>>>>>>>>>>>>>> receiving > > > > >>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong > > > > >>>>>> leadership > > > > >>>>>>>>> info > > > > >>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the > > > > >> effect > > > > >>> is > > > > >>>>>> more > > > > >>>>>>>>>> retries > > > > >>>>>>>>>>>> or > > > > >>>>>>>>>>>>>> even > > > > >>>>>>>>>>>>>>>>>> fatal > > > > >>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted. > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long > > > > >>> queuing > > > > >>>>> time > > > > >>>>>>>> may > > > > >>>>>>>>>>>> degrade > > > > >>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> performance > > > > >>>>>>>>>>>>>>>>>>>>>> of topic deletion. > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the > > > > >>> delay > > > > >>>>> for > > > > >>>>>>>>>>>>>>>>>> DescribeLogDirsRequest, > > > > >>>>>>>>>>>>>>>>>>>> you > > > > >>>>>>>>>>>>>>>>>>>>>> are right > > > > >>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the > > > > >>>> latency > > > > >>>>> in > > > > >>>>>>>>>> getting > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>>> log > > > > >>>>>>>>>>>>>>>>> dirs > > > > >>>>>>>>>>>>>>>>>>>> info, > > > > >>>>>>>>>>>>>>>>>>>>>> and it's only relevant > > > > >>>>>>>>>>>>>>>>>>>>>> when controller requests are > > > > >> involved. > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> Regards, > > > > >>>>>>>>>>>>>>>>>>>>>> Lucas > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong > > > > >>> Lin > > > > >>>> < > > > > >>>>>>>>>>>>>> lindon...@gmail.com > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Jun, > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is > > > > >>> good > > > > >>>>>>>> point. > > > > >>>>>>>>> So > > > > >>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> feature > > > > >>>>>>>>>>>>>>>>> may > > > > >>>>>>>>>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one > > > > >>>>>> question > > > > >>>>>>>>>> below. > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Lucas, > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also > > > > >>> useful > > > > >>>>> for > > > > >>>>>>>>>> non-JBOD > > > > >>>>>>>>>>>>> setup > > > > >>>>>>>>>>>>>>> or > > > > >>>>>>>>>>>>>>>>> it > > > > >>>>>>>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>>>>>>>> only > > > > >>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be > > > > >>>>> useful > > > > >>>>>> to > > > > >>>>>>>>>>>> understand > > > > >>>>>>>>>>>>>>> this. > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD, > > > > >>> in > > > > >>>>>> order > > > > >>>>>>>> to > > > > >>>>>>>>>> move > > > > >>>>>>>>>>>>>> leaders > > > > >>>>>>>>>>>>>>>>> on > > > > >>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>> failed > > > > >>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system > > > > >>> operator > > > > >>>>>> first > > > > >>>>>>>>>> needs > > > > >>>>>>>>>>> to > > > > >>>>>>>>>>>>> get > > > > >>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>> list > > > > >>>>>>>>>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This > > > > >> is > > > > >>>>>>>> currently > > > > >>>>>>>>>>>> achieved > > > > >>>>>>>>>>>>>>> using > > > > >>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which > > > > >>>> sends > > > > >>>>>>>>>>>>>>>>> DescribeLogDirsRequest > > > > >>>>>>>>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the > > > > >>>>> controller > > > > >>>>>>>>>>> requests, > > > > >>>>>>>>>>>>> then > > > > >>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest > > > > >>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be > > > > >>>> processed > > > > >>>>>> by > > > > >>>>>>>> the > > > > >>>>>>>>>>>> broker. > > > > >>>>>>>>>>>>>> So > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> overall > > > > >>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the > > > > >>>> failed > > > > >>>>>> disk > > > > >>>>>>>>> may > > > > >>>>>>>>>>>> still > > > > >>>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>> long > > > > >>>>>>>>>>>>>>>>>>> even > > > > >>>>>>>>>>>>>>>>>>>>> with > > > > >>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think? > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>>>>>>> Dong > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM, > > > > >> Lucas > > > > >>>>> Wang < > > > > >>>>>>>>>>>>>>>>> lucasatu...@gmail.com > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment, > > > > >>>> Jun. > > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>> @Dong, > > > > >>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in > > > > >>> your > > > > >>>>>>>> previous > > > > >>>>>>>>>>> email > > > > >>>>>>>>>>>>> are > > > > >>>>>>>>>>>>>>>>> about > > > > >>>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether > > > > >>> it's > > > > >>>>>>>> useful, > > > > >>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do > > > > >>> you > > > > >>>>>> agree > > > > >>>>>>>>> that > > > > >>>>>>>>>>>> this > > > > >>>>>>>>>>>>>> KIP > > > > >>>>>>>>>>>>>>>>> can > > > > >>>>>>>>>>>>>>>>>> be > > > > >>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned > > > > >> by > > > > >>>> Jun? > > > > >>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks! > > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Regards, > > > > >>>>>>>>>>>>>>>>>>>>>>>> Lucas > > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM, > > > > >> Jun > > > > >>>> Rao > > > > >>>>> < > > > > >>>>>>>>>>>>>> j...@confluent.io> > > > > >>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong, > > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are > > > > >> slow, > > > > >>>> one > > > > >>>>>>>>> probably > > > > >>>>>>>>>>>>> should > > > > >>>>>>>>>>>>>>> just > > > > >>>>>>>>>>>>>>>>>> kill > > > > >>>>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP > > > > >> may > > > > >>>> not > > > > >>>>>>>> help. > > > > >>>>>>>>> If > > > > >>>>>>>>>>>> only > > > > >>>>>>>>>>>>>> one > > > > >>>>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>> disks > > > > >>>>>>>>>>>>>>>>>>>>>>> on > > > > >>>>>>>>>>>>>>>>>>>>>>>> a > > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to > > > > >>> fail > > > > >>>>>> that > > > > >>>>>>>>> disk > > > > >>>>>>>>>>> and > > > > >>>>>>>>>>>>> move > > > > >>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>> leaders > > > > >>>>>>>>>>>>>>>>>>>>> on > > > > >>>>>>>>>>>>>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that > > > > >>> case, > > > > >>>>>> being > > > > >>>>>>>>> able > > > > >>>>>>>>>>> to > > > > >>>>>>>>>>>>>>> process > > > > >>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsr > > > > >>>>>>>>>>>>>>>>>>>>>>>>> requests faster will potentially > > > > >>>> help > > > > >>>>>> the > > > > >>>>>>>>>>> producers > > > > >>>>>>>>>>>>>>> recover > > > > >>>>>>>>>>>>>>>>>>>> quicker. > > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Jun > > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 2, 2018 at 7:56 PM, > > > > >>> Dong > > > > >>>>>> Lin < > > > > >>>>>>>>>>>>>>>>> lindon...@gmail.com > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply. Some > > > > >>> follow > > > > >>>> up > > > > >>>>>>>>>> questions > > > > >>>>>>>>>>>>> below. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 1, if each > > > > >>>> ProduceRequest > > > > >>>>>>>> covers > > > > >>>>>>>>> 20 > > > > >>>>>>>>>>>>>>> partitions > > > > >>>>>>>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>>>>> are > > > > >>>>>>>>>>>>>>>>>>>>>>>>> randomly > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> distributed across all > > > > >>> partitions, > > > > >>>>>> then > > > > >>>>>>>>> each > > > > >>>>>>>>>>>>>>>>> ProduceRequest > > > > >>>>>>>>>>>>>>>>>>> will > > > > >>>>>>>>>>>>>>>>>>>>>>> likely > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> cover some partitions for > > > > >> which > > > > >>>> the > > > > >>>>>>>> broker > > > > >>>>>>>>> is > > > > >>>>>>>>>>>> still > > > > >>>>>>>>>>>>>>>>> leader > > > > >>>>>>>>>>>>>>>>>>> after > > > > >>>>>>>>>>>>>>>>>>>>> it > > > > >>>>>>>>>>>>>>>>>>>>>>>>> quickly > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> processes the > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsrRequest. Then > > > > >> broker > > > > >>>>> will > > > > >>>>>>>> still > > > > >>>>>>>>>> be > > > > >>>>>>>>>>>> slow > > > > >>>>>>>>>>>>>> in > > > > >>>>>>>>>>>>>>>>>>>> processing > > > > >>>>>>>>>>>>>>>>>>>>>>> these > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> ProduceRequest and request > > > > >> will > > > > >>>>> still > > > > >>>>>> be > > > > >>>>>>>>> very > > > > >>>>>>>>>>>> high > > > > >>>>>>>>>>>>>> with > > > > >>>>>>>>>>>>>>>>> this > > > > >>>>>>>>>>>>>>>>>>>> KIP. > > > > >>>>>>>>>>>>>>>>>>>>> It > > > > >>>>>>>>>>>>>>>>>>>>>>>>> seems > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> that most ProduceRequest will > > > > >>>> still > > > > >>>>>>>> timeout > > > > >>>>>>>>>>> after > > > > >>>>>>>>>>>>> 30 > > > > >>>>>>>>>>>>>>>>>> seconds. > > > > >>>>>>>>>>>>>>>>>>> Is > > > > >>>>>>>>>>>>>>>>>>>>>>> this > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> understanding correct? > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 2, if most > > > > >>>> ProduceRequest > > > > >>>>>> will > > > > >>>>>>>>>> still > > > > >>>>>>>>>>>>>> timeout > > > > >>>>>>>>>>>>>>>>> after > > > > >>>>>>>>>>>>>>>>>>> 30 > > > > >>>>>>>>>>>>>>>>>>>>>>>> seconds, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> then it is less clear how this > > > > >>> KIP > > > > >>>>>>>> reduces > > > > >>>>>>>>>>>> average > > > > >>>>>>>>>>>>>>>>> produce > > > > >>>>>>>>>>>>>>>>>>>>> latency. > > > > >>>>>>>>>>>>>>>>>>>>>>> Can > > > > >>>>>>>>>>>>>>>>>>>>>>>>> you > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> clarify what metrics can be > > > > >>>> improved > > > > >>>>>> by > > > > >>>>>>>>> this > > > > >>>>>>>>>>> KIP? > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Not sure why system operator > > > > >>>>> directly > > > > >>>>>>>> cares > > > > >>>>>>>>>>>> number > > > > >>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>>> truncated > > > > >>>>>>>>>>>>>>>>>>>>>>>> messages. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Do you mean this KIP can > > > > >> improve > > > > >>>>>> average > > > > >>>>>>>>>>>> throughput > > > > >>>>>>>>>>>>>> or > > > > >>>>>>>>>>>>>>>>>> reduce > > > > >>>>>>>>>>>>>>>>>>>>>>> message > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> duplication? It will be good > > > > >> to > > > > >>>>>>>> understand > > > > >>>>>>>>>>> this. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Dong > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 Jul 2018 at 7:12 AM > > > > >>>> Lucas > > > > >>>>>>>> Wang < > > > > >>>>>>>>>>>>>>>>>>> lucasatu...@gmail.com > > > > >>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dong, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your valuable > > > > >>>> comments. > > > > >>>>>>>> Please > > > > >>>>>>>>>> see > > > > >>>>>>>>>>>> my > > > > >>>>>>>>>>>>>>> reply > > > > >>>>>>>>>>>>>>>>>>> below. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The Google doc showed > > > > >> only > > > > >>> 1 > > > > >>>>>>>>> partition. > > > > >>>>>>>>>>> Now > > > > >>>>>>>>>>>>>> let's > > > > >>>>>>>>>>>>>>>>>>> consider > > > > >>>>>>>>>>>>>>>>>>>> a > > > > >>>>>>>>>>>>>>>>>>>>>>> more > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> common > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> scenario > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> where broker0 is the leader > > > > >> of > > > > >>>>> many > > > > >>>>>>>>>>> partitions. > > > > >>>>>>>>>>>>> And > > > > >>>>>>>>>>>>>>>>> let's > > > > >>>>>>>>>>>>>>>>>>> say > > > > >>>>>>>>>>>>>>>>>>>>> for > > > > >>>>>>>>>>>>>>>>>>>>>>>> some > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> reason its IO becomes slow. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> The number of leader > > > > >>> partitions > > > > >>>> on > > > > >>>>>>>>> broker0 > > > > >>>>>>>>>> is > > > > >>>>>>>>>>>> so > > > > >>>>>>>>>>>>>>> large, > > > > >>>>>>>>>>>>>>>>>> say > > > > >>>>>>>>>>>>>>>>>>>> 10K, > > > > >>>>>>>>>>>>>>>>>>>>>>> that > > > > >>>>>>>>>>>>>>>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> cluster is skewed, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> and the operator would like > > > > >> to > > > > >>>>> shift > > > > >>>>>>>> the > > > > >>>>>>>>>>>>> leadership > > > > >>>>>>>>>>>>>>>>> for a > > > > >>>>>>>>>>>>>>>>>>> lot > > > > >>>>>>>>>>>>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> partitions, say 9K, to other > > > > >>>>>> brokers, > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> either manually or through > > > > >>> some > > > > >>>>>>>> service > > > > >>>>>>>>>> like > > > > >>>>>>>>>>>>> cruise > > > > >>>>>>>>>>>>>>>>>> control. > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> With this KIP, not only will > > > > >>> the > > > > > > > > > > > > > -- > > > -Regards, > > > Mayuresh R. Gharat > > > (862) 250-7125 > > > > > >