Hi Jun,

Using the correlation ID might still be useful to address the cases where the
controller epoch and leader epoch checks are not sufficient to guarantee
correct behavior. For example, if the controller sends a LeaderAndIsrRequest
followed by a StopReplicaRequest, and the broker processes them in the reverse
order, the replica may still be wrongly recreated, right?
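
For concreteness, here is a minimal Java sketch of the broker-side check
discussed further down in this thread (class and field names are hypothetical,
not actual broker code). Note that the scheme handles the reordering above:
the StopReplicaRequest carries the larger correlation id, so the late
LeaderAndIsrRequest would be dropped instead of recreating the replica.

// Hypothetical sketch of the broker-side ordering check. Assumes a single
// controller-request handler thread, so no synchronization is needed.
public class ControllerRequestGate {

    private int lastControllerEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the request should be processed, false if it is
    // obsolete and must be dropped.
    public boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastControllerEpoch)
            return false;                          // from an older controller: drop
        if (controllerEpoch > lastControllerEpoch) {
            lastControllerEpoch = controllerEpoch; // new controller: reset watermark
            lastCorrelationId = correlationId;
            return true;
        }
        if (correlationId < lastCorrelationId)
            return false;                          // reordered within the epoch: drop
        lastCorrelationId = correlationId;         // highest correlation id seen
        return true;
    }
}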

Thanks,

Jiangjie (Becket) Qin

> On Jul 22, 2018, at 11:47 AM, Jun Rao <j...@confluent.io> wrote:
> 
> Hmm, since we already use controller epoch and leader epoch for properly
> caching the latest partition state, do we really need correlation id for
> ordering the controller requests?
> 
> Thanks,
> 
> Jun
> 
> On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket....@gmail.com> wrote:
> 
>> Lucas and Mayuresh,
>> 
>> Good idea. The correlation id should work.
>> 
>> In the ControllerChannelManager, a request will be resent until a response
>> is received. So if the controller-to-broker connection disconnects after
>> the controller sends R1_a, but before the response to R1_a is received,
>> the disconnection causes the controller to resend R1 as R1_b; i.e. until
>> R1 is acked, R2 won't be sent by the controller.
>> This gives two guarantees:
>> 1. Correlation id wise: R1_a < R1_b < R2.
>> 2. On the broker side, when R2 is seen, R1 must have been processed at
>> least once.
>> 
>> So on the broker side, with a single-threaded controller request handler,
>> the logic should be:
>> 1. Process whatever request is seen in the controller request queue.
>> 2. For the given epoch, drop the request if its correlation id is smaller
>> than that of the last processed request.
>> 
>> Thanks,
>> 
>> Jiangjie (Becket) Qin
>> 
>> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <j...@confluent.io> wrote:
>> 
>>> I agree that there is no strong ordering when there is more than one
>>> socket connection. Currently, we rely on controllerEpoch and leaderEpoch
>>> to ensure that the receiving broker picks up the latest state for each
>>> partition.
>>> 
>>> One potential issue with the deque approach is that if the queue is
>> full,
>>> there is no guarantee that the controller requests will be enqueued
>>> quickly.
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
>>> gharatmayures...@gmail.com
>>>> wrote:
>>> 
>>>> Yea, the correlationId is only set to 0 in the NetworkClient constructor.
>>>> Since we reuse the same NetworkClient between the controller and the
>>>> broker, a disconnection should not cause it to reset to 0, in which case
>>>> it can be used to reject obsolete requests.
>>>> 
>>>> Thanks,
>>>> 
>>>> Mayuresh
>>>> 
>>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatu...@gmail.com>
>>> wrote:
>>>> 
>>>>> @Dong,
>>>>> Great example and explanation, thanks!
>>>>> 
>>>>> @All
>>>>> Regarding the example given by Dong, it seems even if we use a queue
>>>>> and a dedicated controller request handling thread, the same result can
>>>>> still happen, because R1_a will be sent on one connection, and R1_b & R2
>>>>> will be sent on a different connection, and there is no ordering between
>>>>> different connections on the broker side.
>>>>> I was discussing with Mayuresh offline, and it seems the correlation id
>>>>> within the same NetworkClient object is monotonically increasing and
>>>>> never reset, hence a broker can leverage that to properly reject
>>>>> obsolete requests.
>>>>> Thoughts?
>>>>> 
>>>>> Thanks,
>>>>> Lucas
>>>>> 
>>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
>>>>> gharatmayures...@gmail.com> wrote:
>>>>> 
>>>>>> Actually nvm, correlationId is reset in case of connection loss, I
>>>> think.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Mayuresh
>>>>>> 
>>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
>>>>>> gharatmayures...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I agree with Dong that out-of-order processing can happen even with 2
>>>>>>> separate queues, and it can even happen today.
>>>>>>> Can we use the correlationId in the request from the controller to the
>>>>>>> broker to handle ordering?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Mayuresh
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket....@gmail.com
>>> 
>>>>> wrote:
>>>>>>> 
>>>>>>>> Good point, Joel. I agree that a dedicated controller request
>>>> handling
>>>>>>>> thread would be a better isolation. It also solves the
>> reordering
>>>>> issue.
>>>>>>>> 
>>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
>> jjkosh...@gmail.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Good example. I think this scenario can occur in the current code as
>>>>>>>>> well, but with even lower probability, given that there are other
>>>>>>>>> non-controller requests interleaved. It is still sketchy though, and
>>>>>>>>> I think a safer approach would be separate queues and pinning
>>>>>>>>> controller request handling to one handler thread.
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
>> lindon...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hey Becket,
>>>>>>>>>> 
>>>>>>>>>> I think you are right that there may be out-of-order processing.
>>>>>>>>>> However, it seems that out-of-order processing may also happen even
>>>>>>>>>> if we use a separate queue.
>>>>>>>>>> 
>>>>>>>>>> Here is the example:
>>>>>>>>>> 
>>>>>>>>>> - Controller sends R1 and got disconnected before receiving the
>>>>>>>>>> response. Then it reconnects and sends R2. Both requests now stay
>>>>>>>>>> in the controller request queue in the order they are sent.
>>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2 takes
>>>>>>>>>> R2 from the request queue almost at the same time.
>>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a chance that
>>>>>>>>>> R2's processing is completed before R1's.
>>>>>>>>>> 
>>>>>>>>>> If out-of-order processing can happen for both approaches with very
>>>>>>>>>> low probability, it may not be worthwhile to add the extra queue.
>>>>>>>>>> What do you think?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dong
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
>>>> becket....@gmail.com
>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Mayuresh/Joel,
>>>>>>>>>>> 
>>>>>>>>>>> Using the request channel as a deque was brought up some time ago
>>>>>>>>>>> when we were initially thinking of prioritizing the requests. The
>>>>>>>>>>> concern was that the controller requests are supposed to be
>>>>>>>>>>> processed in order. If we can ensure that there is only one
>>>>>>>>>>> controller request in the request channel, the order is not a
>>>>>>>>>>> concern. But in cases where more than one controller request is
>>>>>>>>>>> inserted into the queue, the controller request order may change
>>>>>>>>>>> and cause problems. For example, think about the following sequence:
>>>>>>>>>>> 1. Controller successfully sent a request R1 to the broker.
>>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
>>>>>>>>>>> request queue.
>>>>>>>>>>> 3. Controller-to-broker connection failed and the controller
>>>>>>>>>>> reconnected to the broker.
>>>>>>>>>>> 4. Controller sends a request R2 to the broker.
>>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request
>>>>>>>>>>> queue.
>>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is
>>>>>>>>>>> processed, which may cause problems.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
>>>>> jjkosh...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler, less
>>>>>>>>>>>> invasive alternative, and it should work. Jun/Becket/others, do
>>>>>>>>>>>> you see any pitfalls with this approach?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
>>>>>>>> lucasatu...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> @Mayuresh,
>>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of before.
>>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also
>>>>>>>>>>>>> avoids the need to have a new size metric and capacity config
>>>>>>>>>>>>> for the controller request queue. In fact, if we were to adopt
>>>>>>>>>>>>> this design, there is no public interface change, and we
>>>>>>>>>>>>> probably don't need a KIP.
>>>>>>>>>>>>> Also, implementation-wise, it seems the Java class
>>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement by
>>>>>>>>>>>>> supporting a capacity and allowing insertion at both ends.
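>>>>>>>>>>>>> As a rough sketch (hypothetical names, not code from the KIP or
>>>>>>>>>>>>> the broker), the deque-based channel could look like this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> import java.util.concurrent.BlockingDeque;
>>>>>>>>>>>>> import java.util.concurrent.LinkedBlockingDeque;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> // Controller requests are inserted at the head, everything else
>>>>>>>>>>>>> // at the tail, so one bounded deque prioritizes the control plane.
>>>>>>>>>>>>> public class PrioritizedRequestChannel<R> {
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     private final BlockingDeque<R> deque;
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     public PrioritizedRequestChannel(int capacity) {
>>>>>>>>>>>>>         this.deque = new LinkedBlockingDeque<>(capacity);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     // Controller requests jump to the head of the deque.
>>>>>>>>>>>>>     public void sendControllerRequest(R request) throws InterruptedException {
>>>>>>>>>>>>>         deque.putFirst(request);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     // Data requests (produce, fetch, ...) keep FIFO order at the tail.
>>>>>>>>>>>>>     public void sendDataRequest(R request) throws InterruptedException {
>>>>>>>>>>>>>         deque.putLast(request);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     // Request handler threads always take from the head.
>>>>>>>>>>>>>     public R receive() throws InterruptedException {
>>>>>>>>>>>>>         return deque.takeFirst();
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (Whether head insertion can still reorder controller requests
>>>>>>>>>>>>> across controller reconnects is discussed elsewhere in this
>>>>>>>>>>>>> thread.)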
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My only concern is that this design is tied to the coincidence
>>>>>>>>>>>>> that we have two request priorities and there are two ends to a
>>>>>>>>>>>>> deque. Hence, by using the proposed design, it seems the network
>>>>>>>>>>>>> layer is more tightly coupled with upper-layer logic; e.g. if we
>>>>>>>>>>>>> were to add an extra priority level in the future for some
>>>>>>>>>>>>> reason, we would probably need to go back to the design of
>>>>>>>>>>>>> separate queues, one for each priority level.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward
>>> your
>>>>>>>> suggested
>>>>>>>>>>>>> approach.
>>>>>>>>>>>>> Let's hear what others think.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Becket,
>>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
>>>>>>>>>>>>> question only in the context of the current KIP design: I think
>>>>>>>>>>>>> your suggestion makes sense, and I'm ok with removing the
>>>>>>>>>>>>> capacity config and just relying on the default value of 20
>>>>>>>>>>>>> being sufficient.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
>>>>>>>>>>>>> gharatmayures...@gmail.com
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Seems like the main intent here is to prioritize controller
>>>>>>>>>>>>>> requests over any other requests.
>>>>>>>>>>>>>> In that case, we can change the request queue to a deque,
>>>>>>>>>>>>>> where you always insert the normal requests (produce,
>>>>>>>>>>>>>> consume, ..etc) at the tail of the deque, but if it's a
>>>>>>>>>>>>>> controller request, you insert it at the head of the queue.
>>>>>>>>>>>>>> This ensures that the controller request will be given higher
>>>>>>>>>>>>>> priority over other requests.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Also, since we only read one request from the socket, mute it,
>>>>>>>>>>>>>> and only unmute it after handling the request, this would
>>>>>>>>>>>>>> ensure that we don't handle controller requests out of order.
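>>>>>>>>>>>>>> Continuing the sketch above with the same hypothetical names
>>>>>>>>>>>>>> (the real logic lives in the SocketServer), the per-connection
>>>>>>>>>>>>>> mute/unmute contract would be roughly:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> interface Request { boolean isFromController(); }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> interface Connection {
>>>>>>>>>>>>>>     Request readRequest();
>>>>>>>>>>>>>>     void mute();    // stop selecting this socket for reads
>>>>>>>>>>>>>>     void unmute();  // resume reads once the request is handled
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> final class ConnectionLoop {
>>>>>>>>>>>>>>     private final PrioritizedRequestChannel<Request> channel;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     ConnectionLoop(PrioritizedRequestChannel<Request> channel) {
>>>>>>>>>>>>>>         this.channel = channel;
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     // At most one request per connection is in flight at a
>>>>>>>>>>>>>>     // time, so two requests from the same connection are never
>>>>>>>>>>>>>>     // handled out of order.
>>>>>>>>>>>>>>     void onReadable(Connection conn) throws InterruptedException {
>>>>>>>>>>>>>>         Request req = conn.readRequest();
>>>>>>>>>>>>>>         conn.mute();
>>>>>>>>>>>>>>         if (req.isFromController())
>>>>>>>>>>>>>>             channel.sendControllerRequest(req);
>>>>>>>>>>>>>>         else
>>>>>>>>>>>>>>             channel.sendDataRequest(req);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     void onHandled(Connection conn) {
>>>>>>>>>>>>>>         conn.unmute();
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }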
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With this approach we can avoid the second queue and the
>>>>>>>>>>>>>> additional config for the size of the queue.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Mayuresh
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
>>>>>>>> becket....@gmail.com
>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hey Joel,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current
>>>>>>>>>>>>>>> design makes sense. My confusion is about whether the new
>>>>>>>>>>>>>>> config for the controller queue capacity is necessary. I
>>>>>>>>>>>>>>> cannot think of a case in which users would change it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
>>>>>>>>>> becket....@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect users
>>>>>>>>>>>>>>>> to ever change the controller request queue capacity"? If we
>>>>>>>>>>>>>>>> agree that 20 is already a very generous default number and
>>>>>>>>>>>>>>>> we do not expect users to change it, is it still necessary
>>>>>>>>>>>>>>>> to expose this as a config?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
>>>>>>>>>>> lucasatu...@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> @Becket
>>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally there
>>>>>>>>>>>>>>>>> should be just one controller request because of muting,
>>>>>>>>>>>>>>>>> and I had NOT intended to say there would be many enqueued
>>>>>>>>>>>>>>>>> controller requests.
>>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure which part
>>>>>>>>>>>>>>>>> conveys that info. I'd be happy to revise if you point out
>>>>>>>>>>>>>>>>> the section.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions, the
>>>>>>>>>>>>>>>>> current design does not preclude multiple controllers
>>>>>>>>>>>>>>>>> running at the same time. Hence, if we don't have the
>>>>>>>>>>>>>>>>> controller queue capacity config and simply set its capacity
>>>>>>>>>>>>>>>>> to 1, network threads handling requests from different
>>>>>>>>>>>>>>>>> controllers will be blocked during those troublesome times,
>>>>>>>>>>>>>>>>> which is probably not what we want. On the other hand,
>>>>>>>>>>>>>>>>> adding the extra config with a default value, say 20, guards
>>>>>>>>>>>>>>>>> us from issues in those troublesome times, and IMO there
>>>>>>>>>>>>>>>>> isn't much downside to adding the extra config.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> @Mayuresh
>>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement based on
>>>>>>>>>>>>>>>>> a previous design. I've revised the wording in the KIP.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
>>> Gharat <
>>>>>>>>>>>>>>>>> gharatmayures...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory
>>>>>>>>>>>>>>>>>> consumption can rise given the total number of queued
>>>>>>>>>>>>>>>>>> requests can go up to 2x" in the impact section. Normally
>>>>>>>>>>>>>>>>>> the requests from the controller to a broker are not high
>>>>>>>>>>>>>>>>>> volume, right?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Mayuresh
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
>>>>>>>>>>>> becket....@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control plane
>>>>>>>>>>>>>>>>>>> from the data plane makes a lot of sense.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request
>>>>>>>>>>>>>>>>>>> queue may have many requests in it. Will this be a common
>>>>>>>>>>>>>>>>>>> case? The controller requests still go through the
>>>>>>>>>>>>>>>>>>> SocketServer. The SocketServer will mute the channel once
>>>>>>>>>>>>>>>>>>> a request is read and put into the request channel. So,
>>>>>>>>>>>>>>>>>>> assuming there is only one connection between the
>>>>>>>>>>>>>>>>>>> controller and each broker, on the broker side there
>>>>>>>>>>>>>>>>>>> should be only one controller request in the controller
>>>>>>>>>>>>>>>>>>> request queue at any given time. If that is the case, do
>>>>>>>>>>>>>>>>>>> we need a separate controller request queue capacity
>>>>>>>>>>>>>>>>>>> config? The default value of 20 means that we expect 20
>>>>>>>>>>>>>>>>>>> controller switches to happen in a short period of time.
>>>>>>>>>>>>>>>>>>> I am not sure whether someone should increase the
>>>>>>>>>>>>>>>>>>> controller request queue capacity to handle such a case,
>>>>>>>>>>>>>>>>>>> as it seems to indicate something very wrong has happened.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
>>>>>>>>>>>> lindon...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I think the motivation section is
>>> intuitive.
>>>> It
>>>>>>>> will
>>>>>>>>> be
>>>>>>>>>>> good
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> learn
>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
>>> Wang <
>>>>>>>>>>>>>>> lucasatu...@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Dong,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
>>> the
>>>>> KIP
>>>>>> by
>>>>>>>>>>>> explaining
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> would have user impacts.
>>>>>>>>>>>>>>>>>>>>> Please take a look at let me know your
>>>>>> comments.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
>>> Wang
>>>> <
>>>>>>>>>>>>>>> lucasatu...@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi Dong,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is
>>>> merely
>>>>>>>> for me
>>>>>>>>>> to
>>>>>>>>>>>>> easily
>>>>>>>>>>>>>>>>>>> construct
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> testing scenario
>>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests.
>> In
>>>>>>>> production,
>>>>>>>>>>> other
>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> disk
>>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of
>>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused
>> by
>>>> high
>>>>>>>>> produce
>>>>>>>>>>> QPS.
>>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill
>>> the
>>>>>>>> broker
>>>>>>>>> and
>>>>>>>>>>>>> that's
>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD
>>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Going back to your previous question
>>>> about
>>>>>> each
>>>>>>>>>>>>>> ProduceRequest
>>>>>>>>>>>>>>>>>>> covering
>>>>>>>>>>>>>>>>>>>>> 20
>>>>>>>>>>>>>>>>>>>>>> partitions that are randomly
>>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr
>>>>> request
>>>>>>>> is
>>>>>>>>>>>> enqueued
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> tries
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> switch the current broker, say
>> broker0,
>>>>> from
>>>>>>>>> leader
>>>>>>>>>> to
>>>>>>>>>>>>>>> follower
>>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say
>>>> *test-0*.
>>>>>> For
>>>>>>>> the
>>>>>>>>>>> sake
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> argument,
>>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers,
>>> say
>>>>>>>> broker1,
>>>>>>>>>> have
>>>>>>>>>>>>>>> *stopped*
>>>>>>>>>>>>>>>>>>>> fetching
>>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0.
>>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests
>>> have
>>>>>> acks =
>>>>>>>>> -1
>>>>>>>>>>>> (ALL)
>>>>>>>>>>>>>>>>>>>>>>  1.1 without this KIP, the
>>>> ProduceRequests
>>>>>>>> ahead
>>>>>>>>> of
>>>>>>>>>>>>>>>>> LeaderAndISR
>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> put into the purgatory,
>>>>>>>>>>>>>>>>>>>>>>        and since they'll never be
>>>>> replicated
>>>>>>>> to
>>>>>>>>>> other
>>>>>>>>>>>>>> brokers
>>>>>>>>>>>>>>>>>>> (because
>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> the assumption made above), they will
>>>>>>>>>>>>>>>>>>>>>>        be completed either when the
>>>>>>>> LeaderAndISR
>>>>>>>>>>>> request
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>> when the timeout happens.
>>>>>>>>>>>>>>>>>>>>>>  1.2 With this KIP, broker0 will
>>>>> immediately
>>>>>>>>>>> transition
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> partition
>>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower,
>>>>>>>>>>>>>>>>>>>>>>        after the current broker sees
>>> the
>>>>>>>>>> replication
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> remaining
>>>>>>>>>>>>>>>>>>>> 19
>>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response
>>>>> indicating
>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>        it's no longer the leader for
>>> the
>>>>>>>>> "test-0".
>>>>>>>>>>>>>>>>>>>>>>  To see the latency difference
>> between
>>>> 1.1
>>>>>> and
>>>>>>>>> 1.2,
>>>>>>>>>>>> let's
>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the
>>>>>> LeaderAndISR,
>>>>>>>>> and
>>>>>>>>>>>> there
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> 8
>>>>>>>>>>>>>>>>> io
>>>>>>>>>>>>>>>>>>>>> threads,
>>>>>>>>>>>>>>>>>>>>>>  so each io thread will process
>>>>>> approximately
>>>>>>>>> 3000
>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>> requests.
>>>>>>>>>>>>>>>>>>>> Now
>>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that
>>>>> finally
>>>>>>>>>> processed
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> LeaderAndISR.
>>>>>>>>>>>>>>>>>>>>>>  For the 3000 produce requests, if
>> we
>>>>> model
>>>>>>>> the
>>>>>>>>>> time
>>>>>>>>>>>> when
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>>>>>>>> remaining
>>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1,
>>>> ...t2999,
>>>>>> and
>>>>>>>>> the
>>>>>>>>>>>>>>> LeaderAndISR
>>>>>>>>>>>>>>>>>>>> request
>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> processed at time t3000.
>>>>>>>>>>>>>>>>>>>>>>  Without this KIP, the 1st produce
>>>> request
>>>>>>>> would
>>>>>>>>>> have
>>>>>>>>>>>>>> waited
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>> extra
>>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the
>>> 2nd
>>>>> an
>>>>>>>> extra
>>>>>>>>>>> time
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> t3000 -
>>>>>>>>>>>>>>>>>>> t1,
>>>>>>>>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>>>>>>>>>  Roughly speaking, the latency
>>>> difference
>>>>> is
>>>>>>>>> bigger
>>>>>>>>>>> for
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> earlier
>>>>>>>>>>>>>>>>>>>>>> produce requests than for the later
>>> ones.
>>>>> For
>>>>>>>> the
>>>>>>>>>> same
>>>>>>>>>>>>>> reason,
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued
>>>>>>>>>>>>>>>>>>>>>>  before the LeaderAndISR, the bigger
>>>>> benefit
>>>>>>>> we
>>>>>>>>> get
>>>>>>>>>>>>> (capped
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> produce timeout).
>>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests
>>> have
>>>>>>>> acks=0 or
>>>>>>>>>>>> acks=1
>>>>>>>>>>>>>>>>>>>>>>  There will be no latency
>> differences
>>> in
>>>>>> this
>>>>>>>>> case,
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>  2.1 without this KIP, the records
>> of
>>>>>>>> partition
>>>>>>>>>>> test-0
>>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the
>>> LeaderAndISR
>>>>>> will
>>>>>>>> be
>>>>>>>>>>>> appended
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>>>>> log,
>>>>>>>>>>>>>>>>>>>>>>        and eventually be truncated
>>> after
>>>>>>>>> processing
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> LeaderAndISR.
>>>>>>>>>>>>>>>>>>>>>> This is what's referred to as
>>>>>>>>>>>>>>>>>>>>>>        "some unofficial definition
>> of
>>>> data
>>>>>>>> loss
>>>>>>>>> in
>>>>>>>>>>>> terms
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>>> beyond the high watermark".
>>>>>>>>>>>>>>>>>>>>>>  2.2 with this KIP, we can mitigate
>>> the
>>>>>> effect
>>>>>>>>>> since
>>>>>>>>>>> if
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> LeaderAndISR
>>>>>>>>>>>>>>>>>>>>>> is immediately processed, the
>> response
>>> to
>>>>>>>>> producers
>>>>>>>>>>> will
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>>        the NotLeaderForPartition
>>> error,
>>>>>>>> causing
>>>>>>>>>>>> producers
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> retry
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit
>>> for
>>>>>>>> reducing
>>>>>>>>>> the
>>>>>>>>>>>>>> latency
>>>>>>>>>>>>>>>>> of a
>>>>>>>>>>>>>>>>>>>>> broker
>>>>>>>>>>>>>>>>>>>>>> becoming the follower,
>>>>>>>>>>>>>>>>>>>>>> closely related is reducing the
>> latency
>>>> of
>>>>> a
>>>>>>>>> broker
>>>>>>>>>>>>> becoming
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> leader.
>>>>>>>>>>>>>>>>>>>>>> In this case, the benefit is even
>> more
>>>>>>>> obvious, if
>>>>>>>>>>> other
>>>>>>>>>>>>>>> brokers
>>>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>> resigned leadership, and the
>>>>>>>>>>>>>>>>>>>>>> current broker should take
>> leadership.
>>>> Any
>>>>>>>> delay
>>>>>>>>> in
>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be perceived
>>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In
>>> extreme
>>>>>> cases,
>>>>>>>>> this
>>>>>>>>>>> can
>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>>>>>>>>> produce requests if the retries are
>>>>>>>>>>>>>>>>>>>>>> exhausted.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Another two types of controller
>>> requests
>>>>> are
>>>>>>>>>>>>> UpdateMetadata
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> StopReplica, which I'll briefly
>> discuss
>>>> as
>>>>>>>>> follows:
>>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed
>>>>>> processing
>>>>>>>>>> means
>>>>>>>>>>>>>> clients
>>>>>>>>>>>>>>>>>>> receiving
>>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong
>>>>>> leadership
>>>>>>>>> info
>>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the
>> effect
>>> is
>>>>>> more
>>>>>>>>>> retries
>>>>>>>>>>>> or
>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>>> fatal
>>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long
>>> queuing
>>>>> time
>>>>>>>> may
>>>>>>>>>>>> degrade
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>>>>>>> of topic deletion.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the
>>> delay
>>>>> for
>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest,
>>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>>> are right
>>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the
>>>> latency
>>>>> in
>>>>>>>>>> getting
>>>>>>>>>>>> the
>>>>>>>>>>>>>> log
>>>>>>>>>>>>>>>>> dirs
>>>>>>>>>>>>>>>>>>>> info,
>>>>>>>>>>>>>>>>>>>>>> and it's only relevant
>>>>>>>>>>>>>>>>>>>>>> when controller requests are
>> involved.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong
>>> Lin
>>>> <
>>>>>>>>>>>>>> lindon...@gmail.com
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hey Jun,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is
>>> good
>>>>>>>> point.
>>>>>>>>> So
>>>>>>>>>>> the
>>>>>>>>>>>>>>> feature
>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one
>>>>>> question
>>>>>>>>>> below.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also
>>> useful
>>>>> for
>>>>>>>>>> non-JBOD
>>>>>>>>>>>>> setup
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be
>>>>> useful
>>>>>> to
>>>>>>>>>>>> understand
>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD,
>>> in
>>>>>> order
>>>>>>>> to
>>>>>>>>>> move
>>>>>>>>>>>>>> leaders
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system
>>> operator
>>>>>> first
>>>>>>>>>> needs
>>>>>>>>>>> to
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This
>> is
>>>>>>>> currently
>>>>>>>>>>>> achieved
>>>>>>>>>>>>>>> using
>>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which
>>>> sends
>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the
>>>>> controller
>>>>>>>>>>> requests,
>>>>>>>>>>>>> then
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
>>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be
>>>> processed
>>>>>> by
>>>>>>>> the
>>>>>>>>>>>> broker.
>>>>>>>>>>>>>> So
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> overall
>>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the
>>>> failed
>>>>>> disk
>>>>>>>>> may
>>>>>>>>>>>> still
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Dong
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM,
>> Lucas
>>>>> Wang <
>>>>>>>>>>>>>>>>> lucasatu...@gmail.com
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment,
>>>> Jun.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> @Dong,
>>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in
>>> your
>>>>>>>> previous
>>>>>>>>>>> email
>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether
>>> it's
>>>>>>>> useful,
>>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do
>>> you
>>>>>> agree
>>>>>>>>> that
>>>>>>>>>>>> this
>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned
>> by
>>>> Jun?
>>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks!
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM,
>> Jun
>>>> Rao
>>>>> <
>>>>>>>>>>>>>> j...@confluent.io>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are
>> slow,
>>>> one
>>>>>>>>> probably
>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> kill
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP
>> may
>>>> not
>>>>>>>> help.
>>>>>>>>> If
>>>>>>>>>>>> only
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> disks
>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to
>>> fail
>>>>>> that
>>>>>>>>> disk
>>>>>>>>>>> and
>>>>>>>>>>>>> move
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> leaders
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that
>>> case,
>>>>>> being
>>>>>>>>> able
>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsr
>>>>>>>>>>>>>>>>>>>>>>>>> requests faster will potentially
>>>> help
>>>>>> the
>>>>>>>>>>> producers
>>>>>>>>>>>>>>> recover
>>>>>>>>>>>>>>>>>>>> quicker.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Jun
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 2, 2018 at 7:56 PM,
>>> Dong
>>>>>> Lin <
>>>>>>>>>>>>>>>>> lindon...@gmail.com
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply. Some
>>> follow
>>>> up
>>>>>>>>>> questions
>>>>>>>>>>>>> below.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 1, if each
>>>> ProduceRequest
>>>>>>>> covers
>>>>>>>>> 20
>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>> randomly
>>>>>>>>>>>>>>>>>>>>>>>>>> distributed across all
>>> partitions,
>>>>>> then
>>>>>>>>> each
>>>>>>>>>>>>>>>>> ProduceRequest
>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>> likely
>>>>>>>>>>>>>>>>>>>>>>>>>> cover some partitions for
>> which
>>>> the
>>>>>>>> broker
>>>>>>>>> is
>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>> leader
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> quickly
>>>>>>>>>>>>>>>>>>>>>>>>>> processes the
>>>>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsrRequest. Then
>> broker
>>>>> will
>>>>>>>> still
>>>>>>>>>> be
>>>>>>>>>>>> slow
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>> ProduceRequest and request
>> will
>>>>> still
>>>>>> be
>>>>>>>>> very
>>>>>>>>>>>> high
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> KIP.
>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>>>>>>>>>>>>> that most ProduceRequest will
>>>> still
>>>>>>>> timeout
>>>>>>>>>>> after
>>>>>>>>>>>>> 30
>>>>>>>>>>>>>>>>>> seconds.
>>>>>>>>>>>>>>>>>>> Is
>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>> understanding correct?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 2, if most
>>>> ProduceRequest
>>>>>> will
>>>>>>>>>> still
>>>>>>>>>>>>>> timeout
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>> 30
>>>>>>>>>>>>>>>>>>>>>>>> seconds,
>>>>>>>>>>>>>>>>>>>>>>>>>> then it is less clear how this
>>> KIP
>>>>>>>> reduces
>>>>>>>>>>>> average
>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>>>> latency.
>>>>>>>>>>>>>>>>>>>>>>> Can
>>>>>>>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>>>>>>> clarify what metrics can be
>>>> improved
>>>>>> by
>>>>>>>>> this
>>>>>>>>>>> KIP?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Not sure why system operator
>>>>> directly
>>>>>>>> cares
>>>>>>>>>>>> number
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> truncated
>>>>>>>>>>>>>>>>>>>>>>>> messages.
>>>>>>>>>>>>>>>>>>>>>>>>>> Do you mean this KIP can
>> improve
>>>>>> average
>>>>>>>>>>>> throughput
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>>>>>>>> duplication? It will be good
>> to
>>>>>>>> understand
>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Dong
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 Jul 2018 at 7:12 AM
>>>> Lucas
>>>>>>>> Wang <
>>>>>>>>>>>>>>>>>>> lucasatu...@gmail.com
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dong,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your valuable
>>>> comments.
>>>>>>>> Please
>>>>>>>>>> see
>>>>>>>>>>>> my
>>>>>>>>>>>>>>> reply
>>>>>>>>>>>>>>>>>>> below.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The Google doc showed
>> only
>>> 1
>>>>>>>>> partition.
>>>>>>>>>>> Now
>>>>>>>>>>>>>> let's
>>>>>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>>>> common
>>>>>>>>>>>>>>>>>>>>>>>>>>> scenario
>>>>>>>>>>>>>>>>>>>>>>>>>>> where broker0 is the leader
>> of
>>>>> many
>>>>>>>>>>> partitions.
>>>>>>>>>>>>> And
>>>>>>>>>>>>>>>>> let's
>>>>>>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>>>>>> reason its IO becomes slow.
>>>>>>>>>>>>>>>>>>>>>>>>>>> The number of leader
>>> partitions
>>>> on
>>>>>>>>> broker0
>>>>>>>>>> is
>>>>>>>>>>>> so
>>>>>>>>>>>>>>> large,
>>>>>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>>>> 10K,
>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> cluster is skewed,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and the operator would like
>> to
>>>>> shift
>>>>>>>> the
>>>>>>>>>>>>> leadership
>>>>>>>>>>>>>>>>> for a
>>>>>>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>> partitions, say 9K, to other
>>>>>> brokers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> either manually or through
>>> some
>>>>>>>> service
>>>>>>>>>> like
>>>>>>>>>>>>> cruise
>>>>>>>>>>>>>>>>>> control.
>>>>>>>>>>>>>>>>>>>>>>>>>>> With this KIP, not only will
>>> the
>>>>>>>>> leadership
>>>>>>>>>>>>>>> transitions
>>>>>>>>>>>>>>>>>>> finish
>>>>>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>>>>> quickly, helping the cluster
>>>>> itself
>>>>>>>>>> becoming
>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>> balanced,
>>>>>>>>>>>>>>>>>>>>>>>>>>> but all existing producers
>>>>>>>> corresponding
>>>>>>>>> to
>>>>>>>>>>> the
>>>>>>>>>>>>> 9K
>>>>>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> errors relatively quickly
>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than relying on their
>>>>>> timeout,
>>>>>>>>>> thanks
>>>>>>>>>>> to
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> batched
>>>>>>>>>>>>>>>>>>>>> async
>>>>>>>>>>>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>>>>>>>> operations.
>>>>>>>>>>>>>>>>>>>>>>>>>>> To me it's a useful feature
>> to
>>>>> have
>>>>>>>>> during
>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>> troublesome
>>>>>>>>>>>>>>>>>>>>> times.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. The experiments in the
>>> Google
>>>>> Doc
>>>>>>>> have
>>>>>>>>>>> shown
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>> many
>>>>>>>>>>>>>>>>>>>>>>>>>>> producers
>>>>>>>>>>>>>>>>>>>>>>>>>>> receive an explicit error
>>>>>>>>>>>> NotLeaderForPartition,
>>>>>>>>>>>>>>> based
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>>>>>>>>> retry
>>>>>>>>>>>>>>>>>>>>>>>>>>> immediately.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Therefore the latency (~14
>>>>>>>> seconds+quick
>>>>>>>>>>> retry)
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>> much smaller
>>>>>>>>>>>>>>>>>>>>>>>>>>> compared with the case of
>>> timing
>>>>> out
>>>>>>>>>> without
>>>>>>>>>>>> the
>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>> (30
>>>>>>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>> timing
>>>>>>>>>>>>>>>>>>>>>>>>>>> out + quick retry).
>>>>>>>>>>>>>>>>>>>>>>>>>>> One might argue that
>> reducing
>>>> the
>>>>>>>> timing
>>>>>>>>>> out
>>>>>>>>>>> on
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> producer
>>>>>>>>>>>>>>>>>>>>> side
>>>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve the same result,
>>>>>>>>>>>>>>>>>>>>>>>>>>> yet reducing the timeout has
>>> its
>>>>> own
>>>>>>>>>>>>> drawbacks[1].
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Also *IF* there were a
>> metric
>>> to
>>>>>> show
>>>>>>>> the
>>>>>>>>>>>> number
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> truncated
>>>>>>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>> brokers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> with the experiments done in
>>> the
>>>>>>>> Google
>>>>>>>>>> Doc,
>>>>>>>>>>> it
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> easy
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> see
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>> a lot fewer messages need
>>>>>>>>>>>>>>>>>>>>>>>>>>> to be truncated on broker0
>>> since
>>>>> the
>>>>>>>>>>> up-to-date
>>>>>>>>>>>>>>>>> metadata
>>>>>>>>>>>>>>>>>>>> avoids
>>>>>>>>>>>>>>>>>>>>>>>>> appending
>>>>>>>>>>>>>>>>>>>>>>>>>>> of messages
>>>>>>>>>>>>>>>>>>>>>>>>>>> in subsequent PRODUCE
>>> requests.
>>>> If
>>>>>> we
>>>>>>>>> talk
>>>>>>>>>>> to a
>>>>>>>>>>>>>>> system
>>>>>>>>>>>>>>>>>>>> operator
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> ask
>>>>>>>>>>>>>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>>>>>>>>>>>>>> they prefer fewer wasteful
>>> IOs,
>>>> I
>>>>>> bet
>>>>>>>>> most
>>>>>>>>>>>> likely
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> answer
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>> yes.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. To answer your question,
>> I
>>>>> think
>>>>>> it
>>>>>>>>>> might
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> helpful to
>>>>>>>>>>>>>>>>>>>>>>> construct
>>>>>>>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>>>>>> formulas.
>>>>>>>>>>>>>>>>>>>>>>>>>>> To simplify the modeling,
>> I'm
>>>>> going
>>>>>>>> back
>>>>>>>>> to
>>>>>>>>>>> the
>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>>>>>>> ONE partition involved.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Following the experiments in
>>> the
>>>>>>>> Google
>>>>>>>>>> Doc,
>>>>>>>>>>>>> let's
>>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>>> broker0
>>>>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> follower at time t0,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and after t0 there were
>> still
>>> N
>>>>>>>> produce
>>>>>>>>>>>> requests
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>> request
>>>>>>>>>>>>>>>>>>>>>>>> queue.
>>>>>>>>>>>>>>>>>>>>>>>>>>> With the up-to-date metadata
>>>>> brought
>>>>>>>> by
>>>>>>>>>> this
>>>>>>>>>>>> KIP,
>>>>>>>>>>>>>>>>> broker0
>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>> reply
>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>> NotLeaderForPartition
>>> exception,
>>>>>>>>>>>>>>>>>>>>>>>>>>> let's use M1 to denote the
>>>> average
>>>>>>>>>> processing
>>>>>>>>>>>>> time
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> replying
>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>> error message.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Without this KIP, the broker
>>>> will
>>>>>>>> need to
>>>>>>>>>>>> append
>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> segments,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which may trigger a flush to
>>>> disk,
>>>>>>>>>>>>>>>>>>>>>>>>>>> let's use M2 to denote the
>>>> average
>>>>>>>>>> processing
>>>>>>>>>>>>> time
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>>>> logic.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Then the average extra
>> latency
>>>>>>>> incurred
>>>>>>>>>>> without
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>> N * (M2 - M1) / 2 (the j-th queued request
>>>>>>>>>>>>>>>>>>>>>>>>>>> waits roughly an extra j * (M2 - M1), so the
>>>>>>>>>>>>>>>>>>>>>>>>>>> average over the N requests comes to about
>>>>>>>>>>>>>>>>>>>>>>>>>>> N * (M2 - M1) / 2).
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> In practice, M2 should
>> always
>>> be
>>>>>>>> larger
>>>>>>>>>> than
>>>>>>>>>>>> M1,
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>>>>>>>>>> as N
>>>>>>>>>>>>>>>>>>>>>>>>>>> is positive,
>>>>>>>>>>>>>>>>>>>>>>>>>>> we would see improvements on
>>> the
>>>>>>>> average
>>>>>>>>>>>> latency.
>>>>>>>>>>>>>>>>>>>>>>>>>>> There does not need to be
>>>>>> significant
>>>>>>>>>> backlog
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> requests
>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> request
>>>>>>>>>>>>>>>>>>>>>>>>>>> queue,
>>>>>>>>>>>>>>>>>>>>>>>>>>> or severe degradation of
>> disk
>>>>>>>> performance
>>>>>>>>>> to
>>>>>>>>>>>> have
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> improvement.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] For instance, reducing
>> the
>>>>>>>> timeout on
>>>>>>>>>> the
>>>>>>>>>>>>>>> producer
>>>>>>>>>>>>>>>>>> side
>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>> trigger
>>>>>>>>>>>>>>>>>>>>>>>>>>> unnecessary duplicate
>> requests
>>>>>>>>>>>>>>>>>>>>>>>>>>> when the corresponding
>> leader
>>>>> broker
>>>>>>>> is
>>>>>>>>>>>>> overloaded,
>>>>>>>>>>>>>>>>>>>> exacerbating
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Jul 1, 2018 at 9:18
>>> PM,
>>>>> Dong
>>>>>>>> Lin
>>>>>>>>> <
>>>>>>>>>>>>>>>>>>> lindon...@gmail.com
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks much for the
>> detailed
>>>>>>>>>> documentation
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> experiment.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Initially I also think
>>> having
>>>> a
>>>>>>>>> separate
>>>>>>>>>>>> queue
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> controller
>>>>>>>>>>>>>>>>>>>>>>>>> requests
>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> useful because, as you
>>>> mentioned
>>>>>> in
>>>>>>>> the
>>>>>>>>>>>> summary
>>>>>>>>>>>>>>>>> section
>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> Google
>>>>>>>>>>>>>>>>>>>>>>>>>>> doc,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> controller requests are
>>>>> generally
>>>>>>>> more
>>>>>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Regards,
>>>> Mayuresh R. Gharat
>>>> (862) 250-7125
>>>> 
>>> 
>> 

