Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

Andrey L. Neporada Mon, 15 Aug 2016 03:11:30 -0700

Hi all!

KIP-74 is updated to sync up with mail list discussion.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes


Your feedback is highly appreciated.

Thanks,
Andrey.


> On 12 Aug 2016, at 20:49, Andrey L. Neporada <anepor...@yandex-team.ru> wrote:
> 
> Hi!
>> On 12 Aug 2016, at 20:22, Jun Rao <j...@confluent.io> wrote:
>> 
>> Hi, Andrey,
>> 
>> Yes, I agree that it's more work for the client to do the round-robin logic
>> since it has to be stateful. However, that applies to both the consumer
>> client and the replica fetch thread. I just feel that it's weird to use one
>> strategy in the replica fetch thread and another in the consumer client. An
>> alternative is to just let the leader broker shuffle the partitions before
>> filling up the bytes. This alleviates the need for the clients to maintain
>> additional states. It's not as deterministic as doing round-robin in the
>> client, but is probably good enough in practice.
>> 
> 
> Oh, I got the point. You are talking about high-level consumer client API 
> (i.e. Consumer.java).
> Yes, it makes sense to use the same strategy as in ReplicaFetcher thread 
> there.
> 
> 
>> For 2), I am not sure if we want to set the default limit of fetch response
>> in the consumer to be 0. Today, sending a response of more than 2GB will
>> cause buffer overflow. How about defaulting it to 50BM? This should be
>> enough for the common case. For the default fetch response limit in the
>> replica fetcher, I'd recommend 10MB since there could be multiple replica
>> fetcher threads to different brokers.
>> 
> 
> Ok, I’ll put this numbers in PR - we can discuss them later.
> 
>> Thanks,
>> 
>> Jun
>> 
> 
> Andrey.
> 
>> 
>> On Fri, Aug 12, 2016 at 9:19 AM, Andrey L. Neporada <
>> anepor...@yandex-team.ru> wrote:
>> 
>>> Hi!
>>> 
>>>> On 12 Aug 2016, at 18:32, Jun Rao <j...@confluent.io> wrote:
>>>> 
>>>> Hi, Andrey,
>>>> 
>>>> Why shouldn't the client library do reordering? It seems that if
>>>> ReplicaFetcher thread does round-robin, the consumer client should do
>>> that
>>>> too?
>>>> 
>>> 
>>> IMHO the client library is not a good place to implement such logic.
>>> For example, if we want to put round-robin logic in client lib, it will
>>> become stageful (since it should remember first empty partition received
>>> from server). It should also remember some kind of context to distinguish
>>> one end-user (or partition set) from another, etc.
>>> 
>>> Personally, I'd expect from client library to submit requests as is.
>>> Anyway, I think it is difficult to come up with reordering logic that will
>>> be good for all clients.
>>> 
>>> So, I propose:
>>> 
>>> 1) preserve per-partition limit in fetch request (as discussed)
>>> 2) make default limit of fetch response to be 0 (which means no limit at
>>> all). So, by default, new fetch request should behave exactly like old one
>>> 3) implement round-robin logic in ReplicaFetcherThread and use non-zero
>>> default fetch response limit there.
>>> 4) document that all end-users who want to use FetchRequest with actual
>>> response limit to implement some kind of round-robin to ensure fairness (if
>>> they care about it)
>>> 
>>> 
>>>> Thanks,
>>>> 
>>>> Jun
>>>> 
>>> 
>>> Thanks,
>>> Andrey.
>>> 
>>> 
>>>> On Fri, Aug 12, 2016 at 3:56 AM, Andrey L. Neporada <
>>>> anepor...@yandex-team.ru> wrote:
>>>> 
>>>>> Hi!
>>>>> 
>>>>>> On 12 Aug 2016, at 04:29, Jun Rao <j...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi, Andrey,
>>>>>> 
>>>>>> One potential benefit of keeping the per partition limit is for Kafka
>>>>>> stream. When reading messages from different partitions, KStream
>>> prefers
>>>>> to
>>>>>> read from partitions with smaller timestamps first and only advances
>>> the
>>>>>> KStream timestamp when it sees at least one message from every
>>> partition.
>>>>>> Being able to fill up multiple partitions in a single fetch response
>>> can
>>>>>> help KStream advance the timestamp quicker when there is backlog from
>>> the
>>>>>> input. So, it's probably better if we just add a new response limit
>>> while
>>>>>> keeping the per partition limit.
>>>>>> 
>>>>> 
>>>>> Yes, this makes sense for me.
>>>>> 
>>>>>> Also, for fairness across partitions, you mentioned "The solution is to
>>>>>> start fetching from first empty partition in round-robin fashion or to
>>>>>> perform random shuffle of partitions.".  It seems the former is more
>>>>>> deterministic. Did you use that in your implementation and should we
>>>>>> recommend that for non-java clients as well?
>>>>>> 
>>>>> 
>>>>> In my initial implementation I just did random shuffling on server side.
>>>>> Now I plan to use deterministic approach - do round-robin in
>>>>> ReplicaFetcher thread - I believe the client library itself shouldn’t do
>>>>> any form of reordering.
>>>>> 
>>>>> But we should definitely recommend some form of shuffling if client
>>> wants
>>>>> to limit response size.
>>>>> Not sure which shuffling method is better.
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jun
>>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Andrey.
>>>>> 
>>>>> 
>>> 
>>> 
>

Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

Reply via email to