Hi all! KIP-74 is updated to sync up with mail list discussion. https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
Your feedback is highly appreciated. Thanks, Andrey. > On 12 Aug 2016, at 20:49, Andrey L. Neporada <anepor...@yandex-team.ru> wrote: > > Hi! >> On 12 Aug 2016, at 20:22, Jun Rao <j...@confluent.io> wrote: >> >> Hi, Andrey, >> >> Yes, I agree that it's more work for the client to do the round-robin logic >> since it has to be stateful. However, that applies to both the consumer >> client and the replica fetch thread. I just feel that it's weird to use one >> strategy in the replica fetch thread and another in the consumer client. An >> alternative is to just let the leader broker shuffle the partitions before >> filling up the bytes. This alleviates the need for the clients to maintain >> additional states. It's not as deterministic as doing round-robin in the >> client, but is probably good enough in practice. >> > > Oh, I got the point. You are talking about high-level consumer client API > (i.e. Consumer.java). > Yes, it makes sense to use the same strategy as in ReplicaFetcher thread > there. > > >> For 2), I am not sure if we want to set the default limit of fetch response >> in the consumer to be 0. Today, sending a response of more than 2GB will >> cause buffer overflow. How about defaulting it to 50BM? This should be >> enough for the common case. For the default fetch response limit in the >> replica fetcher, I'd recommend 10MB since there could be multiple replica >> fetcher threads to different brokers. >> > > Ok, I’ll put this numbers in PR - we can discuss them later. > >> Thanks, >> >> Jun >> > > Andrey. > >> >> On Fri, Aug 12, 2016 at 9:19 AM, Andrey L. Neporada < >> anepor...@yandex-team.ru> wrote: >> >>> Hi! >>> >>>> On 12 Aug 2016, at 18:32, Jun Rao <j...@confluent.io> wrote: >>>> >>>> Hi, Andrey, >>>> >>>> Why shouldn't the client library do reordering? It seems that if >>>> ReplicaFetcher thread does round-robin, the consumer client should do >>> that >>>> too? >>>> >>> >>> IMHO the client library is not a good place to implement such logic. >>> For example, if we want to put round-robin logic in client lib, it will >>> become stageful (since it should remember first empty partition received >>> from server). It should also remember some kind of context to distinguish >>> one end-user (or partition set) from another, etc. >>> >>> Personally, I'd expect from client library to submit requests as is. >>> Anyway, I think it is difficult to come up with reordering logic that will >>> be good for all clients. >>> >>> So, I propose: >>> >>> 1) preserve per-partition limit in fetch request (as discussed) >>> 2) make default limit of fetch response to be 0 (which means no limit at >>> all). So, by default, new fetch request should behave exactly like old one >>> 3) implement round-robin logic in ReplicaFetcherThread and use non-zero >>> default fetch response limit there. >>> 4) document that all end-users who want to use FetchRequest with actual >>> response limit to implement some kind of round-robin to ensure fairness (if >>> they care about it) >>> >>> >>>> Thanks, >>>> >>>> Jun >>>> >>> >>> Thanks, >>> Andrey. >>> >>> >>>> On Fri, Aug 12, 2016 at 3:56 AM, Andrey L. Neporada < >>>> anepor...@yandex-team.ru> wrote: >>>> >>>>> Hi! >>>>> >>>>>> On 12 Aug 2016, at 04:29, Jun Rao <j...@confluent.io> wrote: >>>>>> >>>>>> Hi, Andrey, >>>>>> >>>>>> One potential benefit of keeping the per partition limit is for Kafka >>>>>> stream. When reading messages from different partitions, KStream >>> prefers >>>>> to >>>>>> read from partitions with smaller timestamps first and only advances >>> the >>>>>> KStream timestamp when it sees at least one message from every >>> partition. >>>>>> Being able to fill up multiple partitions in a single fetch response >>> can >>>>>> help KStream advance the timestamp quicker when there is backlog from >>> the >>>>>> input. So, it's probably better if we just add a new response limit >>> while >>>>>> keeping the per partition limit. >>>>>> >>>>> >>>>> Yes, this makes sense for me. >>>>> >>>>>> Also, for fairness across partitions, you mentioned "The solution is to >>>>>> start fetching from first empty partition in round-robin fashion or to >>>>>> perform random shuffle of partitions.". It seems the former is more >>>>>> deterministic. Did you use that in your implementation and should we >>>>>> recommend that for non-java clients as well? >>>>>> >>>>> >>>>> In my initial implementation I just did random shuffling on server side. >>>>> Now I plan to use deterministic approach - do round-robin in >>>>> ReplicaFetcher thread - I believe the client library itself shouldn’t do >>>>> any form of reordering. >>>>> >>>>> But we should definitely recommend some form of shuffling if client >>> wants >>>>> to limit response size. >>>>> Not sure which shuffling method is better. >>>>> >>>>> >>>>>> Thanks, >>>>>> >>>>>> Jun >>>>>> >>>>> >>>>> Thanks, >>>>> Andrey. >>>>> >>>>> >>> >>> >