Hi, Andrey,

Yes, I agree that it's more work for the client to do the round-robin
logic since it has to be stateful. However, that applies to both the
consumer client and the replica fetch thread. I just feel it's odd to use
one strategy in the replica fetch thread and another in the consumer
client. An alternative is to let the leader broker shuffle the partitions
before filling up the bytes. This removes the need for the clients to
maintain additional state. It's not as deterministic as doing round-robin
in the client, but it's probably good enough in practice.
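
To make that concrete, here is a rough sketch of the leader-side shuffle
(illustrative only, with toy data; this is not actual broker code):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ShuffleFillSketch {
        public static void main(String[] args) {
            // Bytes ready to be served per partition (toy data).
            Map<String, Integer> available = new LinkedHashMap<>();
            available.put("topic-0", 40_000);
            available.put("topic-1", 25_000);
            available.put("topic-2", 60_000);

            // The leader shuffles the partition order for each fetch, so
            // no single partition always fills the response first.
            List<String> order = new ArrayList<>(available.keySet());
            Collections.shuffle(order);

            int remaining = 50_000; // hypothetical response size limit
            for (String tp : order) {
                if (remaining <= 0) break;
                int taken = Math.min(remaining, available.get(tp));
                remaining -= taken;
                System.out.println("include " + taken + " bytes from " + tp);
            }
        }
    }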

For 2), I am not sure we want the default fetch response limit in the
consumer to be 0. Today, sending a response of more than 2GB will cause a
buffer overflow. How about defaulting it to 50MB? This should be enough
for the common case. For the default fetch response limit in the replica
fetcher, I'd recommend 10MB, since there can be multiple replica fetcher
threads fetching from different brokers.
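
For reference, those defaults might look something like the following as
configs; the property names here are illustrative, not final:

    # consumer: limit the total size of a fetch response to 50MB
    fetch.max.bytes=52428800

    # broker: limit each replica fetcher's response to 10MB
    replica.fetch.response.max.bytes=10485760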

Thanks,

Jun


On Fri, Aug 12, 2016 at 9:19 AM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi!
>
> > On 12 Aug 2016, at 18:32, Jun Rao <j...@confluent.io> wrote:
> >
> > Hi, Andrey,
> >
> > Why shouldn't the client library do the reordering? It seems that if
> > the ReplicaFetcher thread does round-robin, the consumer client should
> > do that too?
> >
>
> IMHO the client library is not a good place to implement such logic.
> For example, if we put the round-robin logic in the client lib, it would
> become stateful (since it would have to remember the first empty
> partition received from the server). It would also have to keep some
> kind of context to distinguish one end-user (or partition set) from
> another, etc.
>
> Personally, I'd expect the client library to submit requests as is.
> In any case, I think it is difficult to come up with reordering logic
> that would be good for all clients.
>
> So, I propose:
>
> 1) preserve the per-partition limit in the fetch request (as discussed)
> 2) make the default fetch response limit 0 (meaning no limit at all), so
> that by default the new fetch request behaves exactly like the old one
> 3) implement round-robin logic in ReplicaFetcherThread and use a
> non-zero default fetch response limit there
> 4) document that all end-users who want to use FetchRequest with an
> actual response limit should implement some kind of round-robin to
> ensure fairness (if they care about it)
>
>
> > Thanks,
> >
> > Jun
> >
>
> Thanks,
> Andrey.
>
>
> > On Fri, Aug 12, 2016 at 3:56 AM, Andrey L. Neporada <
> > anepor...@yandex-team.ru> wrote:
> >
> >> Hi!
> >>
> >>> On 12 Aug 2016, at 04:29, Jun Rao <j...@confluent.io> wrote:
> >>>
> >>> Hi, Andrey,
> >>>
> >>> One potential benefit of keeping the per-partition limit is for
> >>> Kafka Streams. When reading messages from different partitions,
> >>> KStream prefers to read from partitions with smaller timestamps
> >>> first, and it only advances the KStream timestamp once it has seen at
> >>> least one message from every partition. Being able to fill up
> >>> multiple partitions in a single fetch response can help KStream
> >>> advance the timestamp more quickly when there is a backlog from the
> >>> input. So, it's probably better if we just add a new response limit
> >>> while keeping the per-partition limit.
> >>>
> >>
> >> Yes, this makes sense to me.
> >>
> >>> Also, for fairness across partitions, you mentioned "The solution is
> >>> to start fetching from first empty partition in round-robin fashion
> >>> or to perform random shuffle of partitions." It seems the former is
> >>> more deterministic. Did you use that in your implementation, and
> >>> should we recommend it for non-Java clients as well?
> >>>
> >>
> >> In my initial implementation I just did random shuffling on the
> >> server side. Now I plan to use a deterministic approach and do
> >> round-robin in the ReplicaFetcher thread; I believe the client library
> >> itself shouldn't do any form of reordering.
> >>
> >> But we should definitely recommend some form of shuffling if the
> >> client wants to limit the response size. I'm not sure which shuffling
> >> method is better.
> >>
> >>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>
> >> Thanks,
> >> Andrey.
> >>
> >>
>
>
