Hey Becket,

1. I agree that this is a problem. I think this ended up this way because
there are two ways a TimeoutException can be thrown, the server can timeout
or we can timeout waiting for memory. I think the complaint at the time was
that it was annoying that you needed to write the same exception handling
logic in two places. That is annoying but I think you are right that we
should probably make it be the case that the callbacks are always in order
by partition as that is a bigger problem.

2. This is by design but we argued it both ways. The question is what are
you batching for? If you are batching to avoid small network sends and
amortize request overhead then the current approach makes the most sense,
if you are batching to reduce file writes then your more conservative
approach makes more sense. By opportunistically piggybacking on an existing
request you kind of get to send these records for free, but if you waited
maybe you would have accumulated more data for those partitions to justify
a send anyway with more data for each write. Philosophically you could
argue for either. I think the case the current behavior might be suboptimal
for is where you have lots of partitions, a really high linger.ms, and a
fair amount of skew, and so as a result each full partition triggers a send
for smaller partitions. A starting point for optimization would be to
measure this bad case and figure out how much opportunity there is for
improvement (say 500 partitions with strong skew to one or two of them).
Our benchmarks tend to skip this as they fairly distribute load. If the
opportunity is large here we can discuss if it's worth trying to fix and
what gets worse.

-Jay


On Mon, May 18, 2015 at 3:44 PM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Hi,
>
> I have two questions when writing patch for KAFKA-2142 and needs some help:
>
>   1.  In send(), we actually might call callback.onCompletion(), this
> might break the guarantee that callbacks are fired in sending order of a
> particular partition.
>   2.  In RecordAccumulator, current logic is that if one batch to a broker
> is ready (full or reaches linger.ms), we drain all the batches to the
> same broker even if other batches are not full or reaching linger.ms yet.
> Guozhang mentioned that this is by design, but it seems different from the
> java doc and might have negative effect on batching.  Was there any concern
> on this?
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
>

Reply via email to