Hi Ismael,
Thanks for your comments.

1. Why do we have to reallocate the buffer? We can keep a list of buffers
instead and avoid reallocation.
-> Do you mean we allocate multiple buffers of "batch.initial.size" each and
link them together (as a linked list)?
ex:
a. We allocate 4KB initial buffer
| 4KB |

b. When new records arrive and the remaining space in the buffer is not
enough for them, we create another buffer of "batch.initial.size".
ex: we already have 3KB of data in the 1st buffer, and a 2KB record
arrives

| 4KB (1KB remaining) |
Now the 2KB record arrives.
We write its first 1KB into the 1st buffer, create a new buffer, link the
two together, and write the rest of the data into the new buffer
| 4KB (full) | ---> | 4KB (3KB remaining) |
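
To make the example concrete, here is a rough Java sketch of how I read the
idea. ChainedBuffers and its methods are hypothetical names I made up for
illustration, not existing producer code, and it ignores record framing,
compression and the buffer pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "list of buffers" that grows without reallocation.
// Not Kafka producer code; all names here are assumptions for illustration.
final class ChainedBuffers {
    private final int segmentSize;
    private final List<ByteBuffer> segments = new ArrayList<>();

    ChainedBuffers(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
        // Start with a single buffer of the initial size.
        segments.add(ByteBuffer.allocate(segmentSize));
    }

    // Copies the record into the chain, spilling into new segments as needed.
    void append(byte[] record) {
        int offset = 0;
        while (offset < record.length) {
            ByteBuffer last = segments.get(segments.size() - 1);
            if (!last.hasRemaining()) {
                // The current segment is full: link a new one instead of
                // reallocating and copying the existing data.
                last = ByteBuffer.allocate(segmentSize);
                segments.add(last);
            }
            int n = Math.min(last.remaining(), record.length - offset);
            last.put(record, offset, n);
            offset += n;
        }
    }

    int segmentCount() {
        return segments.size();
    }

    int remainingInLastSegment() {
        return segments.get(segments.size() - 1).remaining();
    }

    long bytesWritten() {
        long total = 0;
        for (ByteBuffer segment : segments) {
            total += segment.position();
        }
        return total;
    }
}
```

With a 4KB segment size, appending 3KB and then 2KB should leave two
segments, the first full and the second with 3KB remaining, as in the
diagram above.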

Is that what you mean?
If so, I think I like this idea!
If not, could you please explain it in more detail?
Thank you.

2. I think we should also consider tweaking the semantics of batch.size so
that the sent batches can be larger if the batch is not ready to be sent
(while still respecting max.request.size and perhaps a new max.batch.size).

--> In the KIP, I was trying to make "batch.size" the upper bound of the
batch size, and introduce "batch.initial.size" as the initial batch size.
So are you saying that we can keep "batch.size" as the initial batch size
and introduce "max.batch.size" as the upper bound?
That's a good suggestion, but that would change the semantics of
"batch.size", which might surprise some users. I think my original proposal
("batch.initial.size") is safer for users. What do you think?
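
To illustrate the semantics I have in mind, here is a small Java sketch.
"batch.initial.size" and "batch.reallocation.factor" are only proposed in the
KIP, the values are made up, and nextCapacity is a hypothetical helper that
reflects my assumption of how the factor would be applied, with "batch.size"
staying the upper bound.

```java
import java.util.Properties;

// Hypothetical sketch of the proposed semantics; not existing producer code.
final class ProposedBatchSizing {

    // Example settings. Only "batch.size" is an existing producer config;
    // the other two are proposals from the KIP with illustrative values.
    static Properties exampleConfig() {
        Properties props = new Properties();
        props.setProperty("batch.size", "16384");           // upper bound
        props.setProperty("batch.initial.size", "4096");    // proposed
        props.setProperty("batch.reallocation.factor", "2"); // proposed
        return props;
    }

    // Assumed growth rule: multiply by the factor, never exceed the bound.
    static int nextCapacity(int current, int factor, int upperBound) {
        long grown = (long) current * factor; // long avoids int overflow
        return (int) Math.min(grown, (long) upperBound);
    }
}
```

Under these assumptions a batch would grow 4KB -> 8KB -> 16KB and then stop,
so existing users who read "batch.size" as a limit would see no change.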

Thank you.
Luke


On Mon, Oct 18, 2021 at 3:12 AM Ismael Juma <ism...@juma.me.uk> wrote:

> I think we should also consider tweaking the semantics of batch.size so
> that the sent batches can be larger if the batch is not ready to be sent
> (while still respecting max.request.size and perhaps a new max.batch.size).
>
> Ismael
>
> On Sun, Oct 17, 2021, 12:08 PM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Luke,
> >
> > Thanks for the KIP. Why do we have to reallocate the buffer? We can keep a
> > list of buffers instead and avoid reallocation.
> >
> > Ismael
> >
> > On Sun, Oct 17, 2021, 2:02 AM Luke Chen <show...@gmail.com> wrote:
> >
> >> Hi Kafka dev,
> >> I'd like to start the discussion for the proposal: KIP-782: Expandable
> >> batch size in producer.
> >>
> >> The main purpose for this KIP is to have better memory usage in producer,
> >> and also save users from the dilemma while setting the batch size
> >> configuration. After this KIP, users can set a higher batch.size without
> >> worries, and of course, with an appropriate "batch.initial.size" and
> >> "batch.reallocation.factor".
> >>
> >> Detailed description can be found here:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-782%3A+Expandable+batch+size+in+producer
> >>
> >> Any comments and feedback are welcome.
> >>
> >> Thank you.
> >> Luke
> >>
> >
>
