Hi Luke,

Make sense, done!

Thank you.
Sergio Troiano

On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote:

> Hi Sergio,
>
> > I don't want this to minimize the main feature I want to deploy as I
> think the
> message size limit is not as important as the limiting the amount of
> batches.
>
> Agree! Let's focus on the feature of limiting the batch amounts.
>
> One more comment to the KIP:
> 1. Could you put the new parameter description into the KIP proposed change
> section? That would make it much clear.
>
>
> Thank you.
> Luke
>
> On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> <sergio.troi...@adevinta.com.invalid> wrote:
>
> > hey Luke,
> >
> > I am interested in expanding the KIP scope but I am a bit concerned this
> > could create a lot of noise and confusion as they look like very similar
> > parameters, I agree this is a small change, so I think if I do it
> properly
> > it should not be a problem at all, I just will need a couple more of days
> > as I want to create the proper tests as well.
> >
> > I have a doubt about editing the KIP, I mean should I add this as a new
> > feature as well?, should I describe this as a side effect finding? I
> don't
> > want this to minimize the main feature I want to deploy as I think the
> > message size limit is not as important as the limiting the amount of
> > batches.
> >
> > It is up to you, if you guys consider we must add this in this KIP then I
> > will be happy to do it. 😀
> >
> > Best regards.
> > Sergio Troiano
> >
> > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for your explanation.
> > > Make sense to me.
> > >
> > > > Only interesting thing that I have just found is *max-message-size
> *is
> > > not
> > > used while dump logs are requested, instead it is used by dumpIndex
> > >
> > > Are you interested in expanding the scope of this KIP to include the
> > > *max-message-size* in dumping logs?
> > > I think it's good because it will also be a small change, and no need
> to
> > go
> > > through another KIP discussing/voting process. But I'm fine if you want
> > to
> > > keep this KIP as is, and create another JIRA ticket for future work.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> > > <sergio.troi...@adevinta.com.invalid> wrote:
> > >
> > > > hey Luke,
> > > >
> > > > Let me answer them:
> > > > 1. If the *max-batches-size* is too small that results in no records
> > > > output, will we output any information to the user?
> > > >
> > > > If the  *max-batches-size*is even smaller than the first batch then
> > there
> > > > won't be any output, this is handled by FileRecords class, I think
> this
> > > is
> > > > correct as this is the expected behaviour.
> > > >
> > > > 2. After your explanation, I guess the use of *max-batches-size*
> won't
> > > > conflict with *max-message-size*, right?
> > > >
> > > > Only interesting thing that I have just found is *max-message-size
> *is
> > > not
> > > > used while dump logs are requested, instead it is used by dumpIndex
> > > > <
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > > > >
> > > > so,
> > > > this feature is not working for dumping logs, even though I checked
> if
> > > > there is a unit test for this and there is not any. Maybe we could
> > > create a
> > > > ticket for this?
> > > >
> > > > Regards.
> > > >
> > > >
> > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the explanation! Very clear!
> > > > > I think we should put this example and explanation into KIP.
> > > > >
> > > > > Other comments:
> > > > > 1. If the *max-batches-size* is too small that results in no
> records
> > > > > output, will we output any information to the user?
> > > > > 2. After your explanation, I guess the use of *max-batches-size*
> > won't
> > > > > conflict with *max-message-size*, right?
> > > > > That is, user can set the 2 arguments at the same time. Is that
> > > correct?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > > > <sergio.troi...@adevinta.com.invalid> wrote:
> > > > >
> > > > > > hey Luke,
> > > > > >
> > > > > > thanks for the interest, it is a good question, please let me
> > explain
> > > > > you:
> > > > > >
> > > > > > *max-message-size *a filter for the size of each batch, so for
> > > example
> > > > if
> > > > > > Iset --max-message-size 1000 bytes and my segment log has 300
> > > batches,
> > > > > 150
> > > > > > of them has a size of 500 bytes  and the other 150 has a size of
> > 2000
> > > > > bytes
> > > > > > then the script will skip the las 150 ones as each batch is
> heavier
> > > > than
> > > > > > the limit.
> > > > > >
> > > > > > In the other hand following the same example above with
> > > > *max-batches-size
> > > > > > *set
> > > > > > to 1000 bytes it will only print out the first 2 batches (500
> bytes
> > > > each)
> > > > > > and stop, This will avoid reading the whole file
> > > > > >
> > > > > >
> > > > > > Also if all of them are smaller than 1000 bytes it will end up
> > > printing
> > > > > out
> > > > > > all the batches.
> > > > > > The idea of my change is to limit the *amount* of batches no
> matter
> > > > their
> > > > > > size.
> > > > > >
> > > > > > I hope this reply helps.
> > > > > > Best regards.
> > > > > >
> > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com>
> wrote:
> > > > > >
> > > > > > > Hi Sergio,
> > > > > > >
> > > > > > > Thanks for the KIP!
> > > > > > >
> > > > > > > One question:
> > > > > > > I saw there's a `max-message-size` argument that seems to do
> the
> > > same
> > > > > > thing
> > > > > > > as you want.
> > > > > > > Could you help explain what's the difference between
> > > > `max-message-size`
> > > > > > and
> > > > > > > `max-batches-size`?
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <
> k...@mustardgrain.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sergio,
> > > > > > > >
> > > > > > > > Thanks for the KIP. I don't know anything about the log
> segment
> > > > > > > internals,
> > > > > > > > but the logic and implementation seem sound.
> > > > > > > >
> > > > > > > > Three questions:
> > > > > > > >  1. Since the --max-batches-size unit is bytes, does it
> matter
> > if
> > > > > that
> > > > > > > > size doesn't align to a record boundary?
> > > > > > > >  2. Can you add a check to make sure that --max-batches-size
> > > > doesn't
> > > > > > > allow
> > > > > > > > the user to pass in a negative number?
> > > > > > > >  3. Can you add/update any unit tests related to the
> > > > DumpLogSegments
> > > > > > > > arguments?
> > > > > > > > Thanks,
> > > > > > > > Kirk
> > > > > > > >
> > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to