Hi Sergio,

Thanks for your explanation.
Make sense to me.

> Only interesting thing that I have just found is *max-message-size *is not
used while dump logs are requested, instead it is used by dumpIndex

Are you interested in expanding the scope of this KIP to include the
*max-message-size* in dumping logs?
I think it's good because it will also be a small change, and no need to go
through another KIP discussing/voting process. But I'm fine if you want to
keep this KIP as is, and create another JIRA ticket for future work.

Thank you.
Luke

On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
<sergio.troi...@adevinta.com.invalid> wrote:

> hey Luke,
>
> Let me answer them:
> 1. If the *max-batches-size* is too small that results in no records
> output, will we output any information to the user?
>
> If the  *max-batches-size*is even smaller than the first batch then there
> won't be any output, this is handled by FileRecords class, I think this is
> correct as this is the expected behaviour.
>
> 2. After your explanation, I guess the use of *max-batches-size* won't
> conflict with *max-message-size*, right?
>
> Only interesting thing that I have just found is *max-message-size *is not
> used while dump logs are requested, instead it is used by dumpIndex
> <
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> >
> so,
> this feature is not working for dumping logs, even though I checked if
> there is a unit test for this and there is not any. Maybe we could create a
> ticket for this?
>
> Regards.
>
>
> On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com> wrote:
>
> > Hi Sergio,
> >
> > Thanks for the explanation! Very clear!
> > I think we should put this example and explanation into KIP.
> >
> > Other comments:
> > 1. If the *max-batches-size* is too small that results in no records
> > output, will we output any information to the user?
> > 2. After your explanation, I guess the use of *max-batches-size* won't
> > conflict with *max-message-size*, right?
> > That is, user can set the 2 arguments at the same time. Is that correct?
> >
> > Thank you.
> > Luke
> >
> > Thank you.
> > Luke
> >
> > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > <sergio.troi...@adevinta.com.invalid> wrote:
> >
> > > hey Luke,
> > >
> > > thanks for the interest, it is a good question, please let me explain
> > you:
> > >
> > > *max-message-size *a filter for the size of each batch, so for example
> if
> > > Iset --max-message-size 1000 bytes and my segment log has 300 batches,
> > 150
> > > of them has a size of 500 bytes  and the other 150 has a size of 2000
> > bytes
> > > then the script will skip the las 150 ones as each batch is heavier
> than
> > > the limit.
> > >
> > > In the other hand following the same example above with
> *max-batches-size
> > > *set
> > > to 1000 bytes it will only print out the first 2 batches (500 bytes
> each)
> > > and stop, This will avoid reading the whole file
> > >
> > >
> > > Also if all of them are smaller than 1000 bytes it will end up printing
> > out
> > > all the batches.
> > > The idea of my change is to limit the *amount* of batches no matter
> their
> > > size.
> > >
> > > I hope this reply helps.
> > > Best regards.
> > >
> > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com> wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > One question:
> > > > I saw there's a `max-message-size` argument that seems to do the same
> > > thing
> > > > as you want.
> > > > Could you help explain what's the difference between
> `max-message-size`
> > > and
> > > > `max-batches-size`?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <k...@mustardgrain.com>
> > wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the KIP. I don't know anything about the log segment
> > > > internals,
> > > > > but the logic and implementation seem sound.
> > > > >
> > > > > Three questions:
> > > > >  1. Since the --max-batches-size unit is bytes, does it matter if
> > that
> > > > > size doesn't align to a record boundary?
> > > > >  2. Can you add a check to make sure that --max-batches-size
> doesn't
> > > > allow
> > > > > the user to pass in a negative number?
> > > > >  3. Can you add/update any unit tests related to the
> DumpLogSegments
> > > > > arguments?
> > > > > Thanks,
> > > > > Kirk
> > > > >
> > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to