Hey guys,

What is the next step? Who decides when it is time for voting?


Thanks!

Sent from my iPhone

> On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <sergio.troi...@adevinta.com> 
> wrote:
> 
> 
> Hi Michael,
> 
> Yes, it's a good idea and I considered it, the main problem is the 
> FileRecords class does not accept number of batches as a parameter, it 
> accepts bytes instead, so if we want to do so either we redesign a core class 
> or we create a new one.
> One of the pretty things (I consider) about this change which will bring a 
> huge benefit is the change in the code is pretty small and it's on the final 
> script, it does not require any deep change in a core library.
> 
> An alternative which requires a big change as well without touching the 
> FileRecords class would be accept number of batches as parameter, then call 
> the FileRecords.slice() with a small value (bytes) count the batches, see if 
> we can satisfy the number of batches, if not then we call it again and again 
> until we reach the amount of batches. This will require a lot of code as well
> 
> So long story short, the proposal change is quite small, it uses the current 
> classes and has a big benefit.
> 
> Maybe in the future we could consider the FileRecords class to support 
> getting the amount of batches as parameters and we encapsulate this logic in 
> the proper class (FileRecords)
> What do you think?
> 
> Thanks
> Sergio 
> Thanks
> Sergio
> 
>> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <mickael.mai...@gmail.com> wrote:
>> Hi Sergio,
>> 
>> Thanks for the KIP. Instead of specifying the size in bytes, have you
>> considered specifying it in terms of number of batches? I think it's a
>> bit more user friendly than a size in raw bytes.
>> For example: --num-batches: The number of batches to read from the log 
>> segment.
>> 
>> Thanks,
>> Mickael
>> 
>> 
>> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
>> <sergio.troi...@adevinta.com.invalid> wrote:
>> >
>> > Hi Luke,
>> >
>> > Make sense, done!
>> >
>> > Thank you.
>> > Sergio Troiano
>> >
>> > On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote:
>> >
>> > > Hi Sergio,
>> > >
>> > > > I don't want this to minimize the main feature I want to deploy as I
>> > > think the
>> > > message size limit is not as important as the limiting the amount of
>> > > batches.
>> > >
>> > > Agree! Let's focus on the feature of limiting the batch amounts.
>> > >
>> > > One more comment to the KIP:
>> > > 1. Could you put the new parameter description into the KIP proposed 
>> > > change
>> > > section? That would make it much clear.
>> > >
>> > >
>> > > Thank you.
>> > > Luke
>> > >
>> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
>> > > <sergio.troi...@adevinta.com.invalid> wrote:
>> > >
>> > > > hey Luke,
>> > > >
>> > > > I am interested in expanding the KIP scope but I am a bit concerned 
>> > > > this
>> > > > could create a lot of noise and confusion as they look like very 
>> > > > similar
>> > > > parameters, I agree this is a small change, so I think if I do it
>> > > properly
>> > > > it should not be a problem at all, I just will need a couple more of 
>> > > > days
>> > > > as I want to create the proper tests as well.
>> > > >
>> > > > I have a doubt about editing the KIP, I mean should I add this as a new
>> > > > feature as well?, should I describe this as a side effect finding? I
>> > > don't
>> > > > want this to minimize the main feature I want to deploy as I think the
>> > > > message size limit is not as important as the limiting the amount of
>> > > > batches.
>> > > >
>> > > > It is up to you, if you guys consider we must add this in this KIP 
>> > > > then I
>> > > > will be happy to do it. 😀
>> > > >
>> > > > Best regards.
>> > > > Sergio Troiano
>> > > >
>> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> wrote:
>> > > >
>> > > > > Hi Sergio,
>> > > > >
>> > > > > Thanks for your explanation.
>> > > > > Make sense to me.
>> > > > >
>> > > > > > Only interesting thing that I have just found is *max-message-size
>> > > *is
>> > > > > not
>> > > > > used while dump logs are requested, instead it is used by dumpIndex
>> > > > >
>> > > > > Are you interested in expanding the scope of this KIP to include the
>> > > > > *max-message-size* in dumping logs?
>> > > > > I think it's good because it will also be a small change, and no need
>> > > to
>> > > > go
>> > > > > through another KIP discussing/voting process. But I'm fine if you 
>> > > > > want
>> > > > to
>> > > > > keep this KIP as is, and create another JIRA ticket for future work.
>> > > > >
>> > > > > Thank you.
>> > > > > Luke
>> > > > >
>> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
>> > > > > <sergio.troi...@adevinta.com.invalid> wrote:
>> > > > >
>> > > > > > hey Luke,
>> > > > > >
>> > > > > > Let me answer them:
>> > > > > > 1. If the *max-batches-size* is too small that results in no 
>> > > > > > records
>> > > > > > output, will we output any information to the user?
>> > > > > >
>> > > > > > If the  *max-batches-size*is even smaller than the first batch then
>> > > > there
>> > > > > > won't be any output, this is handled by FileRecords class, I think
>> > > this
>> > > > > is
>> > > > > > correct as this is the expected behaviour.
>> > > > > >
>> > > > > > 2. After your explanation, I guess the use of *max-batches-size*
>> > > won't
>> > > > > > conflict with *max-message-size*, right?
>> > > > > >
>> > > > > > Only interesting thing that I have just found is *max-message-size
>> > > *is
>> > > > > not
>> > > > > > used while dump logs are requested, instead it is used by dumpIndex
>> > > > > > <
>> > > > > >
>> > > > >
>> > > >
>> > > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
>> > > > > > >
>> > > > > > so,
>> > > > > > this feature is not working for dumping logs, even though I checked
>> > > if
>> > > > > > there is a unit test for this and there is not any. Maybe we could
>> > > > > create a
>> > > > > > ticket for this?
>> > > > > >
>> > > > > > Regards.
>> > > > > >
>> > > > > >
>> > > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi Sergio,
>> > > > > > >
>> > > > > > > Thanks for the explanation! Very clear!
>> > > > > > > I think we should put this example and explanation into KIP.
>> > > > > > >
>> > > > > > > Other comments:
>> > > > > > > 1. If the *max-batches-size* is too small that results in no
>> > > records
>> > > > > > > output, will we output any information to the user?
>> > > > > > > 2. After your explanation, I guess the use of *max-batches-size*
>> > > > won't
>> > > > > > > conflict with *max-message-size*, right?
>> > > > > > > That is, user can set the 2 arguments at the same time. Is that
>> > > > > correct?
>> > > > > > >
>> > > > > > > Thank you.
>> > > > > > > Luke
>> > > > > > >
>> > > > > > > Thank you.
>> > > > > > > Luke
>> > > > > > >
>> > > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
>> > > > > > > <sergio.troi...@adevinta.com.invalid> wrote:
>> > > > > > >
>> > > > > > > > hey Luke,
>> > > > > > > >
>> > > > > > > > thanks for the interest, it is a good question, please let me
>> > > > explain
>> > > > > > > you:
>> > > > > > > >
>> > > > > > > > *max-message-size *a filter for the size of each batch, so for
>> > > > > example
>> > > > > > if
>> > > > > > > > Iset --max-message-size 1000 bytes and my segment log has 300
>> > > > > batches,
>> > > > > > > 150
>> > > > > > > > of them has a size of 500 bytes  and the other 150 has a size 
>> > > > > > > > of
>> > > > 2000
>> > > > > > > bytes
>> > > > > > > > then the script will skip the las 150 ones as each batch is
>> > > heavier
>> > > > > > than
>> > > > > > > > the limit.
>> > > > > > > >
>> > > > > > > > In the other hand following the same example above with
>> > > > > > *max-batches-size
>> > > > > > > > *set
>> > > > > > > > to 1000 bytes it will only print out the first 2 batches (500
>> > > bytes
>> > > > > > each)
>> > > > > > > > and stop, This will avoid reading the whole file
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Also if all of them are smaller than 1000 bytes it will end up
>> > > > > printing
>> > > > > > > out
>> > > > > > > > all the batches.
>> > > > > > > > The idea of my change is to limit the *amount* of batches no
>> > > matter
>> > > > > > their
>> > > > > > > > size.
>> > > > > > > >
>> > > > > > > > I hope this reply helps.
>> > > > > > > > Best regards.
>> > > > > > > >
>> > > > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com>
>> > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Sergio,
>> > > > > > > > >
>> > > > > > > > > Thanks for the KIP!
>> > > > > > > > >
>> > > > > > > > > One question:
>> > > > > > > > > I saw there's a `max-message-size` argument that seems to do
>> > > the
>> > > > > same
>> > > > > > > > thing
>> > > > > > > > > as you want.
>> > > > > > > > > Could you help explain what's the difference between
>> > > > > > `max-message-size`
>> > > > > > > > and
>> > > > > > > > > `max-batches-size`?
>> > > > > > > > >
>> > > > > > > > > Thank you.
>> > > > > > > > > Luke
>> > > > > > > > >
>> > > > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <
>> > > k...@mustardgrain.com>
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Sergio,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for the KIP. I don't know anything about the log
>> > > segment
>> > > > > > > > > internals,
>> > > > > > > > > > but the logic and implementation seem sound.
>> > > > > > > > > >
>> > > > > > > > > > Three questions:
>> > > > > > > > > >  1. Since the --max-batches-size unit is bytes, does it
>> > > matter
>> > > > if
>> > > > > > > that
>> > > > > > > > > > size doesn't align to a record boundary?
>> > > > > > > > > >  2. Can you add a check to make sure that 
>> > > > > > > > > > --max-batches-size
>> > > > > > doesn't
>> > > > > > > > > allow
>> > > > > > > > > > the user to pass in a negative number?
>> > > > > > > > > >  3. Can you add/update any unit tests related to the
>> > > > > > DumpLogSegments
>> > > > > > > > > > arguments?
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Kirk
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano 
>> > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >

Reply via email to