Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-25 Thread David Jacot
Thanks, Sergio.

On Fri, Mar 25, 2022 at 7:41 AM Sergio Daniel Troiano
 wrote:
>
> Hi David,
>
> I apologize. I missed your suggestion.
> By the way I like it and I have applied your suggestion.
>
> About the rejected alternatives I have updated the KIP as well
>
> Best regards
> Sergio Troiano
>
> On Fri, 25 Mar 2022 at 06:50, David Jacot  wrote:
>
> > Hi Sergio,
> >
> > I made a suggestion a few weeks ago about the name about the parameter but
> > haven’t got a response for it. Did you consider it?
> >
> > Do we need to update the rejected alternatives section to mention the
> > alternative options discussed in this thread?
> >
> > Thanks,
> > David
> >
> > Le ven. 25 mars 2022 à 03:44, Luke Chen  a écrit :
> >
> > > Hi Sergio,
> > >
> > > Thanks for asking.
> > > Since it's been discussed for weeks, you can start the vote anytime.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
> > >  wrote:
> > >
> > > > Hey guys,
> > > >
> > > >
> > > > What is the next step? Who decides when it is time for voting?
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> > > > sergio.troi...@adevinta.com> wrote:
> > > > >
> > > > > 
> > > > > Hi Michael,
> > > > >
> > > > > Yes, it's a good idea and I considered it, the main problem is the
> > > > FileRecords class does not accept number of batches as a parameter, it
> > > > accepts bytes instead, so if we want to do so either we redesign a core
> > > > class or we create a new one.
> > > > > One of the pretty things (I consider) about this change which will
> > > bring
> > > > a huge benefit is the change in the code is pretty small and it's on
> > the
> > > > final script, it does not require any deep change in a core library.
> > > > >
> > > > > An alternative which requires a big change as well without touching
> > the
> > > > FileRecords class would be accept number of batches as parameter, then
> > > call
> > > > the FileRecords.slice() with a small value (bytes) count the batches,
> > see
> > > > if we can satisfy the number of batches, if not then we call it again
> > and
> > > > again until we reach the amount of batches. This will require a lot of
> > > code
> > > > as well
> > > > >
> > > > > So long story short, the proposal change is quite small, it uses the
> > > > current classes and has a big benefit.
> > > > >
> > > > > Maybe in the future we could consider the FileRecords class to
> > support
> > > > getting the amount of batches as parameters and we encapsulate this
> > logic
> > > > in the proper class (FileRecords)
> > > > > What do you think?
> > > > >
> > > > > Thanks
> > > > > Sergio
> > > > > Thanks
> > > > > Sergio
> > > > >
> > > > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <
> > mickael.mai...@gmail.com
> > > >
> > > > wrote:
> > > > >> Hi Sergio,
> > > > >>
> > > > >> Thanks for the KIP. Instead of specifying the size in bytes, have
> > you
> > > > >> considered specifying it in terms of number of batches? I think
> > it's a
> > > > >> bit more user friendly than a size in raw bytes.
> > > > >> For example: --num-batches: The number of batches to read from the
> > log
> > > > segment.
> > > > >>
> > > > >> Thanks,
> > > > >> Mickael
> > > > >>
> > > > >>
> > > > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> > > > >>  wrote:
> > > > >> >
> > > > >> > Hi Luke,
> > > > >> >
> > > > >> > Make sense, done!
> > > > >> >
> > > > >> > Thank you.
> > > > >> > Sergio Troiano
> > > > >> >
> > > > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
> > > > >> >
> > > > >> > > Hi Sergio,
> > > > >> > >
> > > > >> > > > I don't want this to minimize the main feature I want to
> > deploy
> > > > as I
> > > > >> > > think the
> > > > >> > > message size limit is not as important as the limiting the
> > amount
> > > of
> > > > >> > > batches.
> > > > >> > >
> > > > >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> > > > >> > >
> > > > >> > > One more comment to the KIP:
> > > > >> > > 1. Could you put the new parameter description into the KIP
> > > > proposed change
> > > > >> > > section? That would make it much clear.
> > > > >> > >
> > > > >> > >
> > > > >> > > Thank you.
> > > > >> > > Luke
> > > > >> > >
> > > > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> > > > >> > >  wrote:
> > > > >> > >
> > > > >> > > > hey Luke,
> > > > >> > > >
> > > > >> > > > I am interested in expanding the KIP scope but I am a bit
> > > > concerned this
> > > > >> > > > could create a lot of noise and confusion as they look like
> > very
> > > > similar
> > > > >> > > > parameters, I agree this is a small change, so I think if I do
> > > it
> > > > >> > > properly
> > > > >> > > > it should not be a problem at all, I just will need a couple
> > > more
> > > > of days
> > > > >> > > > as I want to create the proper tests as well.
> > > > >> > > >
> > > > >> > > > I have a doubt about editing 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-25 Thread Sergio Daniel Troiano
Hi David,

I apologize. I missed your suggestion.
By the way I like it and I have applied your suggestion.

About the rejected alternatives I have updated the KIP as well

Best regards
Sergio Troiano

On Fri, 25 Mar 2022 at 06:50, David Jacot  wrote:

> Hi Sergio,
>
> I made a suggestion a few weeks ago about the name about the parameter but
> haven’t got a response for it. Did you consider it?
>
> Do we need to update the rejected alternatives section to mention the
> alternative options discussed in this thread?
>
> Thanks,
> David
>
> Le ven. 25 mars 2022 à 03:44, Luke Chen  a écrit :
>
> > Hi Sergio,
> >
> > Thanks for asking.
> > Since it's been discussed for weeks, you can start the vote anytime.
> >
> > Thank you.
> > Luke
> >
> > On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
> >  wrote:
> >
> > > Hey guys,
> > >
> > >
> > > What is the next step? Who decides when it is time for voting?
> > >
> > >
> > > Thanks!
> > >
> > > Sent from my iPhone
> > >
> > > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> > > sergio.troi...@adevinta.com> wrote:
> > > >
> > > > 
> > > > Hi Michael,
> > > >
> > > > Yes, it's a good idea and I considered it, the main problem is the
> > > FileRecords class does not accept number of batches as a parameter, it
> > > accepts bytes instead, so if we want to do so either we redesign a core
> > > class or we create a new one.
> > > > One of the pretty things (I consider) about this change which will
> > bring
> > > a huge benefit is the change in the code is pretty small and it's on
> the
> > > final script, it does not require any deep change in a core library.
> > > >
> > > > An alternative which requires a big change as well without touching
> the
> > > FileRecords class would be accept number of batches as parameter, then
> > call
> > > the FileRecords.slice() with a small value (bytes) count the batches,
> see
> > > if we can satisfy the number of batches, if not then we call it again
> and
> > > again until we reach the amount of batches. This will require a lot of
> > code
> > > as well
> > > >
> > > > So long story short, the proposal change is quite small, it uses the
> > > current classes and has a big benefit.
> > > >
> > > > Maybe in the future we could consider the FileRecords class to
> support
> > > getting the amount of batches as parameters and we encapsulate this
> logic
> > > in the proper class (FileRecords)
> > > > What do you think?
> > > >
> > > > Thanks
> > > > Sergio
> > > > Thanks
> > > > Sergio
> > > >
> > > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > > >> Hi Sergio,
> > > >>
> > > >> Thanks for the KIP. Instead of specifying the size in bytes, have
> you
> > > >> considered specifying it in terms of number of batches? I think
> it's a
> > > >> bit more user friendly than a size in raw bytes.
> > > >> For example: --num-batches: The number of batches to read from the
> log
> > > segment.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >>
> > > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> > > >>  wrote:
> > > >> >
> > > >> > Hi Luke,
> > > >> >
> > > >> > Make sense, done!
> > > >> >
> > > >> > Thank you.
> > > >> > Sergio Troiano
> > > >> >
> > > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
> > > >> >
> > > >> > > Hi Sergio,
> > > >> > >
> > > >> > > > I don't want this to minimize the main feature I want to
> deploy
> > > as I
> > > >> > > think the
> > > >> > > message size limit is not as important as the limiting the
> amount
> > of
> > > >> > > batches.
> > > >> > >
> > > >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> > > >> > >
> > > >> > > One more comment to the KIP:
> > > >> > > 1. Could you put the new parameter description into the KIP
> > > proposed change
> > > >> > > section? That would make it much clear.
> > > >> > >
> > > >> > >
> > > >> > > Thank you.
> > > >> > > Luke
> > > >> > >
> > > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> > > >> > >  wrote:
> > > >> > >
> > > >> > > > hey Luke,
> > > >> > > >
> > > >> > > > I am interested in expanding the KIP scope but I am a bit
> > > concerned this
> > > >> > > > could create a lot of noise and confusion as they look like
> very
> > > similar
> > > >> > > > parameters, I agree this is a small change, so I think if I do
> > it
> > > >> > > properly
> > > >> > > > it should not be a problem at all, I just will need a couple
> > more
> > > of days
> > > >> > > > as I want to create the proper tests as well.
> > > >> > > >
> > > >> > > > I have a doubt about editing the KIP, I mean should I add this
> > as
> > > a new
> > > >> > > > feature as well?, should I describe this as a side effect
> > > finding? I
> > > >> > > don't
> > > >> > > > want this to minimize the main feature I want to deploy as I
> > > think the
> > > >> > > > message size limit is not as important as the limiting the
> > amount
> > > of
> > > >> > > > batches.
> > > >> > > >
> > > >> > 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-24 Thread David Jacot
Hi Sergio,

I made a suggestion a few weeks ago about the name about the parameter but
haven’t got a response for it. Did you consider it?

Do we need to update the rejected alternatives section to mention the
alternative options discussed in this thread?

Thanks,
David

Le ven. 25 mars 2022 à 03:44, Luke Chen  a écrit :

> Hi Sergio,
>
> Thanks for asking.
> Since it's been discussed for weeks, you can start the vote anytime.
>
> Thank you.
> Luke
>
> On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
>  wrote:
>
> > Hey guys,
> >
> >
> > What is the next step? Who decides when it is time for voting?
> >
> >
> > Thanks!
> >
> > Sent from my iPhone
> >
> > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> > sergio.troi...@adevinta.com> wrote:
> > >
> > > 
> > > Hi Michael,
> > >
> > > Yes, it's a good idea and I considered it, the main problem is the
> > FileRecords class does not accept number of batches as a parameter, it
> > accepts bytes instead, so if we want to do so either we redesign a core
> > class or we create a new one.
> > > One of the pretty things (I consider) about this change which will
> bring
> > a huge benefit is the change in the code is pretty small and it's on the
> > final script, it does not require any deep change in a core library.
> > >
> > > An alternative which requires a big change as well without touching the
> > FileRecords class would be accept number of batches as parameter, then
> call
> > the FileRecords.slice() with a small value (bytes) count the batches, see
> > if we can satisfy the number of batches, if not then we call it again and
> > again until we reach the amount of batches. This will require a lot of
> code
> > as well
> > >
> > > So long story short, the proposal change is quite small, it uses the
> > current classes and has a big benefit.
> > >
> > > Maybe in the future we could consider the FileRecords class to support
> > getting the amount of batches as parameters and we encapsulate this logic
> > in the proper class (FileRecords)
> > > What do you think?
> > >
> > > Thanks
> > > Sergio
> > > Thanks
> > > Sergio
> > >
> > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison  >
> > wrote:
> > >> Hi Sergio,
> > >>
> > >> Thanks for the KIP. Instead of specifying the size in bytes, have you
> > >> considered specifying it in terms of number of batches? I think it's a
> > >> bit more user friendly than a size in raw bytes.
> > >> For example: --num-batches: The number of batches to read from the log
> > segment.
> > >>
> > >> Thanks,
> > >> Mickael
> > >>
> > >>
> > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> > >>  wrote:
> > >> >
> > >> > Hi Luke,
> > >> >
> > >> > Make sense, done!
> > >> >
> > >> > Thank you.
> > >> > Sergio Troiano
> > >> >
> > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
> > >> >
> > >> > > Hi Sergio,
> > >> > >
> > >> > > > I don't want this to minimize the main feature I want to deploy
> > as I
> > >> > > think the
> > >> > > message size limit is not as important as the limiting the amount
> of
> > >> > > batches.
> > >> > >
> > >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> > >> > >
> > >> > > One more comment to the KIP:
> > >> > > 1. Could you put the new parameter description into the KIP
> > proposed change
> > >> > > section? That would make it much clear.
> > >> > >
> > >> > >
> > >> > > Thank you.
> > >> > > Luke
> > >> > >
> > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> > >> > >  wrote:
> > >> > >
> > >> > > > hey Luke,
> > >> > > >
> > >> > > > I am interested in expanding the KIP scope but I am a bit
> > concerned this
> > >> > > > could create a lot of noise and confusion as they look like very
> > similar
> > >> > > > parameters, I agree this is a small change, so I think if I do
> it
> > >> > > properly
> > >> > > > it should not be a problem at all, I just will need a couple
> more
> > of days
> > >> > > > as I want to create the proper tests as well.
> > >> > > >
> > >> > > > I have a doubt about editing the KIP, I mean should I add this
> as
> > a new
> > >> > > > feature as well?, should I describe this as a side effect
> > finding? I
> > >> > > don't
> > >> > > > want this to minimize the main feature I want to deploy as I
> > think the
> > >> > > > message size limit is not as important as the limiting the
> amount
> > of
> > >> > > > batches.
> > >> > > >
> > >> > > > It is up to you, if you guys consider we must add this in this
> > KIP then I
> > >> > > > will be happy to do it. 
> > >> > > >
> > >> > > > Best regards.
> > >> > > > Sergio Troiano
> > >> > > >
> > >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen 
> wrote:
> > >> > > >
> > >> > > > > Hi Sergio,
> > >> > > > >
> > >> > > > > Thanks for your explanation.
> > >> > > > > Make sense to me.
> > >> > > > >
> > >> > > > > > Only interesting thing that I have just found is
> > *max-message-size
> > >> > > *is
> > >> > > > > not
> > >> > > > > used while dump logs are requested, instead it is 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-24 Thread Luke Chen
Hi Sergio,

Thanks for asking.
Since it's been discussed for weeks, you can start the vote anytime.

Thank you.
Luke

On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
 wrote:

> Hey guys,
>
>
> What is the next step? Who decides when it is time for voting?
>
>
> Thanks!
>
> Sent from my iPhone
>
> > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> sergio.troi...@adevinta.com> wrote:
> >
> > 
> > Hi Michael,
> >
> > Yes, it's a good idea and I considered it, the main problem is the
> FileRecords class does not accept number of batches as a parameter, it
> accepts bytes instead, so if we want to do so either we redesign a core
> class or we create a new one.
> > One of the pretty things (I consider) about this change which will bring
> a huge benefit is the change in the code is pretty small and it's on the
> final script, it does not require any deep change in a core library.
> >
> > An alternative which requires a big change as well without touching the
> FileRecords class would be accept number of batches as parameter, then call
> the FileRecords.slice() with a small value (bytes) count the batches, see
> if we can satisfy the number of batches, if not then we call it again and
> again until we reach the amount of batches. This will require a lot of code
> as well
> >
> > So long story short, the proposal change is quite small, it uses the
> current classes and has a big benefit.
> >
> > Maybe in the future we could consider the FileRecords class to support
> getting the amount of batches as parameters and we encapsulate this logic
> in the proper class (FileRecords)
> > What do you think?
> >
> > Thanks
> > Sergio
> > Thanks
> > Sergio
> >
> >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison 
> wrote:
> >> Hi Sergio,
> >>
> >> Thanks for the KIP. Instead of specifying the size in bytes, have you
> >> considered specifying it in terms of number of batches? I think it's a
> >> bit more user friendly than a size in raw bytes.
> >> For example: --num-batches: The number of batches to read from the log
> segment.
> >>
> >> Thanks,
> >> Mickael
> >>
> >>
> >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> >>  wrote:
> >> >
> >> > Hi Luke,
> >> >
> >> > Make sense, done!
> >> >
> >> > Thank you.
> >> > Sergio Troiano
> >> >
> >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
> >> >
> >> > > Hi Sergio,
> >> > >
> >> > > > I don't want this to minimize the main feature I want to deploy
> as I
> >> > > think the
> >> > > message size limit is not as important as the limiting the amount of
> >> > > batches.
> >> > >
> >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> >> > >
> >> > > One more comment to the KIP:
> >> > > 1. Could you put the new parameter description into the KIP
> proposed change
> >> > > section? That would make it much clear.
> >> > >
> >> > >
> >> > > Thank you.
> >> > > Luke
> >> > >
> >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> >> > >  wrote:
> >> > >
> >> > > > hey Luke,
> >> > > >
> >> > > > I am interested in expanding the KIP scope but I am a bit
> concerned this
> >> > > > could create a lot of noise and confusion as they look like very
> similar
> >> > > > parameters, I agree this is a small change, so I think if I do it
> >> > > properly
> >> > > > it should not be a problem at all, I just will need a couple more
> of days
> >> > > > as I want to create the proper tests as well.
> >> > > >
> >> > > > I have a doubt about editing the KIP, I mean should I add this as
> a new
> >> > > > feature as well?, should I describe this as a side effect
> finding? I
> >> > > don't
> >> > > > want this to minimize the main feature I want to deploy as I
> think the
> >> > > > message size limit is not as important as the limiting the amount
> of
> >> > > > batches.
> >> > > >
> >> > > > It is up to you, if you guys consider we must add this in this
> KIP then I
> >> > > > will be happy to do it. 
> >> > > >
> >> > > > Best regards.
> >> > > > Sergio Troiano
> >> > > >
> >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
> >> > > >
> >> > > > > Hi Sergio,
> >> > > > >
> >> > > > > Thanks for your explanation.
> >> > > > > Make sense to me.
> >> > > > >
> >> > > > > > Only interesting thing that I have just found is
> *max-message-size
> >> > > *is
> >> > > > > not
> >> > > > > used while dump logs are requested, instead it is used by
> dumpIndex
> >> > > > >
> >> > > > > Are you interested in expanding the scope of this KIP to
> include the
> >> > > > > *max-message-size* in dumping logs?
> >> > > > > I think it's good because it will also be a small change, and
> no need
> >> > > to
> >> > > > go
> >> > > > > through another KIP discussing/voting process. But I'm fine if
> you want
> >> > > > to
> >> > > > > keep this KIP as is, and create another JIRA ticket for future
> work.
> >> > > > >
> >> > > > > Thank you.
> >> > > > > Luke
> >> > > > >
> >> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> >> > > > >  wrote:
> >> > > > >
> 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-24 Thread Sergio Troiano
Hey guys,


What is the next step? Who decides when it is time for voting?


Thanks!

Sent from my iPhone

> On 8 Mar 2022, at 19:57, Sergio Daniel Troiano  
> wrote:
> 
> 
> Hi Michael,
> 
> Yes, it's a good idea and I considered it, the main problem is the 
> FileRecords class does not accept number of batches as a parameter, it 
> accepts bytes instead, so if we want to do so either we redesign a core class 
> or we create a new one.
> One of the pretty things (I consider) about this change which will bring a 
> huge benefit is the change in the code is pretty small and it's on the final 
> script, it does not require any deep change in a core library.
> 
> An alternative which requires a big change as well without touching the 
> FileRecords class would be accept number of batches as parameter, then call 
> the FileRecords.slice() with a small value (bytes) count the batches, see if 
> we can satisfy the number of batches, if not then we call it again and again 
> until we reach the amount of batches. This will require a lot of code as well
> 
> So long story short, the proposal change is quite small, it uses the current 
> classes and has a big benefit.
> 
> Maybe in the future we could consider the FileRecords class to support 
> getting the amount of batches as parameters and we encapsulate this logic in 
> the proper class (FileRecords)
> What do you think?
> 
> Thanks
> Sergio 
> Thanks
> Sergio
> 
>> On Tue, 8 Mar 2022 at 18:32, Mickael Maison  wrote:
>> Hi Sergio,
>> 
>> Thanks for the KIP. Instead of specifying the size in bytes, have you
>> considered specifying it in terms of number of batches? I think it's a
>> bit more user friendly than a size in raw bytes.
>> For example: --num-batches: The number of batches to read from the log 
>> segment.
>> 
>> Thanks,
>> Mickael
>> 
>> 
>> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
>>  wrote:
>> >
>> > Hi Luke,
>> >
>> > Make sense, done!
>> >
>> > Thank you.
>> > Sergio Troiano
>> >
>> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
>> >
>> > > Hi Sergio,
>> > >
>> > > > I don't want this to minimize the main feature I want to deploy as I
>> > > think the
>> > > message size limit is not as important as the limiting the amount of
>> > > batches.
>> > >
>> > > Agree! Let's focus on the feature of limiting the batch amounts.
>> > >
>> > > One more comment to the KIP:
>> > > 1. Could you put the new parameter description into the KIP proposed 
>> > > change
>> > > section? That would make it much clear.
>> > >
>> > >
>> > > Thank you.
>> > > Luke
>> > >
>> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
>> > >  wrote:
>> > >
>> > > > hey Luke,
>> > > >
>> > > > I am interested in expanding the KIP scope but I am a bit concerned 
>> > > > this
>> > > > could create a lot of noise and confusion as they look like very 
>> > > > similar
>> > > > parameters, I agree this is a small change, so I think if I do it
>> > > properly
>> > > > it should not be a problem at all, I just will need a couple more of 
>> > > > days
>> > > > as I want to create the proper tests as well.
>> > > >
>> > > > I have a doubt about editing the KIP, I mean should I add this as a new
>> > > > feature as well?, should I describe this as a side effect finding? I
>> > > don't
>> > > > want this to minimize the main feature I want to deploy as I think the
>> > > > message size limit is not as important as the limiting the amount of
>> > > > batches.
>> > > >
>> > > > It is up to you, if you guys consider we must add this in this KIP 
>> > > > then I
>> > > > will be happy to do it. 
>> > > >
>> > > > Best regards.
>> > > > Sergio Troiano
>> > > >
>> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
>> > > >
>> > > > > Hi Sergio,
>> > > > >
>> > > > > Thanks for your explanation.
>> > > > > Make sense to me.
>> > > > >
>> > > > > > Only interesting thing that I have just found is *max-message-size
>> > > *is
>> > > > > not
>> > > > > used while dump logs are requested, instead it is used by dumpIndex
>> > > > >
>> > > > > Are you interested in expanding the scope of this KIP to include the
>> > > > > *max-message-size* in dumping logs?
>> > > > > I think it's good because it will also be a small change, and no need
>> > > to
>> > > > go
>> > > > > through another KIP discussing/voting process. But I'm fine if you 
>> > > > > want
>> > > > to
>> > > > > keep this KIP as is, and create another JIRA ticket for future work.
>> > > > >
>> > > > > Thank you.
>> > > > > Luke
>> > > > >
>> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
>> > > > >  wrote:
>> > > > >
>> > > > > > hey Luke,
>> > > > > >
>> > > > > > Let me answer them:
>> > > > > > 1. If the *max-batches-size* is too small that results in no 
>> > > > > > records
>> > > > > > output, will we output any information to the user?
>> > > > > >
>> > > > > > If the  *max-batches-size*is even smaller than the first batch then
>> > > > there
>> > > > > > won't be any output, this is handled by 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-08 Thread Sergio Daniel Troiano
Hi Michael,

Yes, it's a good idea and I considered it, the main problem is the
*FileRecords* class does not accept number of batches as a parameter, it
accepts bytes instead, so if we want to do so either we redesign a core
class or we create a new one.
One of the pretty things (I consider) about this change which will bring a
huge benefit is the change in the code is pretty small and it's on the
final script, it does not require any deep change in a core library.

An alternative which requires a big change as well without touching the
FileRecords class would be accept number of batches as parameter, then call
the FileRecords.slice() with a small value (bytes) count the batches, see
if we can satisfy the number of batches, if not then we call it again and
again until we reach the amount of batches. This will require a lot of code
as well

So long story short, the proposal change is quite small, it uses the
current classes and has a big benefit.

Maybe in the future we could consider the FileRecords class to support
getting the amount of batches as parameters and we encapsulate this logic
in the proper class (FileRecords)
What do you think?

Thanks
Sergio
Thanks
Sergio

On Tue, 8 Mar 2022 at 18:32, Mickael Maison 
wrote:

> Hi Sergio,
>
> Thanks for the KIP. Instead of specifying the size in bytes, have you
> considered specifying it in terms of number of batches? I think it's a
> bit more user friendly than a size in raw bytes.
> For example: --num-batches: The number of batches to read from the log
> segment.
>
> Thanks,
> Mickael
>
>
> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
>  wrote:
> >
> > Hi Luke,
> >
> > Make sense, done!
> >
> > Thank you.
> > Sergio Troiano
> >
> > On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
> >
> > > Hi Sergio,
> > >
> > > > I don't want this to minimize the main feature I want to deploy as I
> > > think the
> > > message size limit is not as important as the limiting the amount of
> > > batches.
> > >
> > > Agree! Let's focus on the feature of limiting the batch amounts.
> > >
> > > One more comment to the KIP:
> > > 1. Could you put the new parameter description into the KIP proposed
> change
> > > section? That would make it much clear.
> > >
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> > >  wrote:
> > >
> > > > hey Luke,
> > > >
> > > > I am interested in expanding the KIP scope but I am a bit concerned
> this
> > > > could create a lot of noise and confusion as they look like very
> similar
> > > > parameters, I agree this is a small change, so I think if I do it
> > > properly
> > > > it should not be a problem at all, I just will need a couple more of
> days
> > > > as I want to create the proper tests as well.
> > > >
> > > > I have a doubt about editing the KIP, I mean should I add this as a
> new
> > > > feature as well?, should I describe this as a side effect finding? I
> > > don't
> > > > want this to minimize the main feature I want to deploy as I think
> the
> > > > message size limit is not as important as the limiting the amount of
> > > > batches.
> > > >
> > > > It is up to you, if you guys consider we must add this in this KIP
> then I
> > > > will be happy to do it. 
> > > >
> > > > Best regards.
> > > > Sergio Troiano
> > > >
> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for your explanation.
> > > > > Make sense to me.
> > > > >
> > > > > > Only interesting thing that I have just found is
> *max-message-size
> > > *is
> > > > > not
> > > > > used while dump logs are requested, instead it is used by dumpIndex
> > > > >
> > > > > Are you interested in expanding the scope of this KIP to include
> the
> > > > > *max-message-size* in dumping logs?
> > > > > I think it's good because it will also be a small change, and no
> need
> > > to
> > > > go
> > > > > through another KIP discussing/voting process. But I'm fine if you
> want
> > > > to
> > > > > keep this KIP as is, and create another JIRA ticket for future
> work.
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> > > > >  wrote:
> > > > >
> > > > > > hey Luke,
> > > > > >
> > > > > > Let me answer them:
> > > > > > 1. If the *max-batches-size* is too small that results in no
> records
> > > > > > output, will we output any information to the user?
> > > > > >
> > > > > > If the  *max-batches-size*is even smaller than the first batch
> then
> > > > there
> > > > > > won't be any output, this is handled by FileRecords class, I
> think
> > > this
> > > > > is
> > > > > > correct as this is the expected behaviour.
> > > > > >
> > > > > > 2. After your explanation, I guess the use of *max-batches-size*
> > > won't
> > > > > > conflict with *max-message-size*, right?
> > > > > >
> > > > > > Only interesting thing that I have just found is
> *max-message-size
> > > *is
> > > > > not
> > > > > > used while 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-08 Thread Mickael Maison
Hi Sergio,

Thanks for the KIP. Instead of specifying the size in bytes, have you
considered specifying it in terms of number of batches? I think it's a
bit more user friendly than a size in raw bytes.
For example: --num-batches: The number of batches to read from the log segment.

Thanks,
Mickael


On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
 wrote:
>
> Hi Luke,
>
> Make sense, done!
>
> Thank you.
> Sergio Troiano
>
> On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:
>
> > Hi Sergio,
> >
> > > I don't want this to minimize the main feature I want to deploy as I
> > think the
> > message size limit is not as important as the limiting the amount of
> > batches.
> >
> > Agree! Let's focus on the feature of limiting the batch amounts.
> >
> > One more comment to the KIP:
> > 1. Could you put the new parameter description into the KIP proposed change
> > section? That would make it much clear.
> >
> >
> > Thank you.
> > Luke
> >
> > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> >  wrote:
> >
> > > hey Luke,
> > >
> > > I am interested in expanding the KIP scope but I am a bit concerned this
> > > could create a lot of noise and confusion as they look like very similar
> > > parameters, I agree this is a small change, so I think if I do it
> > properly
> > > it should not be a problem at all, I just will need a couple more of days
> > > as I want to create the proper tests as well.
> > >
> > > I have a doubt about editing the KIP, I mean should I add this as a new
> > > feature as well?, should I describe this as a side effect finding? I
> > don't
> > > want this to minimize the main feature I want to deploy as I think the
> > > message size limit is not as important as the limiting the amount of
> > > batches.
> > >
> > > It is up to you, if you guys consider we must add this in this KIP then I
> > > will be happy to do it. 
> > >
> > > Best regards.
> > > Sergio Troiano
> > >
> > > On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for your explanation.
> > > > Make sense to me.
> > > >
> > > > > Only interesting thing that I have just found is *max-message-size
> > *is
> > > > not
> > > > used while dump logs are requested, instead it is used by dumpIndex
> > > >
> > > > Are you interested in expanding the scope of this KIP to include the
> > > > *max-message-size* in dumping logs?
> > > > I think it's good because it will also be a small change, and no need
> > to
> > > go
> > > > through another KIP discussing/voting process. But I'm fine if you want
> > > to
> > > > keep this KIP as is, and create another JIRA ticket for future work.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> > > >  wrote:
> > > >
> > > > > hey Luke,
> > > > >
> > > > > Let me answer them:
> > > > > 1. If the *max-batches-size* is too small that results in no records
> > > > > output, will we output any information to the user?
> > > > >
> > > > > If the  *max-batches-size*is even smaller than the first batch then
> > > there
> > > > > won't be any output, this is handled by FileRecords class, I think
> > this
> > > > is
> > > > > correct as this is the expected behaviour.
> > > > >
> > > > > 2. After your explanation, I guess the use of *max-batches-size*
> > won't
> > > > > conflict with *max-message-size*, right?
> > > > >
> > > > > Only interesting thing that I have just found is *max-message-size
> > *is
> > > > not
> > > > > used while dump logs are requested, instead it is used by dumpIndex
> > > > > <
> > > > >
> > > >
> > >
> > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > > > > >
> > > > > so,
> > > > > this feature is not working for dumping logs, even though I checked
> > if
> > > > > there is a unit test for this and there is not any. Maybe we could
> > > > create a
> > > > > ticket for this?
> > > > >
> > > > > Regards.
> > > > >
> > > > >
> > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:
> > > > >
> > > > > > Hi Sergio,
> > > > > >
> > > > > > Thanks for the explanation! Very clear!
> > > > > > I think we should put this example and explanation into KIP.
> > > > > >
> > > > > > Other comments:
> > > > > > 1. If the *max-batches-size* is too small that results in no
> > records
> > > > > > output, will we output any information to the user?
> > > > > > 2. After your explanation, I guess the use of *max-batches-size*
> > > won't
> > > > > > conflict with *max-message-size*, right?
> > > > > > That is, user can set the 2 arguments at the same time. Is that
> > > > correct?
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > > > >  wrote:
> > > > > >
> > > > > > > hey Luke,
> > > > > > >
> > > > > > > thanks for the interest, it is a good question, please let me
> > > explain
> > > > > > you:
> > > > > 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-07 Thread Sergio Daniel Troiano
Hi Luke,

Make sense, done!

Thank you.
Sergio Troiano

On Tue, 8 Mar 2022 at 03:02, Luke Chen  wrote:

> Hi Sergio,
>
> > I don't want this to minimize the main feature I want to deploy as I
> think the
> message size limit is not as important as the limiting the amount of
> batches.
>
> Agree! Let's focus on the feature of limiting the batch amounts.
>
> One more comment to the KIP:
> 1. Could you put the new parameter description into the KIP proposed change
> section? That would make it much clear.
>
>
> Thank you.
> Luke
>
> On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
>  wrote:
>
> > hey Luke,
> >
> > I am interested in expanding the KIP scope but I am a bit concerned this
> > could create a lot of noise and confusion as they look like very similar
> > parameters, I agree this is a small change, so I think if I do it
> properly
> > it should not be a problem at all, I just will need a couple more of days
> > as I want to create the proper tests as well.
> >
> > I have a doubt about editing the KIP, I mean should I add this as a new
> > feature as well?, should I describe this as a side effect finding? I
> don't
> > want this to minimize the main feature I want to deploy as I think the
> > message size limit is not as important as the limiting the amount of
> > batches.
> >
> > It is up to you, if you guys consider we must add this in this KIP then I
> > will be happy to do it. 
> >
> > Best regards.
> > Sergio Troiano
> >
> > On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for your explanation.
> > > Make sense to me.
> > >
> > > > Only interesting thing that I have just found is *max-message-size
> *is
> > > not
> > > used while dump logs are requested, instead it is used by dumpIndex
> > >
> > > Are you interested in expanding the scope of this KIP to include the
> > > *max-message-size* in dumping logs?
> > > I think it's good because it will also be a small change, and no need
> to
> > go
> > > through another KIP discussing/voting process. But I'm fine if you want
> > to
> > > keep this KIP as is, and create another JIRA ticket for future work.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> > >  wrote:
> > >
> > > > hey Luke,
> > > >
> > > > Let me answer them:
> > > > 1. If the *max-batches-size* is too small that results in no records
> > > > output, will we output any information to the user?
> > > >
> > > > If the  *max-batches-size*is even smaller than the first batch then
> > there
> > > > won't be any output, this is handled by FileRecords class, I think
> this
> > > is
> > > > correct as this is the expected behaviour.
> > > >
> > > > 2. After your explanation, I guess the use of *max-batches-size*
> won't
> > > > conflict with *max-message-size*, right?
> > > >
> > > > Only interesting thing that I have just found is *max-message-size
> *is
> > > not
> > > > used while dump logs are requested, instead it is used by dumpIndex
> > > > <
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > > > >
> > > > so,
> > > > this feature is not working for dumping logs, even though I checked
> if
> > > > there is a unit test for this and there is not any. Maybe we could
> > > create a
> > > > ticket for this?
> > > >
> > > > Regards.
> > > >
> > > >
> > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the explanation! Very clear!
> > > > > I think we should put this example and explanation into KIP.
> > > > >
> > > > > Other comments:
> > > > > 1. If the *max-batches-size* is too small that results in no
> records
> > > > > output, will we output any information to the user?
> > > > > 2. After your explanation, I guess the use of *max-batches-size*
> > won't
> > > > > conflict with *max-message-size*, right?
> > > > > That is, user can set the 2 arguments at the same time. Is that
> > > correct?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > > >  wrote:
> > > > >
> > > > > > hey Luke,
> > > > > >
> > > > > > thanks for the interest, it is a good question, please let me
> > explain
> > > > > you:
> > > > > >
> > > > > > *max-message-size *a filter for the size of each batch, so for
> > > example
> > > > if
> > > > > > Iset --max-message-size 1000 bytes and my segment log has 300
> > > batches,
> > > > > 150
> > > > > > of them has a size of 500 bytes  and the other 150 has a size of
> > 2000
> > > > > bytes
> > > > > > then the script will skip the las 150 ones as each batch is
> heavier
> > > > than
> > > > > > the limit.
> > > > > >
> > > > > > In the other hand following the same example above with
> > > > *max-batches-size
> > > > > > *set
> > > > > > to 1000 bytes it will only print out the first 2 batches (500
> bytes
> > > > each)
> > > > 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-07 Thread Luke Chen
Hi Sergio,

> I don't want this to minimize the main feature I want to deploy as I
think the
message size limit is not as important as the limiting the amount of
batches.

Agree! Let's focus on the feature of limiting the batch amounts.

One more comment to the KIP:
1. Could you put the new parameter description into the KIP proposed change
section? That would make it much clear.


Thank you.
Luke

On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
 wrote:

> hey Luke,
>
> I am interested in expanding the KIP scope but I am a bit concerned this
> could create a lot of noise and confusion as they look like very similar
> parameters, I agree this is a small change, so I think if I do it properly
> it should not be a problem at all, I just will need a couple more of days
> as I want to create the proper tests as well.
>
> I have a doubt about editing the KIP, I mean should I add this as a new
> feature as well?, should I describe this as a side effect finding? I don't
> want this to minimize the main feature I want to deploy as I think the
> message size limit is not as important as the limiting the amount of
> batches.
>
> It is up to you, if you guys consider we must add this in this KIP then I
> will be happy to do it. 
>
> Best regards.
> Sergio Troiano
>
> On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:
>
> > Hi Sergio,
> >
> > Thanks for your explanation.
> > Make sense to me.
> >
> > > Only interesting thing that I have just found is *max-message-size *is
> > not
> > used while dump logs are requested, instead it is used by dumpIndex
> >
> > Are you interested in expanding the scope of this KIP to include the
> > *max-message-size* in dumping logs?
> > I think it's good because it will also be a small change, and no need to
> go
> > through another KIP discussing/voting process. But I'm fine if you want
> to
> > keep this KIP as is, and create another JIRA ticket for future work.
> >
> > Thank you.
> > Luke
> >
> > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> >  wrote:
> >
> > > hey Luke,
> > >
> > > Let me answer them:
> > > 1. If the *max-batches-size* is too small that results in no records
> > > output, will we output any information to the user?
> > >
> > > If the  *max-batches-size*is even smaller than the first batch then
> there
> > > won't be any output, this is handled by FileRecords class, I think this
> > is
> > > correct as this is the expected behaviour.
> > >
> > > 2. After your explanation, I guess the use of *max-batches-size* won't
> > > conflict with *max-message-size*, right?
> > >
> > > Only interesting thing that I have just found is *max-message-size *is
> > not
> > > used while dump logs are requested, instead it is used by dumpIndex
> > > <
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > > >
> > > so,
> > > this feature is not working for dumping logs, even though I checked if
> > > there is a unit test for this and there is not any. Maybe we could
> > create a
> > > ticket for this?
> > >
> > > Regards.
> > >
> > >
> > > On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the explanation! Very clear!
> > > > I think we should put this example and explanation into KIP.
> > > >
> > > > Other comments:
> > > > 1. If the *max-batches-size* is too small that results in no records
> > > > output, will we output any information to the user?
> > > > 2. After your explanation, I guess the use of *max-batches-size*
> won't
> > > > conflict with *max-message-size*, right?
> > > > That is, user can set the 2 arguments at the same time. Is that
> > correct?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > >  wrote:
> > > >
> > > > > hey Luke,
> > > > >
> > > > > thanks for the interest, it is a good question, please let me
> explain
> > > > you:
> > > > >
> > > > > *max-message-size *a filter for the size of each batch, so for
> > example
> > > if
> > > > > Iset --max-message-size 1000 bytes and my segment log has 300
> > batches,
> > > > 150
> > > > > of them has a size of 500 bytes  and the other 150 has a size of
> 2000
> > > > bytes
> > > > > then the script will skip the las 150 ones as each batch is heavier
> > > than
> > > > > the limit.
> > > > >
> > > > > In the other hand following the same example above with
> > > *max-batches-size
> > > > > *set
> > > > > to 1000 bytes it will only print out the first 2 batches (500 bytes
> > > each)
> > > > > and stop, This will avoid reading the whole file
> > > > >
> > > > >
> > > > > Also if all of them are smaller than 1000 bytes it will end up
> > printing
> > > > out
> > > > > all the batches.
> > > > > The idea of my change is to limit the *amount* of batches no matter
> > > their
> > > > > size.
> > > > >
> > > > > I hope this reply helps.
> > > > > Best regards.
> > > > >
> > > > > On Sat, 5 Mar 2022 at 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-07 Thread Sergio Daniel Troiano
hey Luke,

I am interested in expanding the KIP scope but I am a bit concerned this
could create a lot of noise and confusion as they look like very similar
parameters, I agree this is a small change, so I think if I do it properly
it should not be a problem at all, I just will need a couple more of days
as I want to create the proper tests as well.

I have a doubt about editing the KIP, I mean should I add this as a new
feature as well?, should I describe this as a side effect finding? I don't
want this to minimize the main feature I want to deploy as I think the
message size limit is not as important as the limiting the amount of
batches.

It is up to you, if you guys consider we must add this in this KIP then I
will be happy to do it. 

Best regards.
Sergio Troiano

On Mon, 7 Mar 2022 at 02:01, Luke Chen  wrote:

> Hi Sergio,
>
> Thanks for your explanation.
> Make sense to me.
>
> > Only interesting thing that I have just found is *max-message-size *is
> not
> used while dump logs are requested, instead it is used by dumpIndex
>
> Are you interested in expanding the scope of this KIP to include the
> *max-message-size* in dumping logs?
> I think it's good because it will also be a small change, and no need to go
> through another KIP discussing/voting process. But I'm fine if you want to
> keep this KIP as is, and create another JIRA ticket for future work.
>
> Thank you.
> Luke
>
> On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
>  wrote:
>
> > hey Luke,
> >
> > Let me answer them:
> > 1. If the *max-batches-size* is too small that results in no records
> > output, will we output any information to the user?
> >
> > If the  *max-batches-size*is even smaller than the first batch then there
> > won't be any output, this is handled by FileRecords class, I think this
> is
> > correct as this is the expected behaviour.
> >
> > 2. After your explanation, I guess the use of *max-batches-size* won't
> > conflict with *max-message-size*, right?
> >
> > Only interesting thing that I have just found is *max-message-size *is
> not
> > used while dump logs are requested, instead it is used by dumpIndex
> > <
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > >
> > so,
> > this feature is not working for dumping logs, even though I checked if
> > there is a unit test for this and there is not any. Maybe we could
> create a
> > ticket for this?
> >
> > Regards.
> >
> >
> > On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the explanation! Very clear!
> > > I think we should put this example and explanation into KIP.
> > >
> > > Other comments:
> > > 1. If the *max-batches-size* is too small that results in no records
> > > output, will we output any information to the user?
> > > 2. After your explanation, I guess the use of *max-batches-size* won't
> > > conflict with *max-message-size*, right?
> > > That is, user can set the 2 arguments at the same time. Is that
> correct?
> > >
> > > Thank you.
> > > Luke
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > >  wrote:
> > >
> > > > hey Luke,
> > > >
> > > > thanks for the interest, it is a good question, please let me explain
> > > you:
> > > >
> > > > *max-message-size *a filter for the size of each batch, so for
> example
> > if
> > > > Iset --max-message-size 1000 bytes and my segment log has 300
> batches,
> > > 150
> > > > of them has a size of 500 bytes  and the other 150 has a size of 2000
> > > bytes
> > > > then the script will skip the las 150 ones as each batch is heavier
> > than
> > > > the limit.
> > > >
> > > > In the other hand following the same example above with
> > *max-batches-size
> > > > *set
> > > > to 1000 bytes it will only print out the first 2 batches (500 bytes
> > each)
> > > > and stop, This will avoid reading the whole file
> > > >
> > > >
> > > > Also if all of them are smaller than 1000 bytes it will end up
> printing
> > > out
> > > > all the batches.
> > > > The idea of my change is to limit the *amount* of batches no matter
> > their
> > > > size.
> > > >
> > > > I hope this reply helps.
> > > > Best regards.
> > > >
> > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > One question:
> > > > > I saw there's a `max-message-size` argument that seems to do the
> same
> > > > thing
> > > > > as you want.
> > > > > Could you help explain what's the difference between
> > `max-message-size`
> > > > and
> > > > > `max-batches-size`?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True 
> > > wrote:
> > > > >
> > > > > > Hi Sergio,
> > > > > >
> > > > > > Thanks for the KIP. I don't know anything about the log segment
> > > > > internals,
> > > > > > but the logic and implementation seem sound.
> > > > > >
> > > > > > Three questions:
> > 

Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-06 Thread Luke Chen
Hi Sergio,

Thanks for your explanation.
Make sense to me.

> Only interesting thing that I have just found is *max-message-size *is not
used while dump logs are requested, instead it is used by dumpIndex

Are you interested in expanding the scope of this KIP to include the
*max-message-size* in dumping logs?
I think it's good because it will also be a small change, and no need to go
through another KIP discussing/voting process. But I'm fine if you want to
keep this KIP as is, and create another JIRA ticket for future work.

Thank you.
Luke

On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
 wrote:

> hey Luke,
>
> Let me answer them:
> 1. If the *max-batches-size* is too small that results in no records
> output, will we output any information to the user?
>
> If the  *max-batches-size*is even smaller than the first batch then there
> won't be any output, this is handled by FileRecords class, I think this is
> correct as this is the expected behaviour.
>
> 2. After your explanation, I guess the use of *max-batches-size* won't
> conflict with *max-message-size*, right?
>
> Only interesting thing that I have just found is *max-message-size *is not
> used while dump logs are requested, instead it is used by dumpIndex
> <
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> >
> so,
> this feature is not working for dumping logs, even though I checked if
> there is a unit test for this and there is not any. Maybe we could create a
> ticket for this?
>
> Regards.
>
>
> On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:
>
> > Hi Sergio,
> >
> > Thanks for the explanation! Very clear!
> > I think we should put this example and explanation into KIP.
> >
> > Other comments:
> > 1. If the *max-batches-size* is too small that results in no records
> > output, will we output any information to the user?
> > 2. After your explanation, I guess the use of *max-batches-size* won't
> > conflict with *max-message-size*, right?
> > That is, user can set the 2 arguments at the same time. Is that correct?
> >
> > Thank you.
> > Luke
> >
> > Thank you.
> > Luke
> >
> > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> >  wrote:
> >
> > > hey Luke,
> > >
> > > thanks for the interest, it is a good question, please let me explain
> > you:
> > >
> > > *max-message-size *a filter for the size of each batch, so for example
> if
> > > Iset --max-message-size 1000 bytes and my segment log has 300 batches,
> > 150
> > > of them has a size of 500 bytes  and the other 150 has a size of 2000
> > bytes
> > > then the script will skip the las 150 ones as each batch is heavier
> than
> > > the limit.
> > >
> > > In the other hand following the same example above with
> *max-batches-size
> > > *set
> > > to 1000 bytes it will only print out the first 2 batches (500 bytes
> each)
> > > and stop, This will avoid reading the whole file
> > >
> > >
> > > Also if all of them are smaller than 1000 bytes it will end up printing
> > out
> > > all the batches.
> > > The idea of my change is to limit the *amount* of batches no matter
> their
> > > size.
> > >
> > > I hope this reply helps.
> > > Best regards.
> > >
> > > On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > One question:
> > > > I saw there's a `max-message-size` argument that seems to do the same
> > > thing
> > > > as you want.
> > > > Could you help explain what's the difference between
> `max-message-size`
> > > and
> > > > `max-batches-size`?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True 
> > wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the KIP. I don't know anything about the log segment
> > > > internals,
> > > > > but the logic and implementation seem sound.
> > > > >
> > > > > Three questions:
> > > > >  1. Since the --max-batches-size unit is bytes, does it matter if
> > that
> > > > > size doesn't align to a record boundary?
> > > > >  2. Can you add a check to make sure that --max-batches-size
> doesn't
> > > > allow
> > > > > the user to pass in a negative number?
> > > > >  3. Can you add/update any unit tests related to the
> DumpLogSegments
> > > > > arguments?
> > > > > Thanks,
> > > > > Kirk
> > > > >
> > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-06 Thread Sergio Daniel Troiano
hey Luke,

Let me answer them:
1. If the *max-batches-size* is too small that results in no records
output, will we output any information to the user?

If the  *max-batches-size*is even smaller than the first batch then there
won't be any output, this is handled by FileRecords class, I think this is
correct as this is the expected behaviour.

2. After your explanation, I guess the use of *max-batches-size* won't
conflict with *max-message-size*, right?

Only interesting thing that I have just found is *max-message-size *is not
used while dump logs are requested, instead it is used by dumpIndex

so,
this feature is not working for dumping logs, even though I checked if
there is a unit test for this and there is not any. Maybe we could create a
ticket for this?

Regards.


On Sat, 5 Mar 2022 at 10:36, Luke Chen  wrote:

> Hi Sergio,
>
> Thanks for the explanation! Very clear!
> I think we should put this example and explanation into KIP.
>
> Other comments:
> 1. If the *max-batches-size* is too small that results in no records
> output, will we output any information to the user?
> 2. After your explanation, I guess the use of *max-batches-size* won't
> conflict with *max-message-size*, right?
> That is, user can set the 2 arguments at the same time. Is that correct?
>
> Thank you.
> Luke
>
> Thank you.
> Luke
>
> On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
>  wrote:
>
> > hey Luke,
> >
> > thanks for the interest, it is a good question, please let me explain
> you:
> >
> > *max-message-size *a filter for the size of each batch, so for example if
> > Iset --max-message-size 1000 bytes and my segment log has 300 batches,
> 150
> > of them has a size of 500 bytes  and the other 150 has a size of 2000
> bytes
> > then the script will skip the las 150 ones as each batch is heavier than
> > the limit.
> >
> > In the other hand following the same example above with *max-batches-size
> > *set
> > to 1000 bytes it will only print out the first 2 batches (500 bytes each)
> > and stop, This will avoid reading the whole file
> >
> >
> > Also if all of them are smaller than 1000 bytes it will end up printing
> out
> > all the batches.
> > The idea of my change is to limit the *amount* of batches no matter their
> > size.
> >
> > I hope this reply helps.
> > Best regards.
> >
> > On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the KIP!
> > >
> > > One question:
> > > I saw there's a `max-message-size` argument that seems to do the same
> > thing
> > > as you want.
> > > Could you help explain what's the difference between `max-message-size`
> > and
> > > `max-batches-size`?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True 
> wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the KIP. I don't know anything about the log segment
> > > internals,
> > > > but the logic and implementation seem sound.
> > > >
> > > > Three questions:
> > > >  1. Since the --max-batches-size unit is bytes, does it matter if
> that
> > > > size doesn't align to a record boundary?
> > > >  2. Can you add a check to make sure that --max-batches-size doesn't
> > > allow
> > > > the user to pass in a negative number?
> > > >  3. Can you add/update any unit tests related to the DumpLogSegments
> > > > arguments?
> > > > Thanks,
> > > > Kirk
> > > >
> > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-06 Thread Sergio Daniel Troiano
hey Kirk,

Thanks for the questions, please let me answer them:

1. This is handled by the *FileRecords class, *now the open uses the slice

 which takes care of the end

bytes . There will be some trailing bytes which is totally fine, you can
see here it was expected even here

before my proposal, I added an extra conditional to avoid printing the
warning in the main script here

.

2. *FileRecords *class already checks the value
,
we do the same, similar case when we set the maxMessageSizeOpt



3. I have just added the unit test 




On Fri, 4 Mar 2022 at 20:21, Kirk True  wrote:

> Hi Sergio,
>
> Thanks for the KIP. I don't know anything about the log segment internals,
> but the logic and implementation seem sound.
>
> Three questions:
>  1. Since the --max-batches-size unit is bytes, does it matter if that
> size doesn't align to a record boundary?
>  2. Can you add a check to make sure that --max-batches-size doesn't allow
> the user to pass in a negative number?
>  3. Can you add/update any unit tests related to the DumpLogSegments
> arguments?
> Thanks,
> Kirk
>
> On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-05 Thread David Jacot
Hi Sergio,

I wonder if « max-bytes » would be a better name than « max-batches-size ».
The intend is more explicit. What do you think?

Best,
David

Le sam. 5 mars 2022 à 10:36, Luke Chen  a écrit :

> Hi Sergio,
>
> Thanks for the explanation! Very clear!
> I think we should put this example and explanation into KIP.
>
> Other comments:
> 1. If the *max-batches-size* is too small that results in no records
> output, will we output any information to the user?
> 2. After your explanation, I guess the use of *max-batches-size* won't
> conflict with *max-message-size*, right?
> That is, user can set the 2 arguments at the same time. Is that correct?
>
> Thank you.
> Luke
>
> Thank you.
> Luke
>
> On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
>  wrote:
>
> > hey Luke,
> >
> > thanks for the interest, it is a good question, please let me explain
> you:
> >
> > *max-message-size *a filter for the size of each batch, so for example if
> > Iset --max-message-size 1000 bytes and my segment log has 300 batches,
> 150
> > of them has a size of 500 bytes  and the other 150 has a size of 2000
> bytes
> > then the script will skip the las 150 ones as each batch is heavier than
> > the limit.
> >
> > In the other hand following the same example above with *max-batches-size
> > *set
> > to 1000 bytes it will only print out the first 2 batches (500 bytes each)
> > and stop, This will avoid reading the whole file
> >
> >
> > Also if all of them are smaller than 1000 bytes it will end up printing
> out
> > all the batches.
> > The idea of my change is to limit the *amount* of batches no matter their
> > size.
> >
> > I hope this reply helps.
> > Best regards.
> >
> > On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the KIP!
> > >
> > > One question:
> > > I saw there's a `max-message-size` argument that seems to do the same
> > thing
> > > as you want.
> > > Could you help explain what's the difference between `max-message-size`
> > and
> > > `max-batches-size`?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True 
> wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the KIP. I don't know anything about the log segment
> > > internals,
> > > > but the logic and implementation seem sound.
> > > >
> > > > Three questions:
> > > >  1. Since the --max-batches-size unit is bytes, does it matter if
> that
> > > > size doesn't align to a record boundary?
> > > >  2. Can you add a check to make sure that --max-batches-size doesn't
> > > allow
> > > > the user to pass in a negative number?
> > > >  3. Can you add/update any unit tests related to the DumpLogSegments
> > > > arguments?
> > > > Thanks,
> > > > Kirk
> > > >
> > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-05 Thread Luke Chen
Hi Sergio,

Thanks for the explanation! Very clear!
I think we should put this example and explanation into KIP.

Other comments:
1. If the *max-batches-size* is too small that results in no records
output, will we output any information to the user?
2. After your explanation, I guess the use of *max-batches-size* won't
conflict with *max-message-size*, right?
That is, user can set the 2 arguments at the same time. Is that correct?

Thank you.
Luke

Thank you.
Luke

On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
 wrote:

> hey Luke,
>
> thanks for the interest, it is a good question, please let me explain you:
>
> *max-message-size *a filter for the size of each batch, so for example if
> Iset --max-message-size 1000 bytes and my segment log has 300 batches, 150
> of them has a size of 500 bytes  and the other 150 has a size of 2000 bytes
> then the script will skip the las 150 ones as each batch is heavier than
> the limit.
>
> In the other hand following the same example above with *max-batches-size
> *set
> to 1000 bytes it will only print out the first 2 batches (500 bytes each)
> and stop, This will avoid reading the whole file
>
>
> Also if all of them are smaller than 1000 bytes it will end up printing out
> all the batches.
> The idea of my change is to limit the *amount* of batches no matter their
> size.
>
> I hope this reply helps.
> Best regards.
>
> On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:
>
> > Hi Sergio,
> >
> > Thanks for the KIP!
> >
> > One question:
> > I saw there's a `max-message-size` argument that seems to do the same
> thing
> > as you want.
> > Could you help explain what's the difference between `max-message-size`
> and
> > `max-batches-size`?
> >
> > Thank you.
> > Luke
> >
> > On Sat, Mar 5, 2022 at 3:21 AM Kirk True  wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the KIP. I don't know anything about the log segment
> > internals,
> > > but the logic and implementation seem sound.
> > >
> > > Three questions:
> > >  1. Since the --max-batches-size unit is bytes, does it matter if that
> > > size doesn't align to a record boundary?
> > >  2. Can you add a check to make sure that --max-batches-size doesn't
> > allow
> > > the user to pass in a negative number?
> > >  3. Can you add/update any unit tests related to the DumpLogSegments
> > > arguments?
> > > Thanks,
> > > Kirk
> > >
> > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-05 Thread Sergio Daniel Troiano
hey Luke,

thanks for the interest, it is a good question, please let me explain you:

*max-message-size *a filter for the size of each batch, so for example if
Iset --max-message-size 1000 bytes and my segment log has 300 batches, 150
of them has a size of 500 bytes  and the other 150 has a size of 2000 bytes
then the script will skip the las 150 ones as each batch is heavier than
the limit.

In the other hand following the same example above with *max-batches-size  *set
to 1000 bytes it will only print out the first 2 batches (500 bytes each)
and stop, This will avoid reading the whole file


Also if all of them are smaller than 1000 bytes it will end up printing out
all the batches.
The idea of my change is to limit the *amount* of batches no matter their
size.

I hope this reply helps.
Best regards.

On Sat, 5 Mar 2022 at 08:00, Luke Chen  wrote:

> Hi Sergio,
>
> Thanks for the KIP!
>
> One question:
> I saw there's a `max-message-size` argument that seems to do the same thing
> as you want.
> Could you help explain what's the difference between `max-message-size` and
> `max-batches-size`?
>
> Thank you.
> Luke
>
> On Sat, Mar 5, 2022 at 3:21 AM Kirk True  wrote:
>
> > Hi Sergio,
> >
> > Thanks for the KIP. I don't know anything about the log segment
> internals,
> > but the logic and implementation seem sound.
> >
> > Three questions:
> >  1. Since the --max-batches-size unit is bytes, does it matter if that
> > size doesn't align to a record boundary?
> >  2. Can you add a check to make sure that --max-batches-size doesn't
> allow
> > the user to pass in a negative number?
> >  3. Can you add/update any unit tests related to the DumpLogSegments
> > arguments?
> > Thanks,
> > Kirk
> >
> > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > >
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-04 Thread Luke Chen
Hi Sergio,

Thanks for the KIP!

One question:
I saw there's a `max-message-size` argument that seems to do the same thing
as you want.
Could you help explain what's the difference between `max-message-size` and
`max-batches-size`?

Thank you.
Luke

On Sat, Mar 5, 2022 at 3:21 AM Kirk True  wrote:

> Hi Sergio,
>
> Thanks for the KIP. I don't know anything about the log segment internals,
> but the logic and implementation seem sound.
>
> Three questions:
>  1. Since the --max-batches-size unit is bytes, does it matter if that
> size doesn't align to a record boundary?
>  2. Can you add a check to make sure that --max-batches-size doesn't allow
> the user to pass in a negative number?
>  3. Can you add/update any unit tests related to the DumpLogSegments
> arguments?
> Thanks,
> Kirk
>
> On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> >
>


Re: [DISCUSS] KIP-824 Allowing dumping segmentlogs limiting the batches in the output

2022-03-04 Thread Kirk True
Hi Sergio,

Thanks for the KIP. I don't know anything about the log segment internals, but 
the logic and implementation seem sound.

Three questions:
 1. Since the --max-batches-size unit is bytes, does it matter if that size 
doesn't align to a record boundary?
 2. Can you add a check to make sure that --max-batches-size doesn't allow the 
user to pass in a negative number?
 3. Can you add/update any unit tests related to the DumpLogSegments arguments?
Thanks,
Kirk

On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
>