Hi David,

I apologize. I missed your suggestion.
By the way I like it and I have applied your suggestion.

About the rejected alternatives I have updated the KIP as well

Best regards
Sergio Troiano

On Fri, 25 Mar 2022 at 06:50, David Jacot <david.ja...@gmail.com> wrote:

> Hi Sergio,
>
> I made a suggestion a few weeks ago about the name about the parameter but
> haven’t got a response for it. Did you consider it?
>
> Do we need to update the rejected alternatives section to mention the
> alternative options discussed in this thread?
>
> Thanks,
> David
>
> Le ven. 25 mars 2022 à 03:44, Luke Chen <show...@gmail.com> a écrit :
>
> > Hi Sergio,
> >
> > Thanks for asking.
> > Since it's been discussed for weeks, you can start the vote anytime.
> >
> > Thank you.
> > Luke
> >
> > On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
> > <sergio.troi...@adevinta.com.invalid> wrote:
> >
> > > Hey guys,
> > >
> > >
> > > What is the next step? Who decides when it is time for voting?
> > >
> > >
> > > Thanks!
> > >
> > > Sent from my iPhone
> > >
> > > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> > > sergio.troi...@adevinta.com> wrote:
> > > >
> > > > 
> > > > Hi Michael,
> > > >
> > > > Yes, it's a good idea and I considered it, the main problem is the
> > > FileRecords class does not accept number of batches as a parameter, it
> > > accepts bytes instead, so if we want to do so either we redesign a core
> > > class or we create a new one.
> > > > One of the pretty things (I consider) about this change which will
> > bring
> > > a huge benefit is the change in the code is pretty small and it's on
> the
> > > final script, it does not require any deep change in a core library.
> > > >
> > > > An alternative which requires a big change as well without touching
> the
> > > FileRecords class would be accept number of batches as parameter, then
> > call
> > > the FileRecords.slice() with a small value (bytes) count the batches,
> see
> > > if we can satisfy the number of batches, if not then we call it again
> and
> > > again until we reach the amount of batches. This will require a lot of
> > code
> > > as well
> > > >
> > > > So long story short, the proposal change is quite small, it uses the
> > > current classes and has a big benefit.
> > > >
> > > > Maybe in the future we could consider the FileRecords class to
> support
> > > getting the amount of batches as parameters and we encapsulate this
> logic
> > > in the proper class (FileRecords)
> > > > What do you think?
> > > >
> > > > Thanks
> > > > Sergio
> > > > Thanks
> > > > Sergio
> > > >
> > > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > > >> Hi Sergio,
> > > >>
> > > >> Thanks for the KIP. Instead of specifying the size in bytes, have
> you
> > > >> considered specifying it in terms of number of batches? I think
> it's a
> > > >> bit more user friendly than a size in raw bytes.
> > > >> For example: --num-batches: The number of batches to read from the
> log
> > > segment.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >>
> > > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> > > >> <sergio.troi...@adevinta.com.invalid> wrote:
> > > >> >
> > > >> > Hi Luke,
> > > >> >
> > > >> > Make sense, done!
> > > >> >
> > > >> > Thank you.
> > > >> > Sergio Troiano
> > > >> >
> > > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote:
> > > >> >
> > > >> > > Hi Sergio,
> > > >> > >
> > > >> > > > I don't want this to minimize the main feature I want to
> deploy
> > > as I
> > > >> > > think the
> > > >> > > message size limit is not as important as the limiting the
> amount
> > of
> > > >> > > batches.
> > > >> > >
> > > >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> > > >> > >
> > > >> > > One more comment to the KIP:
> > > >> > > 1. Could you put the new parameter description into the KIP
> > > proposed change
> > > >> > > section? That would make it much clear.
> > > >> > >
> > > >> > >
> > > >> > > Thank you.
> > > >> > > Luke
> > > >> > >
> > > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> > > >> > > <sergio.troi...@adevinta.com.invalid> wrote:
> > > >> > >
> > > >> > > > hey Luke,
> > > >> > > >
> > > >> > > > I am interested in expanding the KIP scope but I am a bit
> > > concerned this
> > > >> > > > could create a lot of noise and confusion as they look like
> very
> > > similar
> > > >> > > > parameters, I agree this is a small change, so I think if I do
> > it
> > > >> > > properly
> > > >> > > > it should not be a problem at all, I just will need a couple
> > more
> > > of days
> > > >> > > > as I want to create the proper tests as well.
> > > >> > > >
> > > >> > > > I have a doubt about editing the KIP, I mean should I add this
> > as
> > > a new
> > > >> > > > feature as well?, should I describe this as a side effect
> > > finding? I
> > > >> > > don't
> > > >> > > > want this to minimize the main feature I want to deploy as I
> > > think the
> > > >> > > > message size limit is not as important as the limiting the
> > amount
> > > of
> > > >> > > > batches.
> > > >> > > >
> > > >> > > > It is up to you, if you guys consider we must add this in this
> > > KIP then I
> > > >> > > > will be happy to do it. 😀
> > > >> > > >
> > > >> > > > Best regards.
> > > >> > > > Sergio Troiano
> > > >> > > >
> > > >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com>
> > wrote:
> > > >> > > >
> > > >> > > > > Hi Sergio,
> > > >> > > > >
> > > >> > > > > Thanks for your explanation.
> > > >> > > > > Make sense to me.
> > > >> > > > >
> > > >> > > > > > Only interesting thing that I have just found is
> > > *max-message-size
> > > >> > > *is
> > > >> > > > > not
> > > >> > > > > used while dump logs are requested, instead it is used by
> > > dumpIndex
> > > >> > > > >
> > > >> > > > > Are you interested in expanding the scope of this KIP to
> > > include the
> > > >> > > > > *max-message-size* in dumping logs?
> > > >> > > > > I think it's good because it will also be a small change,
> and
> > > no need
> > > >> > > to
> > > >> > > > go
> > > >> > > > > through another KIP discussing/voting process. But I'm fine
> if
> > > you want
> > > >> > > > to
> > > >> > > > > keep this KIP as is, and create another JIRA ticket for
> future
> > > work.
> > > >> > > > >
> > > >> > > > > Thank you.
> > > >> > > > > Luke
> > > >> > > > >
> > > >> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> > > >> > > > > <sergio.troi...@adevinta.com.invalid> wrote:
> > > >> > > > >
> > > >> > > > > > hey Luke,
> > > >> > > > > >
> > > >> > > > > > Let me answer them:
> > > >> > > > > > 1. If the *max-batches-size* is too small that results in
> no
> > > records
> > > >> > > > > > output, will we output any information to the user?
> > > >> > > > > >
> > > >> > > > > > If the  *max-batches-size*is even smaller than the first
> > > batch then
> > > >> > > > there
> > > >> > > > > > won't be any output, this is handled by FileRecords
> class, I
> > > think
> > > >> > > this
> > > >> > > > > is
> > > >> > > > > > correct as this is the expected behaviour.
> > > >> > > > > >
> > > >> > > > > > 2. After your explanation, I guess the use of
> > > *max-batches-size*
> > > >> > > won't
> > > >> > > > > > conflict with *max-message-size*, right?
> > > >> > > > > >
> > > >> > > > > > Only interesting thing that I have just found is
> > > *max-message-size
> > > >> > > *is
> > > >> > > > > not
> > > >> > > > > > used while dump logs are requested, instead it is used by
> > > dumpIndex
> > > >> > > > > > <
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > > >> > > > > > >
> > > >> > > > > > so,
> > > >> > > > > > this feature is not working for dumping logs, even though
> I
> > > checked
> > > >> > > if
> > > >> > > > > > there is a unit test for this and there is not any. Maybe
> we
> > > could
> > > >> > > > > create a
> > > >> > > > > > ticket for this?
> > > >> > > > > >
> > > >> > > > > > Regards.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com
> >
> > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Sergio,
> > > >> > > > > > >
> > > >> > > > > > > Thanks for the explanation! Very clear!
> > > >> > > > > > > I think we should put this example and explanation into
> > KIP.
> > > >> > > > > > >
> > > >> > > > > > > Other comments:
> > > >> > > > > > > 1. If the *max-batches-size* is too small that results
> in
> > no
> > > >> > > records
> > > >> > > > > > > output, will we output any information to the user?
> > > >> > > > > > > 2. After your explanation, I guess the use of
> > > *max-batches-size*
> > > >> > > > won't
> > > >> > > > > > > conflict with *max-message-size*, right?
> > > >> > > > > > > That is, user can set the 2 arguments at the same time.
> Is
> > > that
> > > >> > > > > correct?
> > > >> > > > > > >
> > > >> > > > > > > Thank you.
> > > >> > > > > > > Luke
> > > >> > > > > > >
> > > >> > > > > > > Thank you.
> > > >> > > > > > > Luke
> > > >> > > > > > >
> > > >> > > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > >> > > > > > > <sergio.troi...@adevinta.com.invalid> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > hey Luke,
> > > >> > > > > > > >
> > > >> > > > > > > > thanks for the interest, it is a good question, please
> > > let me
> > > >> > > > explain
> > > >> > > > > > > you:
> > > >> > > > > > > >
> > > >> > > > > > > > *max-message-size *a filter for the size of each
> batch,
> > > so for
> > > >> > > > > example
> > > >> > > > > > if
> > > >> > > > > > > > Iset --max-message-size 1000 bytes and my segment log
> > has
> > > 300
> > > >> > > > > batches,
> > > >> > > > > > > 150
> > > >> > > > > > > > of them has a size of 500 bytes  and the other 150
> has a
> > > size of
> > > >> > > > 2000
> > > >> > > > > > > bytes
> > > >> > > > > > > > then the script will skip the las 150 ones as each
> batch
> > > is
> > > >> > > heavier
> > > >> > > > > > than
> > > >> > > > > > > > the limit.
> > > >> > > > > > > >
> > > >> > > > > > > > In the other hand following the same example above
> with
> > > >> > > > > > *max-batches-size
> > > >> > > > > > > > *set
> > > >> > > > > > > > to 1000 bytes it will only print out the first 2
> batches
> > > (500
> > > >> > > bytes
> > > >> > > > > > each)
> > > >> > > > > > > > and stop, This will avoid reading the whole file
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > Also if all of them are smaller than 1000 bytes it
> will
> > > end up
> > > >> > > > > printing
> > > >> > > > > > > out
> > > >> > > > > > > > all the batches.
> > > >> > > > > > > > The idea of my change is to limit the *amount* of
> > batches
> > > no
> > > >> > > matter
> > > >> > > > > > their
> > > >> > > > > > > > size.
> > > >> > > > > > > >
> > > >> > > > > > > > I hope this reply helps.
> > > >> > > > > > > > Best regards.
> > > >> > > > > > > >
> > > >> > > > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <
> > show...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi Sergio,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks for the KIP!
> > > >> > > > > > > > >
> > > >> > > > > > > > > One question:
> > > >> > > > > > > > > I saw there's a `max-message-size` argument that
> seems
> > > to do
> > > >> > > the
> > > >> > > > > same
> > > >> > > > > > > > thing
> > > >> > > > > > > > > as you want.
> > > >> > > > > > > > > Could you help explain what's the difference between
> > > >> > > > > > `max-message-size`
> > > >> > > > > > > > and
> > > >> > > > > > > > > `max-batches-size`?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thank you.
> > > >> > > > > > > > > Luke
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <
> > > >> > > k...@mustardgrain.com>
> > > >> > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Hi Sergio,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks for the KIP. I don't know anything about
> the
> > > log
> > > >> > > segment
> > > >> > > > > > > > > internals,
> > > >> > > > > > > > > > but the logic and implementation seem sound.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Three questions:
> > > >> > > > > > > > > >  1. Since the --max-batches-size unit is bytes,
> does
> > > it
> > > >> > > matter
> > > >> > > > if
> > > >> > > > > > > that
> > > >> > > > > > > > > > size doesn't align to a record boundary?
> > > >> > > > > > > > > >  2. Can you add a check to make sure that
> > > --max-batches-size
> > > >> > > > > > doesn't
> > > >> > > > > > > > > allow
> > > >> > > > > > > > > > the user to pass in a negative number?
> > > >> > > > > > > > > >  3. Can you add/update any unit tests related to
> the
> > > >> > > > > > DumpLogSegments
> > > >> > > > > > > > > > arguments?
> > > >> > > > > > > > > > Thanks,
> > > >> > > > > > > > > > Kirk
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel
> > > Troiano wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > >
> >
>

Reply via email to