+1 from me as well.

On Mon, Jun 25, 2018 at 8:16 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> +1 from my side for using `compaction.strategy` with values "offset",
> "timestamp" and "header" and `compaction.strategy.header`
>
> -Matthias
>
> On 6/25/18 1:25 AM, Luís Cabral wrote:
> >  Hi,
> >
> > So, is everyone OK using the approach with 2 properties?
> >
> > E.g.:
> >
> > Scenario 1:
> >     compaction.strategy: offset
> >
> >     :- Behaviour is the same as what currently exists, where the
> compaction is done only via the 'offset'
> >
> >
> > Scenario 2:
> >     compaction.strategy: timestamp
> >
> >     :- Similar to 'offset', but the record timestamp is used instead
> >
> >
> > Scenario 3:
> >     compaction.strategy: header
> >
> >     compaction.strategy.header: xyz
> >
> >     :- Searches the headers for 'xyz' key when performing the
> compaction. Defaults to 'offset' strategy if this header does not exist
> (special note on the '.header' suffix, as this would allow additional
> strategies to add whatever extra configuration they need).
> >
> > Scenario 4 (hypothetical future):
> >     compaction.strategy: foo
> >
> >     compaction.strategy.foo.name: bar
> >     compaction.strategy.foo.order: DESC
> >     compaction.strategy.foo.fallback: timestamp
> >
> >
> >     :- This one is just to show what I meant with the '.header' suffix
> mentioned in {Scenario 3}
> >
> >
> >
> > Regards,
> > Luís
> >
> >
> >     On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang <
> wangg...@gmail.com> wrote:
> >
> >  Hi Matthias,
> >
> > Yes, we are effectively assigning the the whole space of Strings minus
> > current preserved ones as header keys; honestly I think in practice users
> > wanting to use `_something_` would be very rare, but I admit it may still
> > be possible in theory.
> >
> > I think Luis' point about "header=<header-value>" is that having a
> expression
> > evaluation as the config value is a bit weird, and thinking about it
> twice
> > it is still not flawless: we can still argue that we are effectively
> > assigning the whole sub-space of "header=*" of Strings for headers, and
> > what if users want to use preserved value falling into that sub-space
> > (again, should not really happen in practice, just being paranoid here).
> >
> > It seems that two configs are the common choice that everyone is happy
> with.
> >
> > Guozhang
> >
> >
> > On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >> Luis,
> >>
> >> I meant to update the "Rejected Alternative" sections, what you have
> >> done already. Thx.
> >>
> >> Originally, I also had the idea about a second config, but thought it
> >> might be easier to just change the allowed values to be `offset`,
> >> `timestamp`, `header=<key>`. (We try to keep the number of configs small
> >> if possible, as more configs are more confusing to users.)
> >>
> >> I don't think that using `_offset_`, `_timestamp_` and `<key>` solves
> >> the problem because users still might use `_something_` as header key --
> >> and if we want to introduce a new compaction strategy "something" later
> >> we face the same issues as without the underscores. We only reduce the
> >> likelihood that it happens.
> >>
> >> Using `header=` as prefix or introducing a second config, that is only
> >> effective if the strategy is set to `header` seems to be a cleaner
> >> solution.
> >>
> >> @Luis: why do you think that using `header=<key>` is an "incorrect
> >> approach"?
> >>
> >>> Though I would still prefer to keep it as it is, as its a much simple>
> >> and cleaner approach – I’m not so sure that a potential client would
> >>> really be so inconvenienced for having to use “_offset” or
> >>> “_timestamp_” as a header
> >>
> >> I don't think that it's about the issue that people cannot use
> >> `_offset_` or `_timestamp_` in their header (by "use" I mean for
> >> compaction). With the current KIP, they cannot use `offset` or
> >> `timestamp` either. The issue is, that we cannot introduce a new system
> >> supported compaction strategy in the future without potentially breaking
> >> something, as we basically assign the whole space of Strings (minus
> >> `offset`, `timestamp`) as valid configs to enable header based
> compaction.
> >>
> >> Personally, I prefer either adding a config or going with
> >> `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be
> >> good enough (even if this is the solution I like least)---for this case,
> >> we should state explicitly, that the whole space of `_*_` is reserved
> >> and users are not allowed to set those for header compaction. In fact, I
> >> would also add a check for the config that only allows for `_offset_`
> >> and `_timestamp_` and throws an exception for all other `_*_` configs.
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 6/18/18 2:03 PM, Luís Cabral wrote:
> >>> I’m ok with that...
> >>>
> >>> Ted / Matthias?
> >>>
> >>>
> >>> From: Guozhang Wang
> >>> Sent: 18 June 2018 22:49
> >>> To: dev@kafka.apache.org
> >>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>>
> >>> How about make the preserved values to be "_offset_" and "_timestamp_"
> >>> then? Currently in the KIP they are reserved as "offset" and
> "timestamp".
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
> >> <luis_cab...@yahoo.com.invalid>
> >>> wrote:
> >>>
> >>>> Hi Guozhang,
> >>>>
> >>>> Yes, that is what I meant (separate configs).
> >>>> Though I would still prefer to keep it as it is, as its a much simpler
> >> and
> >>>> cleaner approach – I’m not so sure that a potential client would
> really
> >> be
> >>>> so inconvenienced for having to use “_offset” or “_timestamp_” as a
> >> header
> >>>>
> >>>> Cheers,
> >>>> Luís
> >>>>
> >>>>
> >>>> From: Guozhang Wang
> >>>> Sent: 18 June 2018 19:35
> >>>> To: dev@kafka.apache.org
> >>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>>>
> >>>> Hello Luís,
> >>>>
> >>>> I agree that having an expression evaluation as a config value is not
> >> the
> >>>> best approach; if there are better ideas to allow users to specify the
> >>>> header key which happen to be the same as the preserved config values
> >>>> "offset" and "timestamp" (although the likelihood may be small, as Ted
> >>>> mentioned there may be more preserved config values added in the
> >> future),
> >>>> then I'd be happily follow the suggestions. For example, we could have
> >> the
> >>>> config value for header keys as "header-<key-name>"? Is that what
> you've
> >>>> suggested? Or do you suggest using two configs instead, and the second
> >>>> config specifying the key name, and will only be considered if the
> first
> >>>> (i.e. current proposed) config's value is `header`, otherwise be
> >> ignored?
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>>
> >>>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
> >>>> <luis_cab...@yahoo.com.invalid
> >>>>> wrote:
> >>>>
> >>>>>   Hi Ted / Guozhang / Matthias,
> >>>>>
> >>>>> @Ted: I've now added your argument to the "Rejected Alternatives"
> >> portion
> >>>>> of the KIP. Please keep in mind that I would like to keep this as
> >>>> backwards
> >>>>> compatible as possible, so a lot of decisions are inferred from that
> >>>> intent.
> >>>>>
> >>>>> @Guozhang: IMHO, adding expression evaluation to configuration is an
> >>>>> incorrect approach. If you absolutely insist on having this clear
> >>>>> distinction between header/key, then I would suggest instead to have
> a
> >>>>> dedicated property for the "key" part. Of course, this is your
> project
> >> so
> >>>>> I'll just continue whatever approach moves this KIP forward...
> >>>>>
> >>>>> @Matthias: Sorry, but update the KIP according to what?
> >>>>>
> >>>>> Cheers,
> >>>>> Luís
> >>>>>
> >>>>>     On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> >>>>> matth...@confluent.io> wrote:
> >>>>>
> >>>>>   Well, for "offset" and "timestamp" policy, not communication
> between
> >>>>> both is required.
> >>>>>
> >>>>> Only if headers are used, user A should communicate the corresponding
> >>>>> header key to user B.
> >>>>>
> >>>>>
> >>>>> @Luis: can you update the KIP accordingly?
> >>>>>
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 6/17/18 5:36 PM, Ted Yu wrote:
> >>>>>> My previous reply was just an alternative for consideration.
> >>>>>>
> >>>>>> bq.  than a second user B can add a header with key "offset" and
> thus
> >>>>> break
> >>>>>> the intention of user A
> >>>>>>
> >>>>>> I didn't see such scenario after reading the KIP. Maybe add this as
> >>>>>> reasoning for the current approach ?
> >>>>>>
> >>>>>> I wonder how user B gets to know the intention of user A. Meaning,
> if
> >>>>> user
> >>>>>> B doesn't follow the norm set by user A, there still would be issue,
> >>>>> right ?
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
> >>>> matth...@confluent.io>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Let me rephrase your answer to make sure I understand what you
> >>>> suggest:
> >>>>>>>
> >>>>>>> If compaction strategy is configured to use "offset", and if there
> is
> >>>> a
> >>>>>>> header in the record with `key == offset`, than we should use the
> >>>> value
> >>>>>>> of the record header instead of the actual record offset?
> >>>>>>>
> >>>>>>> Do I understand this correctly? If yes, what is the advantage of
> >> doing
> >>>>>>> this? From my point of view, it might be problematic, because if
> user
> >>>> A
> >>>>>>> creates a topic and configures "offset" compaction (with the intend
> >>>> that
> >>>>>>> the record offset should be uses), than a second user B can add a
> >>>> header
> >>>>>>> with key "offset" and thus break the intention of user A.
> >>>>>>>
> >>>>>>> Also, if existing topics might have data with record header key
> >>>>>>> "offset", the change would not be backward compatible either.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 6/16/18 6:59 PM, Ted Yu wrote:
> >>>>>>>> Pardon the brevity in my previous reply.
> >>>>>>>> I was talking about this bullet:
> >>>>>>>>
> >>>>>>>> bq. When this configuration is set to anything other than
> "*offset*"
> >>>>> or "
> >>>>>>>> *timestamp*", then the record headers are scanned for a key
> matching
> >>>>> this
> >>>>>>>> value.
> >>>>>>>>
> >>>>>>>> My point is that if matching key in the header is found, its value
> >>>>> should
> >>>>>>>> take precedence over the value of the configuration.
> >>>>>>>> I understand that such interpretation may have slight performance
> >>>> cost.
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>>
> >>>>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
> >>>>> matth...@confluent.io>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Ted,
> >>>>>>>>>
> >>>>>>>>> I am also not sure what you mean by "Shouldn't the selection in
> >>>> header
> >>>>>>>>> have higher precedence over the configuration"? What selection do
> >>>> you
> >>>>>>>>> mean? And want configuration?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> About the first point, I think this is actually a valid concern:
> To
> >>>>>>>>> address this issue, it seems that we would need to change the
> >>>> accepted
> >>>>>>>>> format of the config. Instead of "offset", "timestamp",
> >>>>> "<header-key>",
> >>>>>>>>> we could replace the last one with "header=<header-key>".
> >>>>>>>>>
> >>>>>>>>> WDYT?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
> >>>>>>>>>> If selection exists in header, the selection should override the
> >>>>> config
> >>>>>>>>> value.
> >>>>>>>>>> Cheers
> >>>>>>>>>> -------- Original message --------From: Luis Cabral
> >>>>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18  1:40 AM
> >> (GMT-08:00)
> >>>>> To:
> >>>>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
> >>>>>>> compaction
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> bq. Can the value be determined now ? My thinking is that what
> if
> >>>>> there
> >>>>>>>>> is a third compaction strategy proposed in the future ? We should
> >>>>> guard
> >>>>>>>>> against user unknowingly choosing the 'future' strategy.
> >>>>>>>>>>
> >>>>>>>>>> The idea is that the header name to use is flexible, which
> >> protects
> >>>>>>>>> current clients that may want to use this from having to adapt
> >> their
> >>>>>>>>> already existing header names (they can just specify a new name).
> >>>>>>>>>>
> >>>>>>>>>> bq. Shouldn't the selection in header have higher precedence
> over
> >>>> the
> >>>>>>>>> configuration ?
> >>>>>>>>>>
> >>>>>>>>>> Not sure what you mean here, could you clarify?
> >>>>>>>>>>
> >>>>>>>>>> bq. Please create JIRA if you haven't already.
> >>>>>>>>>>
> >>>>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Luís
> >>>>>>>>>>
> >>>>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> bq. When this configuration is set to anything other than
> >>>> "*offset*"
> >>>>>>> or
> >>>>>>>>> "
> >>>>>>>>>>> *timestamp*", then the record headers are scanned for a key
> >>>> matching
> >>>>>>>>> this
> >>>>>>>>>>> value.
> >>>>>>>>>>>
> >>>>>>>>>>> Can the value be determined now ? My thinking is that what if
> >>>> there
> >>>>>>> is a
> >>>>>>>>>>> third compaction strategy proposed in the future ? We should
> >> guard
> >>>>>>>>> against
> >>>>>>>>>>> user unknowingly choosing the 'future' strategy.
> >>>>>>>>>>>
> >>>>>>>>>>> bq. If this header is found
> >>>>>>>>>>>
> >>>>>>>>>>> Shouldn't the selection in header have higher precedence over
> the
> >>>>>>>>> configuration
> >>>>>>>>>>> ?
> >>>>>>>>>>>
> >>>>>>>>>>> Please create JIRA if you haven't already.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
> >>>>>>>>> <luis_cab...@yahoo.com.invalid>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any takers on having a look at this KIP and voting on it?
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>> 280%3A+Enhanced+log+compaction
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Luis
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> -- Guozhang
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang

Reply via email to