Hi,

So, is everyone OK using the approach with 2 properties?

E.g.:

Scenario 1:
    compaction.strategy: offset

    :- Behaviour is the same as what currently exists, where the compaction is 
done only via the 'offset'


Scenario 2:
    compaction.strategy: timestamp

    :- Similar to 'offset', but the record timestamp is used instead


Scenario 3:
    compaction.strategy: header

    compaction.strategy.header: xyz

    :- Searches the headers for 'xyz' key when performing the compaction. 
Defaults to 'offset' strategy if this header does not exist (special note on 
the '.header' suffix, as this would allow additional strategies to add whatever 
extra configuration they need).

Scenario 4 (hypothetical future):
    compaction.strategy: foo

    compaction.strategy.foo.name: bar
    compaction.strategy.foo.order: DESC
    compaction.strategy.foo.fallback: timestamp


    :- This one is just to show what I meant with the '.header' suffix 
mentioned in {Scenario 3}



Regards,
Luís


    On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang 
<wangg...@gmail.com> wrote:  
 
 Hi Matthias,

Yes, we are effectively assigning the the whole space of Strings minus
current preserved ones as header keys; honestly I think in practice users
wanting to use `_something_` would be very rare, but I admit it may still
be possible in theory.

I think Luis' point about "header=<header-value>" is that having a expression
evaluation as the config value is a bit weird, and thinking about it twice
it is still not flawless: we can still argue that we are effectively
assigning the whole sub-space of "header=*" of Strings for headers, and
what if users want to use preserved value falling into that sub-space
(again, should not really happen in practice, just being paranoid here).

It seems that two configs are the common choice that everyone is happy with.

Guozhang


On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Luis,
>
> I meant to update the "Rejected Alternative" sections, what you have
> done already. Thx.
>
> Originally, I also had the idea about a second config, but thought it
> might be easier to just change the allowed values to be `offset`,
> `timestamp`, `header=<key>`. (We try to keep the number of configs small
> if possible, as more configs are more confusing to users.)
>
> I don't think that using `_offset_`, `_timestamp_` and `<key>` solves
> the problem because users still might use `_something_` as header key --
> and if we want to introduce a new compaction strategy "something" later
> we face the same issues as without the underscores. We only reduce the
> likelihood that it happens.
>
> Using `header=` as prefix or introducing a second config, that is only
> effective if the strategy is set to `header` seems to be a cleaner
> solution.
>
> @Luis: why do you think that using `header=<key>` is an "incorrect
> approach"?
>
> > Though I would still prefer to keep it as it is, as its a much simple>
> and cleaner approach – I’m not so sure that a potential client would
> > really be so inconvenienced for having to use “_offset” or
> > “_timestamp_” as a header
>
> I don't think that it's about the issue that people cannot use
> `_offset_` or `_timestamp_` in their header (by "use" I mean for
> compaction). With the current KIP, they cannot use `offset` or
> `timestamp` either. The issue is, that we cannot introduce a new system
> supported compaction strategy in the future without potentially breaking
> something, as we basically assign the whole space of Strings (minus
> `offset`, `timestamp`) as valid configs to enable header based compaction.
>
> Personally, I prefer either adding a config or going with
> `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be
> good enough (even if this is the solution I like least)---for this case,
> we should state explicitly, that the whole space of `_*_` is reserved
> and users are not allowed to set those for header compaction. In fact, I
> would also add a check for the config that only allows for `_offset_`
> and `_timestamp_` and throws an exception for all other `_*_` configs.
>
>
> -Matthias
>
>
> On 6/18/18 2:03 PM, Luís Cabral wrote:
> > I’m ok with that...
> >
> > Ted / Matthias?
> >
> >
> > From: Guozhang Wang
> > Sent: 18 June 2018 22:49
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >
> > How about make the preserved values to be "_offset_" and "_timestamp_"
> > then? Currently in the KIP they are reserved as "offset" and "timestamp".
> >
> >
> > Guozhang
> >
> > On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
> <luis_cab...@yahoo.com.invalid>
> > wrote:
> >
> >> Hi Guozhang,
> >>
> >> Yes, that is what I meant (separate configs).
> >> Though I would still prefer to keep it as it is, as its a much simpler
> and
> >> cleaner approach – I’m not so sure that a potential client would really
> be
> >> so inconvenienced for having to use “_offset” or “_timestamp_” as a
> header
> >>
> >> Cheers,
> >> Luís
> >>
> >>
> >> From: Guozhang Wang
> >> Sent: 18 June 2018 19:35
> >> To: dev@kafka.apache.org
> >> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> >>
> >> Hello Luís,
> >>
> >> I agree that having an expression evaluation as a config value is not
> the
> >> best approach; if there are better ideas to allow users to specify the
> >> header key which happen to be the same as the preserved config values
> >> "offset" and "timestamp" (although the likelihood may be small, as Ted
> >> mentioned there may be more preserved config values added in the
> future),
> >> then I'd be happily follow the suggestions. For example, we could have
> the
> >> config value for header keys as "header-<key-name>"? Is that what you've
> >> suggested? Or do you suggest using two configs instead, and the second
> >> config specifying the key name, and will only be considered if the first
> >> (i.e. current proposed) config's value is `header`, otherwise be
> ignored?
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
> >> <luis_cab...@yahoo.com.invalid
> >>> wrote:
> >>
> >>>  Hi Ted / Guozhang / Matthias,
> >>>
> >>> @Ted: I've now added your argument to the "Rejected Alternatives"
> portion
> >>> of the KIP. Please keep in mind that I would like to keep this as
> >> backwards
> >>> compatible as possible, so a lot of decisions are inferred from that
> >> intent.
> >>>
> >>> @Guozhang: IMHO, adding expression evaluation to configuration is an
> >>> incorrect approach. If you absolutely insist on having this clear
> >>> distinction between header/key, then I would suggest instead to have a
> >>> dedicated property for the "key" part. Of course, this is your project
> so
> >>> I'll just continue whatever approach moves this KIP forward...
> >>>
> >>> @Matthias: Sorry, but update the KIP according to what?
> >>>
> >>> Cheers,
> >>> Luís
> >>>
> >>>    On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
> >>> matth...@confluent.io> wrote:
> >>>
> >>>  Well, for "offset" and "timestamp" policy, not communication between
> >>> both is required.
> >>>
> >>> Only if headers are used, user A should communicate the corresponding
> >>> header key to user B.
> >>>
> >>>
> >>> @Luis: can you update the KIP accordingly?
> >>>
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 6/17/18 5:36 PM, Ted Yu wrote:
> >>>> My previous reply was just an alternative for consideration.
> >>>>
> >>>> bq.  than a second user B can add a header with key "offset" and thus
> >>> break
> >>>> the intention of user A
> >>>>
> >>>> I didn't see such scenario after reading the KIP. Maybe add this as
> >>>> reasoning for the current approach ?
> >>>>
> >>>> I wonder how user B gets to know the intention of user A. Meaning, if
> >>> user
> >>>> B doesn't follow the norm set by user A, there still would be issue,
> >>> right ?
> >>>>
> >>>>
> >>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
> >> matth...@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> Let me rephrase your answer to make sure I understand what you
> >> suggest:
> >>>>>
> >>>>> If compaction strategy is configured to use "offset", and if there is
> >> a
> >>>>> header in the record with `key == offset`, than we should use the
> >> value
> >>>>> of the record header instead of the actual record offset?
> >>>>>
> >>>>> Do I understand this correctly? If yes, what is the advantage of
> doing
> >>>>> this? From my point of view, it might be problematic, because if user
> >> A
> >>>>> creates a topic and configures "offset" compaction (with the intend
> >> that
> >>>>> the record offset should be uses), than a second user B can add a
> >> header
> >>>>> with key "offset" and thus break the intention of user A.
> >>>>>
> >>>>> Also, if existing topics might have data with record header key
> >>>>> "offset", the change would not be backward compatible either.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 6/16/18 6:59 PM, Ted Yu wrote:
> >>>>>> Pardon the brevity in my previous reply.
> >>>>>> I was talking about this bullet:
> >>>>>>
> >>>>>> bq. When this configuration is set to anything other than "*offset*"
> >>> or "
> >>>>>> *timestamp*", then the record headers are scanned for a key matching
> >>> this
> >>>>>> value.
> >>>>>>
> >>>>>> My point is that if matching key in the header is found, its value
> >>> should
> >>>>>> take precedence over the value of the configuration.
> >>>>>> I understand that such interpretation may have slight performance
> >> cost.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
> >>> matth...@confluent.io>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Ted,
> >>>>>>>
> >>>>>>> I am also not sure what you mean by "Shouldn't the selection in
> >> header
> >>>>>>> have higher precedence over the configuration"? What selection do
> >> you
> >>>>>>> mean? And want configuration?
> >>>>>>>
> >>>>>>>
> >>>>>>> About the first point, I think this is actually a valid concern: To
> >>>>>>> address this issue, it seems that we would need to change the
> >> accepted
> >>>>>>> format of the config. Instead of "offset", "timestamp",
> >>> "<header-key>",
> >>>>>>> we could replace the last one with "header=<header-key>".
> >>>>>>>
> >>>>>>> WDYT?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
> >>>>>>>> If selection exists in header, the selection should override the
> >>> config
> >>>>>>> value.
> >>>>>>>> Cheers
> >>>>>>>> -------- Original message --------From: Luis Cabral
> >>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18  1:40 AM
> (GMT-08:00)
> >>> To:
> >>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
> >>>>> compaction
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> bq. Can the value be determined now ? My thinking is that what if
> >>> there
> >>>>>>> is a third compaction strategy proposed in the future ? We should
> >>> guard
> >>>>>>> against user unknowingly choosing the 'future' strategy.
> >>>>>>>>
> >>>>>>>> The idea is that the header name to use is flexible, which
> protects
> >>>>>>> current clients that may want to use this from having to adapt
> their
> >>>>>>> already existing header names (they can just specify a new name).
> >>>>>>>>
> >>>>>>>> bq. Shouldn't the selection in header have higher precedence over
> >> the
> >>>>>>> configuration ?
> >>>>>>>>
> >>>>>>>> Not sure what you mean here, could you clarify?
> >>>>>>>>
> >>>>>>>> bq. Please create JIRA if you haven't already.
> >>>>>>>>
> >>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Luís
> >>>>>>>>
> >>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> bq. When this configuration is set to anything other than
> >> "*offset*"
> >>>>> or
> >>>>>>> "
> >>>>>>>>> *timestamp*", then the record headers are scanned for a key
> >> matching
> >>>>>>> this
> >>>>>>>>> value.
> >>>>>>>>>
> >>>>>>>>> Can the value be determined now ? My thinking is that what if
> >> there
> >>>>> is a
> >>>>>>>>> third compaction strategy proposed in the future ? We should
> guard
> >>>>>>> against
> >>>>>>>>> user unknowingly choosing the 'future' strategy.
> >>>>>>>>>
> >>>>>>>>> bq. If this header is found
> >>>>>>>>>
> >>>>>>>>> Shouldn't the selection in header have higher precedence over the
> >>>>>>> configuration
> >>>>>>>>> ?
> >>>>>>>>>
> >>>>>>>>> Please create JIRA if you haven't already.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
> >>>>>>> <luis_cab...@yahoo.com.invalid>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> Any takers on having a look at this KIP and voting on it?
> >>>>>>>>>>
> >>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>> 280%3A+Enhanced+log+compaction
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Luis
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang  

Reply via email to