+1 from my side for using `compaction.strategy` with values "offset", "timestamp" and "header" and `compaction.strategy.header`
-Matthias On 6/25/18 1:25 AM, Luís Cabral wrote: > Hi, > > So, is everyone OK using the approach with 2 properties? > > E.g.: > > Scenario 1: > compaction.strategy: offset > > :- Behaviour is the same as what currently exists, where the compaction > is done only via the 'offset' > > > Scenario 2: > compaction.strategy: timestamp > > :- Similar to 'offset', but the record timestamp is used instead > > > Scenario 3: > compaction.strategy: header > > compaction.strategy.header: xyz > > :- Searches the headers for 'xyz' key when performing the compaction. > Defaults to 'offset' strategy if this header does not exist (special note on > the '.header' suffix, as this would allow additional strategies to add > whatever extra configuration they need). > > Scenario 4 (hypothetical future): > compaction.strategy: foo > > compaction.strategy.foo.name: bar > compaction.strategy.foo.order: DESC > compaction.strategy.foo.fallback: timestamp > > > :- This one is just to show what I meant with the '.header' suffix > mentioned in {Scenario 3} > > > > Regards, > Luís > > > On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang > <wangg...@gmail.com> wrote: > > Hi Matthias, > > Yes, we are effectively assigning the the whole space of Strings minus > current preserved ones as header keys; honestly I think in practice users > wanting to use `_something_` would be very rare, but I admit it may still > be possible in theory. > > I think Luis' point about "header=<header-value>" is that having a expression > evaluation as the config value is a bit weird, and thinking about it twice > it is still not flawless: we can still argue that we are effectively > assigning the whole sub-space of "header=*" of Strings for headers, and > what if users want to use preserved value falling into that sub-space > (again, should not really happen in practice, just being paranoid here). > > It seems that two configs are the common choice that everyone is happy with. > > Guozhang > > > On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax <matth...@confluent.io> > wrote: > >> Luis, >> >> I meant to update the "Rejected Alternative" sections, what you have >> done already. Thx. >> >> Originally, I also had the idea about a second config, but thought it >> might be easier to just change the allowed values to be `offset`, >> `timestamp`, `header=<key>`. (We try to keep the number of configs small >> if possible, as more configs are more confusing to users.) >> >> I don't think that using `_offset_`, `_timestamp_` and `<key>` solves >> the problem because users still might use `_something_` as header key -- >> and if we want to introduce a new compaction strategy "something" later >> we face the same issues as without the underscores. We only reduce the >> likelihood that it happens. >> >> Using `header=` as prefix or introducing a second config, that is only >> effective if the strategy is set to `header` seems to be a cleaner >> solution. >> >> @Luis: why do you think that using `header=<key>` is an "incorrect >> approach"? >> >>> Though I would still prefer to keep it as it is, as its a much simple> >> and cleaner approach – I’m not so sure that a potential client would >>> really be so inconvenienced for having to use “_offset” or >>> “_timestamp_” as a header >> >> I don't think that it's about the issue that people cannot use >> `_offset_` or `_timestamp_` in their header (by "use" I mean for >> compaction). With the current KIP, they cannot use `offset` or >> `timestamp` either. The issue is, that we cannot introduce a new system >> supported compaction strategy in the future without potentially breaking >> something, as we basically assign the whole space of Strings (minus >> `offset`, `timestamp`) as valid configs to enable header based compaction. >> >> Personally, I prefer either adding a config or going with >> `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be >> good enough (even if this is the solution I like least)---for this case, >> we should state explicitly, that the whole space of `_*_` is reserved >> and users are not allowed to set those for header compaction. In fact, I >> would also add a check for the config that only allows for `_offset_` >> and `_timestamp_` and throws an exception for all other `_*_` configs. >> >> >> -Matthias >> >> >> On 6/18/18 2:03 PM, Luís Cabral wrote: >>> I’m ok with that... >>> >>> Ted / Matthias? >>> >>> >>> From: Guozhang Wang >>> Sent: 18 June 2018 22:49 >>> To: dev@kafka.apache.org >>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction >>> >>> How about make the preserved values to be "_offset_" and "_timestamp_" >>> then? Currently in the KIP they are reserved as "offset" and "timestamp". >>> >>> >>> Guozhang >>> >>> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral >> <luis_cab...@yahoo.com.invalid> >>> wrote: >>> >>>> Hi Guozhang, >>>> >>>> Yes, that is what I meant (separate configs). >>>> Though I would still prefer to keep it as it is, as its a much simpler >> and >>>> cleaner approach – I’m not so sure that a potential client would really >> be >>>> so inconvenienced for having to use “_offset” or “_timestamp_” as a >> header >>>> >>>> Cheers, >>>> Luís >>>> >>>> >>>> From: Guozhang Wang >>>> Sent: 18 June 2018 19:35 >>>> To: dev@kafka.apache.org >>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction >>>> >>>> Hello Luís, >>>> >>>> I agree that having an expression evaluation as a config value is not >> the >>>> best approach; if there are better ideas to allow users to specify the >>>> header key which happen to be the same as the preserved config values >>>> "offset" and "timestamp" (although the likelihood may be small, as Ted >>>> mentioned there may be more preserved config values added in the >> future), >>>> then I'd be happily follow the suggestions. For example, we could have >> the >>>> config value for header keys as "header-<key-name>"? Is that what you've >>>> suggested? Or do you suggest using two configs instead, and the second >>>> config specifying the key name, and will only be considered if the first >>>> (i.e. current proposed) config's value is `header`, otherwise be >> ignored? >>>> >>>> >>>> Guozhang >>>> >>>> >>>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral >>>> <luis_cab...@yahoo.com.invalid >>>>> wrote: >>>> >>>>> Hi Ted / Guozhang / Matthias, >>>>> >>>>> @Ted: I've now added your argument to the "Rejected Alternatives" >> portion >>>>> of the KIP. Please keep in mind that I would like to keep this as >>>> backwards >>>>> compatible as possible, so a lot of decisions are inferred from that >>>> intent. >>>>> >>>>> @Guozhang: IMHO, adding expression evaluation to configuration is an >>>>> incorrect approach. If you absolutely insist on having this clear >>>>> distinction between header/key, then I would suggest instead to have a >>>>> dedicated property for the "key" part. Of course, this is your project >> so >>>>> I'll just continue whatever approach moves this KIP forward... >>>>> >>>>> @Matthias: Sorry, but update the KIP according to what? >>>>> >>>>> Cheers, >>>>> Luís >>>>> >>>>> On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax < >>>>> matth...@confluent.io> wrote: >>>>> >>>>> Well, for "offset" and "timestamp" policy, not communication between >>>>> both is required. >>>>> >>>>> Only if headers are used, user A should communicate the corresponding >>>>> header key to user B. >>>>> >>>>> >>>>> @Luis: can you update the KIP accordingly? >>>>> >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> On 6/17/18 5:36 PM, Ted Yu wrote: >>>>>> My previous reply was just an alternative for consideration. >>>>>> >>>>>> bq. than a second user B can add a header with key "offset" and thus >>>>> break >>>>>> the intention of user A >>>>>> >>>>>> I didn't see such scenario after reading the KIP. Maybe add this as >>>>>> reasoning for the current approach ? >>>>>> >>>>>> I wonder how user B gets to know the intention of user A. Meaning, if >>>>> user >>>>>> B doesn't follow the norm set by user A, there still would be issue, >>>>> right ? >>>>>> >>>>>> >>>>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax < >>>> matth...@confluent.io> >>>>>> wrote: >>>>>> >>>>>>> Let me rephrase your answer to make sure I understand what you >>>> suggest: >>>>>>> >>>>>>> If compaction strategy is configured to use "offset", and if there is >>>> a >>>>>>> header in the record with `key == offset`, than we should use the >>>> value >>>>>>> of the record header instead of the actual record offset? >>>>>>> >>>>>>> Do I understand this correctly? If yes, what is the advantage of >> doing >>>>>>> this? From my point of view, it might be problematic, because if user >>>> A >>>>>>> creates a topic and configures "offset" compaction (with the intend >>>> that >>>>>>> the record offset should be uses), than a second user B can add a >>>> header >>>>>>> with key "offset" and thus break the intention of user A. >>>>>>> >>>>>>> Also, if existing topics might have data with record header key >>>>>>> "offset", the change would not be backward compatible either. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> On 6/16/18 6:59 PM, Ted Yu wrote: >>>>>>>> Pardon the brevity in my previous reply. >>>>>>>> I was talking about this bullet: >>>>>>>> >>>>>>>> bq. When this configuration is set to anything other than "*offset*" >>>>> or " >>>>>>>> *timestamp*", then the record headers are scanned for a key matching >>>>> this >>>>>>>> value. >>>>>>>> >>>>>>>> My point is that if matching key in the header is found, its value >>>>> should >>>>>>>> take precedence over the value of the configuration. >>>>>>>> I understand that such interpretation may have slight performance >>>> cost. >>>>>>>> >>>>>>>> Cheers >>>>>>>> >>>>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax < >>>>> matth...@confluent.io> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Ted, >>>>>>>>> >>>>>>>>> I am also not sure what you mean by "Shouldn't the selection in >>>> header >>>>>>>>> have higher precedence over the configuration"? What selection do >>>> you >>>>>>>>> mean? And want configuration? >>>>>>>>> >>>>>>>>> >>>>>>>>> About the first point, I think this is actually a valid concern: To >>>>>>>>> address this issue, it seems that we would need to change the >>>> accepted >>>>>>>>> format of the config. Instead of "offset", "timestamp", >>>>> "<header-key>", >>>>>>>>> we could replace the last one with "header=<header-key>". >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote: >>>>>>>>>> If selection exists in header, the selection should override the >>>>> config >>>>>>>>> value. >>>>>>>>>> Cheers >>>>>>>>>> -------- Original message --------From: Luis Cabral >>>>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18 1:40 AM >> (GMT-08:00) >>>>> To: >>>>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log >>>>>>> compaction >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> bq. Can the value be determined now ? My thinking is that what if >>>>> there >>>>>>>>> is a third compaction strategy proposed in the future ? We should >>>>> guard >>>>>>>>> against user unknowingly choosing the 'future' strategy. >>>>>>>>>> >>>>>>>>>> The idea is that the header name to use is flexible, which >> protects >>>>>>>>> current clients that may want to use this from having to adapt >> their >>>>>>>>> already existing header names (they can just specify a new name). >>>>>>>>>> >>>>>>>>>> bq. Shouldn't the selection in header have higher precedence over >>>> the >>>>>>>>> configuration ? >>>>>>>>>> >>>>>>>>>> Not sure what you mean here, could you clarify? >>>>>>>>>> >>>>>>>>>> bq. Please create JIRA if you haven't already. >>>>>>>>>> >>>>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061 >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Luís >>>>>>>>>> >>>>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> bq. When this configuration is set to anything other than >>>> "*offset*" >>>>>>> or >>>>>>>>> " >>>>>>>>>>> *timestamp*", then the record headers are scanned for a key >>>> matching >>>>>>>>> this >>>>>>>>>>> value. >>>>>>>>>>> >>>>>>>>>>> Can the value be determined now ? My thinking is that what if >>>> there >>>>>>> is a >>>>>>>>>>> third compaction strategy proposed in the future ? We should >> guard >>>>>>>>> against >>>>>>>>>>> user unknowingly choosing the 'future' strategy. >>>>>>>>>>> >>>>>>>>>>> bq. If this header is found >>>>>>>>>>> >>>>>>>>>>> Shouldn't the selection in header have higher precedence over the >>>>>>>>> configuration >>>>>>>>>>> ? >>>>>>>>>>> >>>>>>>>>>> Please create JIRA if you haven't already. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral >>>>>>>>> <luis_cab...@yahoo.com.invalid> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> Any takers on having a look at this KIP and voting on it? >>>>>>>>>>>> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>>>>>> 280%3A+Enhanced+log+compaction >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Luis >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> -- Guozhang >>>> >>>> >>> >>> >> >> > >
signature.asc
Description: OpenPGP digital signature