+1 from my side for using `compaction.strategy` with values "offset",
"timestamp" and "header" and `compaction.strategy.header`

-Matthias

On 6/25/18 1:25 AM, Luís Cabral wrote:
>  Hi,
> 
> So, is everyone OK using the approach with 2 properties?
> 
> E.g.:
> 
> Scenario 1:
>     compaction.strategy: offset
> 
>     :- Behaviour is the same as what currently exists, where the compaction 
> is done only via the 'offset'
> 
> 
> Scenario 2:
>     compaction.strategy: timestamp
> 
>     :- Similar to 'offset', but the record timestamp is used instead
> 
> 
> Scenario 3:
>     compaction.strategy: header
> 
>     compaction.strategy.header: xyz
> 
>     :- Searches the headers for 'xyz' key when performing the compaction. 
> Defaults to 'offset' strategy if this header does not exist (special note on 
> the '.header' suffix, as this would allow additional strategies to add 
> whatever extra configuration they need).
> 
> Scenario 4 (hypothetical future):
>     compaction.strategy: foo
> 
>     compaction.strategy.foo.name: bar
>     compaction.strategy.foo.order: DESC
>     compaction.strategy.foo.fallback: timestamp
> 
> 
>     :- This one is just to show what I meant with the '.header' suffix 
> mentioned in {Scenario 3}
> 
> 
> 
> Regards,
> Luís
> 
> 
>     On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang 
> <wangg...@gmail.com> wrote:  
>  
>  Hi Matthias,
> 
> Yes, we are effectively assigning the the whole space of Strings minus
> current preserved ones as header keys; honestly I think in practice users
> wanting to use `_something_` would be very rare, but I admit it may still
> be possible in theory.
> 
> I think Luis' point about "header=<header-value>" is that having a expression
> evaluation as the config value is a bit weird, and thinking about it twice
> it is still not flawless: we can still argue that we are effectively
> assigning the whole sub-space of "header=*" of Strings for headers, and
> what if users want to use preserved value falling into that sub-space
> (again, should not really happen in practice, just being paranoid here).
> 
> It seems that two configs are the common choice that everyone is happy with.
> 
> Guozhang
> 
> 
> On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Luis,
>>
>> I meant to update the "Rejected Alternative" sections, what you have
>> done already. Thx.
>>
>> Originally, I also had the idea about a second config, but thought it
>> might be easier to just change the allowed values to be `offset`,
>> `timestamp`, `header=<key>`. (We try to keep the number of configs small
>> if possible, as more configs are more confusing to users.)
>>
>> I don't think that using `_offset_`, `_timestamp_` and `<key>` solves
>> the problem because users still might use `_something_` as header key --
>> and if we want to introduce a new compaction strategy "something" later
>> we face the same issues as without the underscores. We only reduce the
>> likelihood that it happens.
>>
>> Using `header=` as prefix or introducing a second config, that is only
>> effective if the strategy is set to `header` seems to be a cleaner
>> solution.
>>
>> @Luis: why do you think that using `header=<key>` is an "incorrect
>> approach"?
>>
>>> Though I would still prefer to keep it as it is, as its a much simple>
>> and cleaner approach – I’m not so sure that a potential client would
>>> really be so inconvenienced for having to use “_offset” or
>>> “_timestamp_” as a header
>>
>> I don't think that it's about the issue that people cannot use
>> `_offset_` or `_timestamp_` in their header (by "use" I mean for
>> compaction). With the current KIP, they cannot use `offset` or
>> `timestamp` either. The issue is, that we cannot introduce a new system
>> supported compaction strategy in the future without potentially breaking
>> something, as we basically assign the whole space of Strings (minus
>> `offset`, `timestamp`) as valid configs to enable header based compaction.
>>
>> Personally, I prefer either adding a config or going with
>> `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be
>> good enough (even if this is the solution I like least)---for this case,
>> we should state explicitly, that the whole space of `_*_` is reserved
>> and users are not allowed to set those for header compaction. In fact, I
>> would also add a check for the config that only allows for `_offset_`
>> and `_timestamp_` and throws an exception for all other `_*_` configs.
>>
>>
>> -Matthias
>>
>>
>> On 6/18/18 2:03 PM, Luís Cabral wrote:
>>> I’m ok with that...
>>>
>>> Ted / Matthias?
>>>
>>>
>>> From: Guozhang Wang
>>> Sent: 18 June 2018 22:49
>>> To: dev@kafka.apache.org
>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>>>
>>> How about make the preserved values to be "_offset_" and "_timestamp_"
>>> then? Currently in the KIP they are reserved as "offset" and "timestamp".
>>>
>>>
>>> Guozhang
>>>
>>> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral
>> <luis_cab...@yahoo.com.invalid>
>>> wrote:
>>>
>>>> Hi Guozhang,
>>>>
>>>> Yes, that is what I meant (separate configs).
>>>> Though I would still prefer to keep it as it is, as its a much simpler
>> and
>>>> cleaner approach – I’m not so sure that a potential client would really
>> be
>>>> so inconvenienced for having to use “_offset” or “_timestamp_” as a
>> header
>>>>
>>>> Cheers,
>>>> Luís
>>>>
>>>>
>>>> From: Guozhang Wang
>>>> Sent: 18 June 2018 19:35
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>>>>
>>>> Hello Luís,
>>>>
>>>> I agree that having an expression evaluation as a config value is not
>> the
>>>> best approach; if there are better ideas to allow users to specify the
>>>> header key which happen to be the same as the preserved config values
>>>> "offset" and "timestamp" (although the likelihood may be small, as Ted
>>>> mentioned there may be more preserved config values added in the
>> future),
>>>> then I'd be happily follow the suggestions. For example, we could have
>> the
>>>> config value for header keys as "header-<key-name>"? Is that what you've
>>>> suggested? Or do you suggest using two configs instead, and the second
>>>> config specifying the key name, and will only be considered if the first
>>>> (i.e. current proposed) config's value is `header`, otherwise be
>> ignored?
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
>>>> <luis_cab...@yahoo.com.invalid
>>>>> wrote:
>>>>
>>>>>   Hi Ted / Guozhang / Matthias,
>>>>>
>>>>> @Ted: I've now added your argument to the "Rejected Alternatives"
>> portion
>>>>> of the KIP. Please keep in mind that I would like to keep this as
>>>> backwards
>>>>> compatible as possible, so a lot of decisions are inferred from that
>>>> intent.
>>>>>
>>>>> @Guozhang: IMHO, adding expression evaluation to configuration is an
>>>>> incorrect approach. If you absolutely insist on having this clear
>>>>> distinction between header/key, then I would suggest instead to have a
>>>>> dedicated property for the "key" part. Of course, this is your project
>> so
>>>>> I'll just continue whatever approach moves this KIP forward...
>>>>>
>>>>> @Matthias: Sorry, but update the KIP according to what?
>>>>>
>>>>> Cheers,
>>>>> Luís
>>>>>
>>>>>     On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
>>>>> matth...@confluent.io> wrote:
>>>>>
>>>>>   Well, for "offset" and "timestamp" policy, not communication between
>>>>> both is required.
>>>>>
>>>>> Only if headers are used, user A should communicate the corresponding
>>>>> header key to user B.
>>>>>
>>>>>
>>>>> @Luis: can you update the KIP accordingly?
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 6/17/18 5:36 PM, Ted Yu wrote:
>>>>>> My previous reply was just an alternative for consideration.
>>>>>>
>>>>>> bq.  than a second user B can add a header with key "offset" and thus
>>>>> break
>>>>>> the intention of user A
>>>>>>
>>>>>> I didn't see such scenario after reading the KIP. Maybe add this as
>>>>>> reasoning for the current approach ?
>>>>>>
>>>>>> I wonder how user B gets to know the intention of user A. Meaning, if
>>>>> user
>>>>>> B doesn't follow the norm set by user A, there still would be issue,
>>>>> right ?
>>>>>>
>>>>>>
>>>>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
>>>> matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Let me rephrase your answer to make sure I understand what you
>>>> suggest:
>>>>>>>
>>>>>>> If compaction strategy is configured to use "offset", and if there is
>>>> a
>>>>>>> header in the record with `key == offset`, than we should use the
>>>> value
>>>>>>> of the record header instead of the actual record offset?
>>>>>>>
>>>>>>> Do I understand this correctly? If yes, what is the advantage of
>> doing
>>>>>>> this? From my point of view, it might be problematic, because if user
>>>> A
>>>>>>> creates a topic and configures "offset" compaction (with the intend
>>>> that
>>>>>>> the record offset should be uses), than a second user B can add a
>>>> header
>>>>>>> with key "offset" and thus break the intention of user A.
>>>>>>>
>>>>>>> Also, if existing topics might have data with record header key
>>>>>>> "offset", the change would not be backward compatible either.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 6/16/18 6:59 PM, Ted Yu wrote:
>>>>>>>> Pardon the brevity in my previous reply.
>>>>>>>> I was talking about this bullet:
>>>>>>>>
>>>>>>>> bq. When this configuration is set to anything other than "*offset*"
>>>>> or "
>>>>>>>> *timestamp*", then the record headers are scanned for a key matching
>>>>> this
>>>>>>>> value.
>>>>>>>>
>>>>>>>> My point is that if matching key in the header is found, its value
>>>>> should
>>>>>>>> take precedence over the value of the configuration.
>>>>>>>> I understand that such interpretation may have slight performance
>>>> cost.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
>>>>> matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Ted,
>>>>>>>>>
>>>>>>>>> I am also not sure what you mean by "Shouldn't the selection in
>>>> header
>>>>>>>>> have higher precedence over the configuration"? What selection do
>>>> you
>>>>>>>>> mean? And want configuration?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> About the first point, I think this is actually a valid concern: To
>>>>>>>>> address this issue, it seems that we would need to change the
>>>> accepted
>>>>>>>>> format of the config. Instead of "offset", "timestamp",
>>>>> "<header-key>",
>>>>>>>>> we could replace the last one with "header=<header-key>".
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
>>>>>>>>>> If selection exists in header, the selection should override the
>>>>> config
>>>>>>>>> value.
>>>>>>>>>> Cheers
>>>>>>>>>> -------- Original message --------From: Luis Cabral
>>>>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18  1:40 AM
>> (GMT-08:00)
>>>>> To:
>>>>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
>>>>>>> compaction
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> bq. Can the value be determined now ? My thinking is that what if
>>>>> there
>>>>>>>>> is a third compaction strategy proposed in the future ? We should
>>>>> guard
>>>>>>>>> against user unknowingly choosing the 'future' strategy.
>>>>>>>>>>
>>>>>>>>>> The idea is that the header name to use is flexible, which
>> protects
>>>>>>>>> current clients that may want to use this from having to adapt
>> their
>>>>>>>>> already existing header names (they can just specify a new name).
>>>>>>>>>>
>>>>>>>>>> bq. Shouldn't the selection in header have higher precedence over
>>>> the
>>>>>>>>> configuration ?
>>>>>>>>>>
>>>>>>>>>> Not sure what you mean here, could you clarify?
>>>>>>>>>>
>>>>>>>>>> bq. Please create JIRA if you haven't already.
>>>>>>>>>>
>>>>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Luís
>>>>>>>>>>
>>>>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> bq. When this configuration is set to anything other than
>>>> "*offset*"
>>>>>>> or
>>>>>>>>> "
>>>>>>>>>>> *timestamp*", then the record headers are scanned for a key
>>>> matching
>>>>>>>>> this
>>>>>>>>>>> value.
>>>>>>>>>>>
>>>>>>>>>>> Can the value be determined now ? My thinking is that what if
>>>> there
>>>>>>> is a
>>>>>>>>>>> third compaction strategy proposed in the future ? We should
>> guard
>>>>>>>>> against
>>>>>>>>>>> user unknowingly choosing the 'future' strategy.
>>>>>>>>>>>
>>>>>>>>>>> bq. If this header is found
>>>>>>>>>>>
>>>>>>>>>>> Shouldn't the selection in header have higher precedence over the
>>>>>>>>> configuration
>>>>>>>>>>> ?
>>>>>>>>>>>
>>>>>>>>>>> Please create JIRA if you haven't already.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
>>>>>>>>> <luis_cab...@yahoo.com.invalid>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Any takers on having a look at this KIP and voting on it?
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>> 280%3A+Enhanced+log+compaction
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Luis
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to