To come back about a point Chris made:

1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
> what would you think about adopting a similar approach as the framework
> uses with connectors, by adding a "Config validate(Map<String, String>
> props)" method that can perform custom validation outside of what can be
> performed by the ConfigDef's single-property-at-a-time validation? It may
> be a little heavyweight for use with this particular SMT, but it'd provide
> more flexibility for other SMT implementations and would mirror an API that
> developers targeting the framework are likely already familiar with.
>

The validate() + config() approach taken for Connectors doesn't quite work
for Transformations.

The default Connector.validate() basically just calls
`config().validate(connectorConfigs)` and returns a Config. Separately the
AbstractHeader also calls `config()` and uses the Config and ConfigDef to
build the ConfigInfos. The contract is not really defined well enough in
the javadoc, but this means a connector can use validate() to build the
ConfigDef which it will return from config().

For Transformations we don't really want validate() to perform validation
at all since ultimately the transformations config will be embedded in the
connector's and will be validated by the connector itself. It wouldn't be
harmful if it _did_ perform validation, just unnecessary. But having a
validate() method that ought not to validate() seems like a recipe for
confusion.

Also problematic is that there's no use for the Config returned from
Transformations.validate().

So I'm not convinced that the superficial similarity really gains anything.

Kind regards,

Tom

On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tbent...@redhat.com> wrote:

> Hi,
>
> Hi all,
>
> Thanks for the discussion so far.
>
> It seems a bit weird to me that when configuring the Conditional SMT with
> a DSL you would use a concise, intuitive DSL for expressing the condition,
> but not for the transforms that it's guarding. It also seems natural, if
> you support this for conditionally applying SMTs, that you'd soon want to
> support the same thing for a generic filter transformer. Then, if you can
> configure a filter transformation using this DSL, it becomes odd that you
> can't do this for mapping transformations. I think it would be a mistake to
> go specifying an expression language for conditions when really that might
> just be part of a language for transformations.
>
> I think it would be possible today to write an SMT which allowed you to
> express the transformation in a DSL. Concretely, it's possible to imagine a
> single transformation taking a DSL something like this:
>
>   compose(
>     Flatten.Key(delimiter: ':'),
>     If(condition: TopicMatches(pattern: "blah-.*"),
>        then: Flatten.Value(delimiter: '/')))
>
> All I've really done here, beyond what was already proposed, is rename
> Conditional to If, imagine a couple more bits of syntax
> (SMTs being constructed by class name and named argument invocation syntax
> to support SMT constructors) and add a higher level compose() function for
> chaining SMTs (which could easily be replaced with brace delimited blocks).
>
> That may be a discussion we should have, but I think in _this_ KIP we
> should focus on the conditional part, since the above example hopefully
> shows that it would be possible to reuse it in a DSL if there was appetite
> for that.
>
> With that in mind, and since my original suggestion for an abbreviated
> config syntax didn't appeal to people, I propose that we stick with the
> existing norms for configuring this.
> The previous example would look like this:
>
> transformation: flattenKey,if
> transformation.flattenKey.type: Flatten.Key
> transformation.flattenKey.delimiter: :
> transformation.if.type: If
> transformation.if.condition.type: TopicMatches
> transformation.if.condition.pattern: blah-.*
> transformation.if.then: flattenValue
> transformation.if.then.flattenValue.type: Flatten.Value
> transformation.if.then.flattenValue.delimiter: /
>
> Also, I'm inclined to think we should stick to just supporting
> TopicMatches and Not, since we've not identified an actual need for
> 'has-header', 'or' and 'and'.
>
> An example of the usage of Not would look like this:
>
> transformation: flattenKey,if
> transformation.flattenKey.type: Flatten.Key
> transformation.flattenKey.delimiter: :
> transformation.if.type: If
> transformation.if.condition.type: Not
> transformation.if.condition.operand: startsWithBlah
> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
> transformation.if.then: flattenValue
> transformation.if.then.flattenValue.type: Flatten.Value
> transformation.if.then.flattenValue.delimiter: /
>
> Hopefully we can agree that this allows the conditional SMT part to make
> progress without getting bogged down.
>
> Thoughts? If this seems broadly reasonable, I'll update the KIP.
>
> Kind regards,
>
> Tom
>
> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gun...@hibernate.org>
> wrote:
>
>> Hi all,
>>
>> Thanks a lot for this initiative, Tom!
>>
>> To shed some light, the use case where this first came up, were issues
>> we saw with SMTs being applied to the different topics produced by the
>> Debezium change data capture connectors. There are different kinds of
>> topics (for change data, schema history, heartbeats etc.) and the
>> record structure to expect may vary between those. Hence we saw issues
>> with SMTs like ExtractField, which for instance only should be applied
>> to all change data topics but not the other ones.
>>
>> I like the overall approach; for Debezium's purposes, the simple topic
>> matching and negation operators would be sufficient already. I agree
>> with Chris and would prefer one single condition attribute, which
>> contains a single condition or potentially a logical expression with
>> not, and, etc. I think it's less ambiguous, in particular when it
>> comes to ordering of the different conditions and determining their
>> precedence.
>>
>> Would love to see this feature in one or another way in Connect.
>>
>> Best,
>>
>> --Gunnar
>>
>>
>>
>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>> tbent...@redhat.com>:
>> >
>> > Hi Chris and Sönke,
>> >
>> > Using the numbering from Chris's email...
>> >
>> > 1. That's a good point, I'll see what is needed to make that work.
>> >
>> > 2. I'm happy enough to add support for "and" and "or" as part of this
>> KIP
>> > if people can see a need for it.
>> >
>> > In a similar vein, I was wondering about whether it would be worthwhile
>> > having the equivalent of an "else" clause (what's in the KIP is
>> basically
>> > an "if" statement). Without support for "else" I think people would
>> often
>> > need two conditionals, with the condition of one being the negation of
>> the
>> > condition of another.
>> >
>> > 3. I can see the attraction of an expression language. The pros include
>> > being terse and familiar to programmers and potentially very flexible if
>> > that's needed in the future. I had a play and implemented it using ANTLR
>> > and it's not difficult to write a grammar and implement the functions
>> we've
>> > already discussed and get decent error messages when the expression is
>> > malformed. So on the one hand I quite like the idea. On the other hand
>> it
>> > feels like overkill for the use cases that have actually been
>> identified so
>> > far.
>> >
>> > @Sönke what do you make of the expression language idea?
>> >
>> > Kind regards,
>> >
>> > Tom
>> >
>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <chr...@confluent.io
>> >
>> > wrote:
>> >
>> > > Hi Tom,
>> > >
>> > > This looks great and I'd love to see the out-of-the-box SMTs become
>> even
>> > > more powerful with the improvements you've proposed! Got a few
>> remarks and
>> > > would be interested in your thoughts:
>> > >
>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
>> method,
>> > > what would you think about adopting a similar approach as the
>> framework
>> > > uses with connectors, by adding a "Config validate(Map<String, String>
>> > > props)" method that can perform custom validation outside of what can
>> be
>> > > performed by the ConfigDef's single-property-at-a-time validation? It
>> may
>> > > be a little heavyweight for use with this particular SMT, but it'd
>> provide
>> > > more flexibility for other SMT implementations and would mirror an
>> API that
>> > > developers targeting the framework are likely already familiar with.
>> > > 2. The possibility for adding the logical operators "and" and "or" is
>> > > mentioned, but only as a potential future change and not one proposed
>> by
>> > > this KIP. Why not include those operators sooner rather than later?
>> > > 3. The syntax for named conditions that are then referenced in logical
>> > > operators is... tricky. It took me a few attempts to grok the example
>> > > provided in the KIP after reading Sönke's question about the example
>> for
>> > > negation. What about a more sophisticated but less verbose syntax that
>> > > supports a single configuration for the condition, even with logical
>> > > operators? I'm thinking something like
>> > > "transforms.conditionalExtract.condition: not(has-header:my-header)"
>> > > instead of the "transforms.conditionalExtract.condition:
>> not:hasMyHeader"
>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header" properties. If support for a logical "and" is
>> added,
>> > > this could then be expanded to something like
>> > > "transforms.conditionalExtract.condition: and(has-header(my-header),
>> > > not(topic-matches(my-prefix-.*)))". There would be additional
>> complexity
>> > > here with the need to escape parentheses and commas that are intended
>> to be
>> > > treated literally (as part of a header name, for example) instead of
>> as
>> > > part of the syntax for the condition itself, but a little additional
>> > > complexity for edge cases like that may be warranted if it heavily
>> reduces
>> > > complexity for the common cases. The rationale for the proposed
>> > > parentheses-based syntax here instead of what's mentioned in the KIP
>> > > (something like "and: <condition1>, <condition2>") is to help with
>> > > readability; we probably wouldn't need that with the approach of
>> naming
>> > > conditions via separate properties, but things may get a little nasty
>> with
>> > > literal conditions included there, especially with the possibility of
>> > > nesting. Finally, the shift in syntax here should make it easier to
>> handle
>> > > cases like the header value matching brought up by Sönke; it might
>> look
>> > > something like "header-matches(header-name, header-value-pattern)".
>> > >
>> > > Cheers,
>> > >
>> > > Chris
>> > >
>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tbent...@redhat.com>
>> wrote:
>> > >
>> > > > Hi Sönke,
>> > > >
>> > > > Thanks for taking a look.
>> > > >
>> > > > Let me answer in reverse order, since I think it might make more
>> sense
>> > > that
>> > > > way...
>> > > >
>> > > > Also, while writing that, I noticed that you have a version with and
>> > > > > without "name" for your transformation in the KIP:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header
>> > > > > and
>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>> > > > >
>> > > >
>> > > > The example
>> > > >     transforms.conditionalExtract.condition: has-header:my-header
>> > > > is a "has-header" condition (the prefix of the config value), which
>> will
>> > > > match records with a "my-header" header (given in the suffix).
>> > > >
>> > > > The other example given is:
>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
>> > > > has-header:my-header
>> > > > The root of the condition is a "not" condition (the prefix of the
>> value
>> > > for
>> > > > the transforms.conditionalExtract.condition key) of another named
>> > > condition
>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
>> other
>> > > > condition. That other condition is configured at
>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
>> condition
>> > > > is a "has-header" condition (the prefix), which will match records
>> with a
>> > > > "my-header" header (given in the suffix). So the "has-header:"
>> condition
>> > > > type would always require a suffix, as would the "not:" condition
>> type.
>> > > > Hypothetically you could have a "true" condition type (which would
>> not
>> > > > require a suffix), and the hypothetical binary conditions "and:" and
>> > > "or:"
>> > > > would require a pair of other condition names.
>> > > >
>> > > > So what's proposed is a scheme for encoding conditions where the
>> > > condition
>> > > > type is the prefix of the value of some "....condition" config key,
>> and
>> > > the
>> > > > optional suffix provides parameters for the condition. This makes
>> those
>> > > > parameters a bit inflexible, but is relatively succinct.
>> > > >
>> > > > This leads on to your first point. You're right that use cases might
>> > > appear
>> > > > which need other conditions, and we should make it flexible enough
>> to be
>> > > > able to cope with future use cases. On the other hand, I was
>> concerned
>> > > that
>> > > > we end up with something which is quite complicated to configure.
>> (There
>> > > > comes a point where it might makes more sense for the user to write
>> their
>> > > > own SMT).
>> > > >
>> > > > Just of the top of my head it might look like:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> type:has-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > > header-name:my-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > field-value:my-value
>> > > > >
>> > > >
>> > > > That won't work because the format is basically a
>> Properties/Map<String,
>> > > > String> and what you've suggested has duplicate keys.
>> > > >
>> > > > One thing I did briefly consider what the ability to treat
>> conditions as
>> > > > Configurable objects in their own right (opening up the possibility
>> of
>> > > > people supplying their own Conditions, just like they can supply
>> their
>> > > own
>> > > > SMTs). That might be configured something like this:
>> > > >
>> > > >     transforms.conditionalExtract.condition: not
>> > > >     transforms.conditionalExtract.condition.not.type: Not
>> > > >     transforms.conditionalExtract.condition.not.negated: foo
>> > > >     transforms.conditionalExtract.condition.foo.type:
>> HasHeaderWithValue
>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
>> > > >
>> > > > I didn't propose that I couldn't see the use cases to justify this
>> kind
>> > > of
>> > > > complexity, especially as the common case would surely be matching
>> > > against
>> > > > topic name (to be honest I wasn't completely convinced by the need
>> for
>> > > > "has-header"). In the current version of the KIP that's just
>> > > >
>> > > >     transforms.conditionalExtract.condition: topic-matches:
>> my-prefix-.*
>> > > >
>> > > > but using the more flexible scheme that would require something
>> more like
>> > > > this:
>> > > >
>> > > >     transforms.conditionalExtract.condition: bar
>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
>> > > >     transforms.conditionalExtract.condition.bar.pattern:
>> my-prefix-.*
>> > > >
>> > > > If people know of use cases which would justify more
>> sophistication, I'm
>> > > > happy to reconsider.
>> > > >
>> > > > Thanks again for taking a look!
>> > > >
>> > > > Tom
>> > > >
>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>> > > > <soenke.lie...@opencore.com.invalid> wrote:
>> > > >
>> > > > > Hi Tom,
>> > > > >
>> > > > > sounds useful to me, thanks for the KIP!
>> > > > > The only thought that I had while reading was that this will
>> probably
>> > > > raise
>> > > > > questions about more involved conditions fairly quickly. For
>> example
>> > > the
>> > > > > "has-header" will cause an appetite for conditions like
>> > > > > "this-header-has-that-value".
>> > > > > This would necessitate two parameters to be passed into the
>> condition,
>> > > > > which I think is not currently included in the KIP. I am not
>> saying add
>> > > > > this now, but might it make sense to discuss a concept of how that
>> > > might
>> > > > > look now, to avoid potential changes later on.
>> > > > >
>> > > > > Just of the top of my head it might look like:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> type:has-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > > header-name:my-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > field-value:my-value
>> > > > >
>> > > > >
>> > > > > Also, while writing that, I noticed that you have a version with
>> and
>> > > > > without "name" for your transformation in the KIP:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header
>> > > > > and
>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>> > > > >
>> > > > >
>> > > > > Is this intentional and the name is optional?
>> > > > >
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Sönke
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tbent...@redhat.com>
>> wrote:
>> > > > > >
>> > > > > > Does anyone have any comments, feedback or thoughts about this?
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Tom
>> > > > > >
>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
>> tbent...@redhat.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I've opened KIP-585 which is intended to provide a mechanism
>> to
>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
>> for any
>> > > > > > > feedback:
>> > > > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>> > > > > > >
>> > > > > > > Many thanks,
>> > > > > > >
>> > > > > > > Tom
>> > > > > > >
>> > > > >
>> > > >
>> > >
>>
>>

Reply via email to