Re: [DISCUSS] KIP-585: Conditional SMT

Andrew Schofield Sun, 26 Apr 2020 06:54:44 -0700

Hi,
I'm suggesting that RecordPredicate be another pluggable interface that users
could conceivably implement on their own, and that the KIP introduces the
most likely ones as supplied implementations, much as there are SMT
implementations such as HoistField. I don't really see users implementing
their own RecordPredicate, but the supplied ones would probably have a common
interface and it seems a pity not to expose that as an external API.


SMT is essentially a toolkit for Kafka Connect and I think RecordPredicate
would be a nice minor enhancement and I'd be happy to contribute code.

In order to make it simpler to configure, I suggest that the class name 
configured
using "?type" would be assumed to be in package 
org.apache.kafka.connect.predicates
in the absence of a package name so that shorter names such as "TopicMatches"
could be used.

Cheers,
Andrew Schofield
IBM Event Streams

On 25/04/2020, 21:22, "Christopher Egerton" <[email protected]> wrote:

    Hi Andrew,

    I know a DSL seems like overkill, and I'm not attached to it by any means,
    but I do think it serves a vital purpose in that it allows people who don't
    know how or have the time to write Java code to act on data being processed
    by connectors.

    Are you proposing that the "RecordPredicate" be another pluggable interface
    that users could then implement on their own? Or, would this be purely a
    syntactic expansion to SMT configuration with some subset of hard-coded
    predicates provided by the framework? I think there's value in the latter,
    but the former doesn't seem like it'd bring much to the table as far as
    users are concerned, and for developers, it is an improvement, but not a
    major one.

    Cheers,

    Chris

    On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
    [email protected]> wrote:

    > I wonder whether we're getting a bit overcomplicated with this. I think 
all
    > that's required here is to add an optional guard predicate for a
    > Transformation.
    > The predicate cannot end the Transformation chain, but it can allow or
    > prevent a particular
    > Transformation from running.
    >
    > How about this as syntax?
    > transforms: extractInt
    > transforms.extractInt.?type:
    > org.apache.kafka.connect.predicates.TopicMatches
    > transforms.extractInt.?regex: my-prefix-.*
    > transforms.extractInt.type:
    > org.apache.kafka.connect.transforms.ExtractField$Key
    > transforms.extractInt.field: c1
    >
    > The idea is that a Transformation can have an optional RecordPredicate.
    > The RecordPredicate
    > can have configuration items, in a similar way to a Transformation. The
    > syntax of
    > using a '?' prefix to separate configuration for the RecordPredicate from
    > the configuration
    > for the Transformation could conceivably clash with an existing
    > Transformation
    > but the chances are minimal.
    >
    > This syntax doesn't allow for an 'else', but if the KIP offers say a
    > TopicMatches predicate
    > then that can be configured to return FALSE if the predicate matches.
    >
    > I feel that a DSL for SMTs is overkill. If you need something that
    > complex, it's
    > perhaps too complex for a transformation chain and it's really a streaming
    > application.
    >
    > Andrew Schofield
    > IBM Event Streams
    >
    > On 2020/04/08 21:39:31, Christopher Egerton <[email protected]> wrote:
    > > Hi Tom,
    > >
    > > With regards to the potential Transformation::validate method, I don't
    > > quite follow your objections. The AbstractHerder class, ConnectorConfig
    > > class, and embedding of ConfigDefs that happens with the two is all
    > > internal logic and we're allowed to modify it however we want, as long 
as
    > > it doesn't alter any public-facing APIs (and if it does, we just have to
    > > document those changes in a KIP). We don't have to embed the ConfigDef
    > for
    > > a transformation chain inside another ConfigDef if the API we want to
    > > present to our users doesn't play well with that part of the code base.
    > > Additionally, beyond aligning with the existing API provided for
    > > connectors, another advantage is that it becomes possible to validate
    > > properties for a configuration in the context of all other properties, 
so
    > > to be clear, it's not just about preserving what may be perceived as a
    > > superficial similarity and comes with real functional benefits that 
can't
    > > be provided (at least, not as easily) by dynamic construction of a
    > > ConfigDef object.
    > >
    > > As far as the new proposal goes, I hate to say it, but I think we're
    > stuck
    > > with the worst of both worlds here. Adding the new RecordPredicate
    > > interface seems like it defeats the whole purpose of SMTs, which is to
    > > allow manipulation of a connector's event stream by users who don't
    > > necessarily know how or have the time to write Java code of their own.
    > This
    > > is also why I'm in favor of adding a lightweight DSL for the condition;
    > > emphasizing readability for people who aren't very familiar with Connect
    > > and just want to get something going quickly should be a priority for
    > SMTs.
    > > But if that's not going to happen with this KIP, I'd still choose the
    > > simpler, less-flexible approach initially outlined, in order to keep
    > things
    > > simple for people creating connectors and try to let them accomplish 
what
    > > they want via configuration, not code.
    > >
    > > With regards to the question about where the line should be drawn and 
how
    > > much is too much and comparisons to other stream processing frameworks, 
I
    > > think the nature of SMTs draws the line quite nicely: you can only
    > process
    > > one message at a time. There's plenty of use cases out there for
    > > heavier-duty processing frameworks like Kafka Streams, with aggregate
    > > operations, joining of streams, expanding a single message into multiple
    > > messages, etc. With SMTs, none of this is possible; the general use case
    > is
    > > to filter and clean a stream of data. If any of the heavier-weight logic
    > > provided by, e.g., Streams, isn't required for a project, it should be
    > > possible to get along with just a collection of sink connectors, source
    > > connectors, a converter or two, and SMTs that smooth over any 
differences
    > > in data format between what source connectors produce and what sink
    > > connectors expect. This is why I'm comfortable suggesting heavy 
expansion
    > > of the out-of-the-box SMTs that we provide with Connect; as long as
    > they're
    > > reasonable to configure, they can greatly reduce the operational burden
    > on
    > > anyone running and/or using a Connect cluster since they might entirely
    > > replace additional services that would otherwise be required.
    > >
    > > All that said--this is a huge ask of someone who just wants to support
    > the
    > > SMT equivalent of an "if" statement, and it's totally understandable if
    > > that's too much to ask. The one concern I have left is that if we expand
    > > the SMT in the future, there become compatibility concerns since SMTs 
are
    > > pretty tightly-coupled with the worker on which they run (although
    > > technically, with some classpath/plugin path shuffling, you can run
    > > different versions of the out-of-the-box SMTs from the Connect worker on
    > > which they're run). Someone might write a connector config with the
    > > If/Conditional/Whatever SMT with a condition type that works on one
    > worker,
    > > but that doesn't work on another worker that's running an earlier 
version
    > > of Connect. This is why I'm in favor of adding extra predicates now
    > instead
    > > of later; if we're going to implement what has the potential to be a bit
    > of
    > > a Swiss army knife SMT with room for future expansion of configuration,
    > and
    > > we can think of a reasonable way to add that functionality now, it seems
    > > better for users and administrators of Connect to try to do that now
    > > instead of later.
    > >
    > > Cheers,
    > >
    > > Chris
    > >
    > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <[email protected]> wrote:
    > >
    > > > Since no one objected I've updated the KIP with the revised way to
    > > > configure this transformation.
    > > >
    > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <[email protected]>
    > wrote:
    > > >
    > > > > To come back about a point Chris made:
    > > > >
    > > > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
    > > > method,
    > > > >> what would you think about adopting a similar approach as the
    > framework
    > > > >> uses with connectors, by adding a "Config validate(Map<String,
    > String>
    > > > >> props)" method that can perform custom validation outside of what
    > can be
    > > > >> performed by the ConfigDef's single-property-at-a-time validation?
    > It
    > > > may
    > > > >> be a little heavyweight for use with this particular SMT, but it'd
    > > > provide
    > > > >> more flexibility for other SMT implementations and would mirror an
    > API
    > > > >> that
    > > > >> developers targeting the framework are likely already familiar 
with.
    > > > >>
    > > > >
    > > > > The validate() + config() approach taken for Connectors doesn't 
quite
    > > > work
    > > > > for Transformations.
    > > > >
    > > > > The default Connector.validate() basically just calls
    > > > > `config().validate(connectorConfigs)` and returns a Config.
    > Separately
    > > > the
    > > > > AbstractHeader also calls `config()` and uses the Config and
    > ConfigDef to
    > > > > build the ConfigInfos. The contract is not really defined well
    > enough in
    > > > > the javadoc, but this means a connector can use validate() to build
    > the
    > > > > ConfigDef which it will return from config().
    > > > >
    > > > > For Transformations we don't really want validate() to perform
    > validation
    > > > > at all since ultimately the transformations config will be embedded
    > in
    > > > the
    > > > > connector's and will be validated by the connector itself. It
    > wouldn't be
    > > > > harmful if it _did_ perform validation, just unnecessary. But having
    > a
    > > > > validate() method that ought not to validate() seems like a recipe
    > for
    > > > > confusion.
    > > > >
    > > > > Also problematic is that there's no use for the Config returned from
    > > > > Transformations.validate().
    > > > >
    > > > > So I'm not convinced that the superficial similarity really gains
    > > > anything.
    > > > >
    > > > > Kind regards,
    > > > >
    > > > > Tom
    > > > >
    > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <[email protected]>
    > wrote:
    > > > >
    > > > >> Hi,
    > > > >>
    > > > >> Hi all,
    > > > >>
    > > > >> Thanks for the discussion so far.
    > > > >>
    > > > >> It seems a bit weird to me that when configuring the Conditional 
SMT
    > > > with
    > > > >> a DSL you would use a concise, intuitive DSL for expressing the
    > > > condition,
    > > > >> but not for the transforms that it's guarding. It also seems
    > natural, if
    > > > >> you support this for conditionally applying SMTs, that you'd soon
    > want
    > > > to
    > > > >> support the same thing for a generic filter transformer. Then, if
    > you
    > > > can
    > > > >> configure a filter transformation using this DSL, it becomes odd
    > that
    > > > you
    > > > >> can't do this for mapping transformations. I think it would be a
    > > > mistake to
    > > > >> go specifying an expression language for conditions when really 
that
    > > > might
    > > > >> just be part of a language for transformations.
    > > > >>
    > > > >> I think it would be possible today to write an SMT which allowed
    > you to
    > > > >> express the transformation in a DSL. Concretely, it's possible to
    > > > imagine a
    > > > >> single transformation taking a DSL something like this:
    > > > >>
    > > > >>   compose(
    > > > >>     Flatten.Key(delimiter: ':'),
    > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
    > > > >>        then: Flatten.Value(delimiter: '/')))
    > > > >>
    > > > >> All I've really done here, beyond what was already proposed, is
    > rename
    > > > >> Conditional to If, imagine a couple more bits of syntax
    > > > >> (SMTs being constructed by class name and named argument invocation
    > > > >> syntax to support SMT constructors) and add a higher level 
compose()
    > > > >> function for chaining SMTs (which could easily be replaced with
    > brace
    > > > >> delimited blocks).
    > > > >>
    > > > >> That may be a discussion we should have, but I think in _this_ KIP
    > we
    > > > >> should focus on the conditional part, since the above example
    > hopefully
    > > > >> shows that it would be possible to reuse it in a DSL if there was
    > > > appetite
    > > > >> for that.
    > > > >>
    > > > >> With that in mind, and since my original suggestion for an
    > abbreviated
    > > > >> config syntax didn't appeal to people, I propose that we stick with
    > the
    > > > >> existing norms for configuring this.
    > > > >> The previous example would look like this:
    > > > >>
    > > > >> transformation: flattenKey,if
    > > > >> transformation.flattenKey.type: Flatten.Key
    > > > >> transformation.flattenKey.delimiter: :
    > > > >> transformation.if.type: If
    > > > >> transformation.if.condition.type: TopicMatches
    > > > >> transformation.if.condition.pattern: blah-.*
    > > > >> transformation.if.then: flattenValue
    > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    > > > >> transformation.if.then.flattenValue.delimiter: /
    > > > >>
    > > > >> Also, I'm inclined to think we should stick to just supporting
    > > > >> TopicMatches and Not, since we've not identified an actual need for
    > > > >> 'has-header', 'or' and 'and'.
    > > > >>
    > > > >> An example of the usage of Not would look like this:
    > > > >>
    > > > >> transformation: flattenKey,if
    > > > >> transformation.flattenKey.type: Flatten.Key
    > > > >> transformation.flattenKey.delimiter: :
    > > > >> transformation.if.type: If
    > > > >> transformation.if.condition.type: Not
    > > > >> transformation.if.condition.operand: startsWithBlah
    > > > >> transformation.if.condition.operand.startsWithBlah.type:
    > TopicMatches
    > > > >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
    > > > >> transformation.if.then: flattenValue
    > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    > > > >> transformation.if.then.flattenValue.delimiter: /
    > > > >>
    > > > >> Hopefully we can agree that this allows the conditional SMT part to
    > make
    > > > >> progress without getting bogged down.
    > > > >>
    > > > >> Thoughts? If this seems broadly reasonable, I'll update the KIP.
    > > > >>
    > > > >> Kind regards,
    > > > >>
    > > > >> Tom
    > > > >>
    > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <[email protected]
    > >
    > > > >> wrote:
    > > > >>
    > > > >>> Hi all,
    > > > >>>
    > > > >>> Thanks a lot for this initiative, Tom!
    > > > >>>
    > > > >>> To shed some light, the use case where this first came up, were
    > issues
    > > > >>> we saw with SMTs being applied to the different topics produced by
    > the
    > > > >>> Debezium change data capture connectors. There are different kinds
    > of
    > > > >>> topics (for change data, schema history, heartbeats etc.) and the
    > > > >>> record structure to expect may vary between those. Hence we saw
    > issues
    > > > >>> with SMTs like ExtractField, which for instance only should be
    > applied
    > > > >>> to all change data topics but not the other ones.
    > > > >>>
    > > > >>> I like the overall approach; for Debezium's purposes, the simple
    > topic
    > > > >>> matching and negation operators would be sufficient already. I
    > agree
    > > > >>> with Chris and would prefer one single condition attribute, which
    > > > >>> contains a single condition or potentially a logical expression
    > with
    > > > >>> not, and, etc. I think it's less ambiguous, in particular when it
    > > > >>> comes to ordering of the different conditions and determining 
their
    > > > >>> precedence.
    > > > >>>
    > > > >>> Would love to see this feature in one or another way in Connect.
    > > > >>>
    > > > >>> Best,
    > > > >>>
    > > > >>> --Gunnar
    > > > >>>
    > > > >>>
    > > > >>>
    > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
    > > > >>> [email protected]>:
    > > > >>> >
    > > > >>> > Hi Chris and Sönke,
    > > > >>> >
    > > > >>> > Using the numbering from Chris's email...
    > > > >>> >
    > > > >>> > 1. That's a good point, I'll see what is needed to make that
    > work.
    > > > >>> >
    > > > >>> > 2. I'm happy enough to add support for "and" and "or" as part of
    > this
    > > > >>> KIP
    > > > >>> > if people can see a need for it.
    > > > >>> >
    > > > >>> > In a similar vein, I was wondering about whether it would be
    > > > worthwhile
    > > > >>> > having the equivalent of an "else" clause (what's in the KIP is
    > > > >>> basically
    > > > >>> > an "if" statement). Without support for "else" I think people
    > would
    > > > >>> often
    > > > >>> > need two conditionals, with the condition of one being the
    > negation
    > > > of
    > > > >>> the
    > > > >>> > condition of another.
    > > > >>> >
    > > > >>> > 3. I can see the attraction of an expression language. The pros
    > > > include
    > > > >>> > being terse and familiar to programmers and potentially very
    > flexible
    > > > >>> if
    > > > >>> > that's needed in the future. I had a play and implemented it
    > using
    > > > >>> ANTLR
    > > > >>> > and it's not difficult to write a grammar and implement the
    > functions
    > > > >>> we've
    > > > >>> > already discussed and get decent error messages when the
    > expression
    > > > is
    > > > >>> > malformed. So on the one hand I quite like the idea. On the 
other
    > > > hand
    > > > >>> it
    > > > >>> > feels like overkill for the use cases that have actually been
    > > > >>> identified so
    > > > >>> > far.
    > > > >>> >
    > > > >>> > @Sönke what do you make of the expression language idea?
    > > > >>> >
    > > > >>> > Kind regards,
    > > > >>> >
    > > > >>> > Tom
    > > > >>> >
    > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
    > > > >>> [email protected]>
    > > > >>> > wrote:
    > > > >>> >
    > > > >>> > > Hi Tom,
    > > > >>> > >
    > > > >>> > > This looks great and I'd love to see the out-of-the-box SMTs
    > become
    > > > >>> even
    > > > >>> > > more powerful with the improvements you've proposed! Got a few
    > > > >>> remarks and
    > > > >>> > > would be interested in your thoughts:
    > > > >>> > >
    > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String, String>
    > props)"
    > > > >>> method,
    > > > >>> > > what would you think about adopting a similar approach as the
    > > > >>> framework
    > > > >>> > > uses with connectors, by adding a "Config validate(Map<String,
    > > > >>> String>
    > > > >>> > > props)" method that can perform custom validation outside of
    > what
    > > > >>> can be
    > > > >>> > > performed by the ConfigDef's single-property-at-a-time
    > validation?
    > > > >>> It may
    > > > >>> > > be a little heavyweight for use with this particular SMT, but
    > it'd
    > > > >>> provide
    > > > >>> > > more flexibility for other SMT implementations and would
    > mirror an
    > > > >>> API that
    > > > >>> > > developers targeting the framework are likely already familiar
    > > > with.
    > > > >>> > > 2. The possibility for adding the logical operators "and" and
    > "or"
    > > > is
    > > > >>> > > mentioned, but only as a potential future change and not one
    > > > >>> proposed by
    > > > >>> > > this KIP. Why not include those operators sooner rather than
    > later?
    > > > >>> > > 3. The syntax for named conditions that are then referenced in
    > > > >>> logical
    > > > >>> > > operators is... tricky. It took me a few attempts to grok the
    > > > example
    > > > >>> > > provided in the KIP after reading Sönke's question about the
    > > > example
    > > > >>> for
    > > > >>> > > negation. What about a more sophisticated but less verbose
    > syntax
    > > > >>> that
    > > > >>> > > supports a single configuration for the condition, even with
    > > > logical
    > > > >>> > > operators? I'm thinking something like
    > > > >>> > > "transforms.conditionalExtract.condition:
    > > > not(has-header:my-header)"
    > > > >>> > > instead of the "transforms.conditionalExtract.condition:
    > > > >>> not:hasMyHeader"
    > > > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header" properties. If support for a logical
    > "and" is
    > > > >>> added,
    > > > >>> > > this could then be expanded to something like
    > > > >>> > > "transforms.conditionalExtract.condition:
    > > > and(has-header(my-header),
    > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be additional
    > > > >>> complexity
    > > > >>> > > here with the need to escape parentheses and commas that are
    > > > >>> intended to be
    > > > >>> > > treated literally (as part of a header name, for example)
    > instead
    > > > of
    > > > >>> as
    > > > >>> > > part of the syntax for the condition itself, but a little
    > > > additional
    > > > >>> > > complexity for edge cases like that may be warranted if it
    > heavily
    > > > >>> reduces
    > > > >>> > > complexity for the common cases. The rationale for the 
proposed
    > > > >>> > > parentheses-based syntax here instead of what's mentioned in
    > the
    > > > KIP
    > > > >>> > > (something like "and: <condition1>, <condition2>") is to help
    > with
    > > > >>> > > readability; we probably wouldn't need that with the approach
    > of
    > > > >>> naming
    > > > >>> > > conditions via separate properties, but things may get a 
little
    > > > >>> nasty with
    > > > >>> > > literal conditions included there, especially with the
    > possibility
    > > > of
    > > > >>> > > nesting. Finally, the shift in syntax here should make it
    > easier to
    > > > >>> handle
    > > > >>> > > cases like the header value matching brought up by Sönke; it
    > might
    > > > >>> look
    > > > >>> > > something like "header-matches(header-name,
    > header-value-pattern)".
    > > > >>> > >
    > > > >>> > > Cheers,
    > > > >>> > >
    > > > >>> > > Chris
    > > > >>> > >
    > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
    > [email protected]>
    > > > >>> wrote:
    > > > >>> > >
    > > > >>> > > > Hi Sönke,
    > > > >>> > > >
    > > > >>> > > > Thanks for taking a look.
    > > > >>> > > >
    > > > >>> > > > Let me answer in reverse order, since I think it might make
    > more
    > > > >>> sense
    > > > >>> > > that
    > > > >>> > > > way...
    > > > >>> > > >
    > > > >>> > > > Also, while writing that, I noticed that you have a version
    > with
    > > > >>> and
    > > > >>> > > > > without "name" for your transformation in the KIP:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header
    > > > >>> > > > > and
    > > > >>> > > > > transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > > > The example
    > > > >>> > > >     transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > is a "has-header" condition (the prefix of the config 
value),
    > > > >>> which will
    > > > >>> > > > match records with a "my-header" header (given in the
    > suffix).
    > > > >>> > > >
    > > > >>> > > > The other example given is:
    > > > >>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
    > > > >>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > has-header:my-header
    > > > >>> > > > The root of the condition is a "not" condition (the prefix
    > of the
    > > > >>> value
    > > > >>> > > for
    > > > >>> > > > the transforms.conditionalExtract.condition key) of another
    > named
    > > > >>> > > condition
    > > > >>> > > > called "hasMyHeader" (the suffix). Any name could be used
    > for the
    > > > >>> other
    > > > >>> > > > condition. That other condition is configured at
    > > > >>> > > > "transforms.conditionalExtract.condition.<conditionName>".
    > That
    > > > >>> condition
    > > > >>> > > > is a "has-header" condition (the prefix), which will match
    > > > records
    > > > >>> with a
    > > > >>> > > > "my-header" header (given in the suffix). So the
    > "has-header:"
    > > > >>> condition
    > > > >>> > > > type would always require a suffix, as would the "not:"
    > condition
    > > > >>> type.
    > > > >>> > > > Hypothetically you could have a "true" condition type (which
    > > > would
    > > > >>> not
    > > > >>> > > > require a suffix), and the hypothetical binary conditions
    > "and:"
    > > > >>> and
    > > > >>> > > "or:"
    > > > >>> > > > would require a pair of other condition names.
    > > > >>> > > >
    > > > >>> > > > So what's proposed is a scheme for encoding conditions where
    > the
    > > > >>> > > condition
    > > > >>> > > > type is the prefix of the value of some "....condition"
    > config
    > > > >>> key, and
    > > > >>> > > the
    > > > >>> > > > optional suffix provides parameters for the condition. This
    > makes
    > > > >>> those
    > > > >>> > > > parameters a bit inflexible, but is relatively succinct.
    > > > >>> > > >
    > > > >>> > > > This leads on to your first point. You're right that use
    > cases
    > > > >>> might
    > > > >>> > > appear
    > > > >>> > > > which need other conditions, and we should make it flexible
    > > > enough
    > > > >>> to be
    > > > >>> > > > able to cope with future use cases. On the other hand, I was
    > > > >>> concerned
    > > > >>> > > that
    > > > >>> > > > we end up with something which is quite complicated to
    > configure.
    > > > >>> (There
    > > > >>> > > > comes a point where it might makes more sense for the user 
to
    > > > >>> write their
    > > > >>> > > > own SMT).
    > > > >>> > > >
    > > > >>> > > > Just of the top of my head it might look like:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> type:has-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > header-name:my-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > field-value:my-value
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > > > That won't work because the format is basically a
    > > > >>> Properties/Map<String,
    > > > >>> > > > String> and what you've suggested has duplicate keys.
    > > > >>> > > >
    > > > >>> > > > One thing I did briefly consider what the ability to treat
    > > > >>> conditions as
    > > > >>> > > > Configurable objects in their own right (opening up the
    > > > >>> possibility of
    > > > >>> > > > people supplying their own Conditions, just like they can
    > supply
    > > > >>> their
    > > > >>> > > own
    > > > >>> > > > SMTs). That might be configured something like this:
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: not
    > > > >>> > > >     transforms.conditionalExtract.condition.not.type: Not
    > > > >>> > > >     transforms.conditionalExtract.condition.not.negated: foo
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.type:
    > > > >>> HasHeaderWithValue
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.header:
    > my-header
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.value:
    > my-value
    > > > >>> > > >
    > > > >>> > > > I didn't propose that I couldn't see the use cases to 
justify
    > > > this
    > > > >>> kind
    > > > >>> > > of
    > > > >>> > > > complexity, especially as the common case would surely be
    > > > matching
    > > > >>> > > against
    > > > >>> > > > topic name (to be honest I wasn't completely convinced by 
the
    > > > need
    > > > >>> for
    > > > >>> > > > "has-header"). In the current version of the KIP that's just
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: topic-matches:
    > > > >>> my-prefix-.*
    > > > >>> > > >
    > > > >>> > > > but using the more flexible scheme that would require
    > something
    > > > >>> more like
    > > > >>> > > > this:
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: bar
    > > > >>> > > >     transforms.conditionalExtract.condition.bar.type:
    > TopicMatch
    > > > >>> > > >     transforms.conditionalExtract.condition.bar.pattern:
    > > > >>> my-prefix-.*
    > > > >>> > > >
    > > > >>> > > > If people know of use cases which would justify more
    > > > >>> sophistication, I'm
    > > > >>> > > > happy to reconsider.
    > > > >>> > > >
    > > > >>> > > > Thanks again for taking a look!
    > > > >>> > > >
    > > > >>> > > > Tom
    > > > >>> > > >
    > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
    > > > >>> > > > <[email protected]> wrote:
    > > > >>> > > >
    > > > >>> > > > > Hi Tom,
    > > > >>> > > > >
    > > > >>> > > > > sounds useful to me, thanks for the KIP!
    > > > >>> > > > > The only thought that I had while reading was that this
    > will
    > > > >>> probably
    > > > >>> > > > raise
    > > > >>> > > > > questions about more involved conditions fairly quickly.
    > For
    > > > >>> example
    > > > >>> > > the
    > > > >>> > > > > "has-header" will cause an appetite for conditions like
    > > > >>> > > > > "this-header-has-that-value".
    > > > >>> > > > > This would necessitate two parameters to be passed into 
the
    > > > >>> condition,
    > > > >>> > > > > which I think is not currently included in the KIP. I am
    > not
    > > > >>> saying add
    > > > >>> > > > > this now, but might it make sense to discuss a concept of
    > how
    > > > >>> that
    > > > >>> > > might
    > > > >>> > > > > look now, to avoid potential changes later on.
    > > > >>> > > > >
    > > > >>> > > > > Just of the top of my head it might look like:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> type:has-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > header-name:my-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > field-value:my-value
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Also, while writing that, I noticed that you have a 
version
    > > > with
    > > > >>> and
    > > > >>> > > > > without "name" for your transformation in the KIP:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header
    > > > >>> > > > > and
    > > > >>> > > > > transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Is this intentional and the name is optional?
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Best regards,
    > > > >>> > > > >
    > > > >>> > > > > Sönke
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
    > [email protected]>
    > > > >>> wrote:
    > > > >>> > > > > >
    > > > >>> > > > > > Does anyone have any comments, feedback or thoughts 
about
    > > > this?
    > > > >>> > > > > >
    > > > >>> > > > > > Thanks,
    > > > >>> > > > > >
    > > > >>> > > > > > Tom
    > > > >>> > > > > >
    > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
    > > > >>> [email protected]>
    > > > >>> > > > > wrote:
    > > > >>> > > > > >
    > > > >>> > > > > > > Hi,
    > > > >>> > > > > > >
    > > > >>> > > > > > > I've opened KIP-585 which is intended to provide a
    > > > mechanism
    > > > >>> to
    > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be
    > grateful
    > > > >>> for any
    > > > >>> > > > > > > feedback:
    > > > >>> > > > > > >
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > >
    > > > >>>
    > > >
    > 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
    > > > >>> > > > > > >
    > > > >>> > > > > > > Many thanks,
    > > > >>> > > > > > >
    > > > >>> > > > > > > Tom
    > > > >>> > > > > > >
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > >
    > > > >>>
    > > > >>>
    > > >
    > >
    >

Re: [DISCUSS] KIP-585: Conditional SMT

Reply via email to