Hi Andrew,

That works nicely enough for the proposal where the predicate is configured
directly on the transformation. But I thought there was more consensus
around the proposal to have the transformation configuration refer to a
predicate indirectly, defined with the ?predicates key. I guess an example
would look like this:

transforms: t2
transforms.t2?predicate: has-prefix
transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.t2.field: c1
?predicates: has-prefix
?predicates.has-prefix.type:
org.apache.kafka.connect.transforms.predicates.TopicNameMatch
?predicates.has-prefix.negate: true
?predicates.has-prefix.pattern: my-prefix-.*

I agree that this seems to reduce the chance of conflict to practically
nothing. I'm happy enough to go with this and I've updated the KIP
accordingly.

Assuming no one raises more concerns I'll start a vote soon.

Kind regards,

Tom

On Mon, May 11, 2020 at 10:54 AM Andrew Schofield <andrew_schofi...@live.com>
wrote:

> Hi,
> I have implemented some of this and configured some predicates and I
> prefer this slight modification of the ? syntax:
>
> transforms: t2
> transforms.t2?type:
> org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> transforms.t2?negate: true
> transforms.t2?pattern: my-prefix-.*
> transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.t2.field: c1
>
> So, "transforms.<alias>." introduces the config for the transformation,
> while "transforms.<alias>?" introduces the
> config for the predicate. The risk of a clash now is that someone has used
> a ? in the alias name, which is unlikely.
>
> Also, there's a slight typo in the Java definition of the Predicate
> interface. It should be:
>
> public interface Predicate<R extends ConnectRecord<R>> extends
> Configurable, Closeable
>
> Looks like discussion has died down so I wonder whether it's time to begin
> a vote.
>
> Cheers,
> Andrew
>
> On 06/05/2020, 08:57, "Andrew Schofield" <andrew_schofi...@live.com>
> wrote:
>
> >    Hi Tom,
> >    I think that either proposal is sufficiently readable and they both
> have elements of
> >    cryptic syntax. I remember talking to an engineering director once
> who said he didn't hire
> >    "curly braces people" any more. This is an interface for curly braces
> people whichever
> >    way we go here.
>
> >    I slightly prefer the predicates as a top-level concept rather than
> as a feature
> >    of the If transform. I also don't like the namespacing behaviour of
> the If transform which
> >    I guess works like variable declaration so each scope has its own
> namespace of transform
> >    aliases. I expect the following is valid but ill-advised.
>
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t1
> >        transforms.t1.then.t1.type: ExtractField$Key
> >        transforms.t1.then.t1.field: c1
>
> >    I would love to see conditional application of transforms in SMT in
> Kafka 2.6.
> >    Let's see what other people think and get a consensus view on how to
> spell it.
>
> >    Cheers,
> >    Andrew
>
> >    On 05/05/2020, 15:20, "Tom Bentley" <tbent...@redhat.com> wrote:
>
> >        Hi,
>
> >        Again, reusing the existing numbering...
>
> >        3) I had exactly the same thought. The reason I didn't already
> rename it
> >        was because :
>
> >        * Filter is not exactly wrong (by default it just filters out all
> the
> >        messages)
> >        * Filter is a useful hint about how it's intended to be used.
>
> >        But I don't have a strong opinion, so I'm happy to rename it if
> people
> >        think that Drop is better.
>
> >        4) Putting the negation on the use site–rather than the
> declaration
> >        site–seems like a good idea, since it makes it easier to get
> 'else'-like
> >        behaviour.
>
> >        But, playing devil's advocate, I wondered how all this compares
> with the
> >        original proposal, so I went to the trouble of writing out a few
> examples
> >        to compare the two
>
> >        Example of a basic 'if'
> >        Current proposal:
> >        transforms: t2
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original (or originalish, see notes below):
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
>
> >        Example of 'if/else':
> >        Current:
> >        transforms: t2,t3
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        transforms.t3.?predicate: has-my-prefix
> >        transforms.t3.type: ExtractField$Key
> >        transforms.t3.field: c2
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original:
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
> >        transforms.t1.else: t3
> >        transforms.t1.else.t3.type: ExtractField$Key
> >        transforms.t1.else.t3.field: c2
>
> >        Example of guarding a >1 SMT
> >        Current:
> >        transforms: t2,t3
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        transforms.t3.?predicate: !has-my-prefix
> >        transforms.t3.type: ExtractField$Value
> >        transforms.t3.field: c2
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original:
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2,t3
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
> >        transforms.t1.then.t3.type: ExtractField$Value
> >        transforms.t1.then.t3.field: c2
>
> >        Notes:
> >        * I'm assuming we would negate predicates in the Current proposal
> by
> >        prefixing the predicate name with !
> >        * I'm using "test" rather than "predicate" in the Original
> because it looks
> >        a bit nicer having the same number of letters as "then" and
> "else".
> >        * I'm negating the test in the Original by using "!test" rather
> than "test".
>
> >        Those notes aside, I'd make the following assertions:
>
> >        1. The Original examples require the same number of keys (or one
> less line
> >        in the last example).
> >        2. The Original examples are more guessable if you'd never seen
> this KIP
> >        before. The SMT name (If) and its config parameter names ('test'
> and 'then'
> >        and maybe also 'else') all give quite strong clues about what's
> happening.
> >        I'd also argue that using !test is less cryptic than
> '?predicate', since
> >        the purpose of the '!' fits with its use in most programming
> languages. The
> >        purpose of the '?' in '?predicate' is to reduce the chance of a
> naming
> >        collision, which is not something which people could guess.
> >        3. The Original examples cannot have config parameters naming
> conflicts
> >        with either Connectors or Transformations.
>
> >        Although I don't think it's a strong argument I would also add:
>
> >        4. Because the original syntax supports nesting of 'If' SMTs it's
> actually
> >        more powerful.
>
> >        So, I know people really didn't like the original suggestion, but
> if you
> >        accept these assertions then the current proposal is worse. Can
> anyone
> >        point to some objective ways in which the current proposal is
> better than
> >        the original one?
>
> >        If people really think the current proposal is better then I'm
> happy to go
> >        with that, call a vote and see if it has any traction with people
> outside
> >        this discussion. Ultimately I just want to get support for this
> feature
> >        into Kafka, however it's configured.
>
> >        Kind regards,
>
> >        Tom
>
> >        On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <
> andrew_schofi...@live.com>
> >        wrote:
>
> >        > Hi Tom,
> >        > Referring back to Chris's points.
> >        >
> >        > 1) Exception handling - yes, agreed. This KIP can just say the
> exception
> >        > handling for
> >        > predicates is the same as transformations.
> >        >
> >        > 2) Worker plugins - agree too.
> >        >
> >        > 3) Filter SMT - If the Filter SMT is just becoming a
> transformation that
> >        > drops any records that
> >        > make it through the predicate, it's probably actually a Drop
> >        > transformation. If it's
> >        > not guarded with a predicate, it drops every record, but with a
> predicate
> >        > it just
> >        > drops records that match.
> >        >
> >        > 4) Syntax - I quite like the idea of a separate namespace for
> predicates,
> >        > just like for transformations.
> >        > I particularly like how you could have 2 transformations
> referencing the
> >        > same predicate and config.
> >        >
> >        > There is the scope for collision with connector configuration,
> but my
> >        > hunch is that
> >        > "predicates" is less risky that simply "predicate" since it
> would only
> >        > clash with a connector
> >        > that happened to have multiple predicates.
> >        >
> >        > But I think there is a non-zero chance that an existing
> transformation
> >        > might have used
> >        > "predicate" as a config. I have actually written a filter
> transformation
> >        > myself. It didn't use
> >        > "predicate" but it almost did. That's why I suggested
> "?predicate" which I
> >        > know is a bit
> >        > cryptic, but it's also not going to clash.
> >        >
> >        > Here are two serving suggestions:
> >        >
> >        > A) "?predicate" to name the predicate alias to use
> >        > transforms: t2
> >        > transforms.t2.?predicate: has-my-prefix
> >        > transforms.t2.type:ExtractField$Key
> >        > transforms.t2.field: c1
> >        >
> >        > B) "?" to name the predicate alias to use
> >        > transforms: t2
> >        > transforms.t2.?: has-my-prefix
> >        > transforms.t2.type:ExtractField$Key
> >        > transforms.t2.field: c1
> >        >
> >        >
> >        > One more question.
> >        >
> >        > Do people think the negation ought to be in the predicate
> config or in the
> >        > referring transformation
> >        > config? We might want to make:
> >        >
> >        >   predicate(a)=>transform(T1)
> >        >   predicate(negate(a))=>transform(T2)
> >        >
> >        > easier than having to use two separate predicate aliases with
> identical
> >        > configuration, except for
> >        > the negation.
> >        >
> >        > Thanks,
> >        > Andrew
> >        >
> >        > On 04/05/2020, 10:22, "Tom Bentley" <tbent...@redhat.com>
> wrote:
> >        >
> >        >     Hi Chris,
> >        >
> >        >     Thanks for the feedback. On points 1 and 2 I agree.
> >        >
> >        >     3. This is a great point. I'm happy to add this to the KIP
> too, but
> >        > note it
> >        >     removes the need to add validate(Map<String, String>) to
> both
> >        >     Transformation and Predicate. That doesn't mean we
> shouldn't still do
> >        > it,
> >        >     just that the change is not motivated by the features being
> added in
> >        > the
> >        >     KIP. So I've removed validate(Map<String, String>) for now.
> >        >
> >        >     4. Your proposal would further reduce the chance of
> Transformation
> >        > config
> >        >     conflict (because just the 'predicate' parameter might
> conflict), but
> >        > it
> >        >     also adds the chance of conflict with Connector parameters
> (all the
> >        >     'predicates.*' parameters). Connector implementations are
> far more
> >        > numerous
> >        >     than Transformation implementations, so the risk is
> proportionately
> >        >     greater. As you say we could try to reduce the chances of
> conflict
> >        > (e.g.
> >        >     '?predicate' parameter on transformations and '?predicates'
> prefix on
> >        >     connectors), but I'm still left with uneasy feelings. I'd
> like to hear
> >        >     about what others think before I change this one.
> >        >
> >        >     Cheers,
> >        >
> >        >     Tom
> >        >
> >        >
> >        >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
> >        > chr...@confluent.io>
> >        >     wrote:
> >        >
> >        >     > Hi all,
> >        >     >
> >        >     > This has certainly taken quite a turn! I wish we could
> get this done
> >        >     > without adding another pluggable interface but if the
> benefit is
> >        > that now
> >        >     > any SMT--new or pre-existing--can be conditionally
> applied, it seems
> >        > like
> >        >     > it might be worth it.
> >        >     >
> >        >     > As far as the proposed design goes, some thoughts:
> >        >     >
> >        >     > 1. There isn't a clear reason why exceptions thrown from
> predicates
> >        > should
> >        >     > be treated differently from exceptions thrown by
> transformations. We
> >        > have
> >        >     > reasonable error-tolerance mechanisms in the Connect
> framework; why
> >        >     > silently swallow errors, especially if the risk is
> publishing
> >        > potentially
> >        >     > corrupted data to Kafka or an external system because an
> SMT that
> >        > should
> >        >     > have been applied wasn't?
> >        >     >
> >        >     > 2. It might be worth noting that the interface is a
> worker plugin,
> >        > and that
> >        >     > presumably it would be loaded in the same way as other
> worker
> >        > plugins such
> >        >     > as converters, connectors, and REST extensions. This
> would include
> >        > aliasing
> >        >     > behavior that would allow users to specify predicates
> using their
> >        > simple
> >        >     > class names as long as no two predicate plugins with the
> same simple
> >        > name
> >        >     > were available on the worker.
> >        >     >
> >        >     > 3. Why would the Filter SMT explicitly take in a
> predicate in its
> >        >     > configuration if predicates can be applied to all SMTs?
> Just reading
> >        > the
> >        >     > idea for a filter SMT, it seemed like the behavior would
> be that the
> >        > SMT
> >        >     > would take in no configuration parameters and just
> blindly drop
> >        > everything
> >        >     > that it sees, but typical use would involve pairing it
> with a
> >        > predicate.
> >        >     >
> >        >     > 4. The question mark syntax seems a little cryptic, and
> combining the
> >        >     > configuration properties for the predicate and the
> transformation in
> >        > the
> >        >     > same namespace ("transforms.<name>.*) seems a little
> noisy. What do
> >        > you
> >        >     > think about allowing predicates to be configured in their
> own
> >        > namespace,
> >        >     > and referenced by name in a single "predicate" (or maybe
> >        > "?predicate" if we
> >        >     > really want to avoid risking backwards compatibility
> concerns)
> >        > property?
> >        >     > This would also enable them to be more easily reused
> across several
> >        > SMTs.
> >        >     >
> >        >     > For example, you might configure a worker with these
> properties
> >        > (assuming
> >        >     > plugin aliasing is supported for predicates in the same
> way that it
> >        > is for
> >        >     > transformations):
> >        >     >
> >        >     > transforms: t2
> >        >     > transforms.t2.predicate: has-my-prefix
> >        >     > transforms.t2.type:ExtractField$Key
> >        >     > transforms.t2.field: c1
> >        >     >
> >        >     > predicates: has-my-prefix
> >        >     > predicates.has-my-prefix.type: TopicNameMatch
> >        >     > predicates.has-my-prefix.negate: true
> >        >     > predicates.has-my-prefix.pattern: my-prefix-.*
> >        >     >
> >        >     > Excited to see how this is evolving and I think we're
> heading towards
> >        >     > something really useful for the Connect framework.
> >        >     >
> >        >     > Cheers,
> >        >     >
> >        >     > Chris
> >        >     >
> >        >     >
> >        >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
> >        > andrew_schofi...@live.com
> >        >     > >
> >        >     > wrote:
> >        >     >
> >        >     > > Hi Tom,
> >        >     > > Looks good to me.
> >        >     > >
> >        >     > > Thanks,
> >        >     > > Andrew
> >        >     > >
> >        >     > > On 01/05/2020, 16:24, "Tom Bentley" <
> tbent...@redhat.com> wrote:
> >        >     > >
> >        >     > >     Hi,
> >        >     > >
> >        >     > >     I've updated the KIP to reflect recent discussions.
> Please
> >        > note the
> >        >     > >     following:
> >        >     > >
> >        >     > >     1. I've described a HasHeaderKey predicate rather
> than
> >        >     > HeaderKeyMatches
> >        >     > >     because at the moment Headers doesn't expose the
> whole set of
> >        >     > headers,
> >        >     > > only
> >        >     > >     those with a specific name. Exposing this would
> significantly
> >        >     > increase
> >        >     > > the
> >        >     > >     scope of this KIP but relatively little extra
> benefit.
> >        >     > >     2. I used
> org.apache.kafka.connect.transformer.predicates
> >        > rather than
> >        >     > >     org.apache.kafka.connect.predicates, since I
> thought it better
> >        >     > > reflected
> >        >     > >     the use of predicates within transforms. I'm
> flexible on this
> >        > one if
> >        >     > >     someone has a good case for the original name. For
> example the
> >        >     > original
> >        >     > >     package name would be more appropriate if we were
> expecting
> >        >     > Connectors
> >        >     > > to
> >        >     > >     make use of Predicates somehow.
> >        >     > >     3. I've dropped the special case of not needing to
> provide a
> >        > FQN for
> >        >     > >     predicate classes in that package, since this isn't
> something
> >        >     > > supported by
> >        >     > >     transformations themselves as far as I know. I like
> the idea,
> >        > but it
> >        >     > > seemed
> >        >     > >     inconsistent to need it for transformations but not
> for
> >        > predicates.
> >        >     > >     4. Chris, I've used your suggestions for a
> validate() method
> >        > for the
> >        >     > > extra
> >        >     > >     configurability needed. I've also added this to
> Predicate (even
> >        >     > though
> >        >     > > I
> >        >     > >     don't need it) for consistency with Connector and
> >        > Transformation.
> >        >     > >     5. For negating the result of a predicate I decided
> it was
> >        > clearer to
> >        >     > > just
> >        >     > >     have a "negate" config parameter, supported by all
> predicates,
> >        > than
> >        >     > to
> >        >     > >     reach for more punctuation.
> >        >     > >
> >        >     > >     I'd be grateful for any more feedback people might
> have.
> >        >     > >
> >        >     > >     Kind regards,
> >        >     > >
> >        >     > >     Tom
> >        >     > >
> >        >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
> >        >     > > andrew_schofi...@live.com>
> >        >     > >     wrote:
> >        >     > >
> >        >     > >     > Hi Tom,
> >        >     > >     > I think we should go for a consistent naming
> convention for
> >        > the
> >        >     > >     > predicates, maybe so they
> >        >     > >     > read nicely if you say "IF" first. Here's a
> suggestion for
> >        >     > > predicates that
> >        >     > >     > the KIP could introduce:
> >        >     > >     >  - TopicNameMatches
> >        >     > >     >  - HeaderKeyMatches
> >        >     > >     >  - RecordIsTombstone
> >        >     > >     >
> >        >     > >     > On naming, I also suggest using
> >        >     > >     > org.apache.kafka.connect.predicates.Predicate
> >        >     > >     > as the name for the interface.
> >        >     > >     >
> >        >     > >     > I hadn't settled on a preference for negation. I
> suppose
> >        > there are
> >        >     > > three
> >        >     > >     > results
> >        >     > >     > from evaluating a predicate - true, false or an
> exception
> >        > caught by
> >        >     > > Kafka
> >        >     > >     > Connect.
> >        >     > >     > I suggest the exception case is always a
> predicate fail so
> >        > the
> >        >     > > guarded
> >        >     > >     > transformation
> >        >     > >     > is skipped. Initially I'd preferred having the
> predicate be
> >        >     > > configured to
> >        >     > >     > return a
> >        >     > >     > negated result to give a "not match" behaviour.
> Thinking
> >        > again
> >        >     > now, I
> >        >     > >     > prefer something like:
> >        >     > >     >
> >        >     > >     > transforms: t1
> >        >     > >     > transforms.t1.?type: RecordIsTombstone
> >        >     > >     > transforms.t1.type:
> >        >     > > org.apache.kafka.connect.transforms.InsertField$Value
> >        >     > >     > transforms.t1.static.field: deleted
> >        >     > >     > transforms.t1.static.value: true
> >        >     > >     >
> >        >     > >     > The "?" means the transform guarded by the
> predicate runs in
> >        > the
> >        >     > > Predicate
> >        >     > >     > returns true.
> >        >     > >     >
> >        >     > >     > transforms: t2
> >        >     > >     > transforms.t2.?!type:
> >        >     > > org.apache.kafka.connect.predicates.TopicNameMatch
> >        >     > >     > transforms.t2.?regex: my-prefix-.*
> >        >     > >     > transforms.t2.type:
> >        >     > > org.apache.kafka.connect.transforms.ExtractField$Key
> >        >     > >     > transforms.t2.field: c1
> >        >     > >     >
> >        >     > >     > The "?!" means the transform guarded by the
> predicate runs
> >        > if the
> >        >     > > Predicate
> >        >     > >     > returns false.
> >        >     > >     >
> >        >     > >     > I think adding a Filter SMT that uses a Predicate
> is a nice
> >        > idea.
> >        >     > >     >
> >        >     > >     > Cheers,
> >        >     > >     > Andrew
> >        >     > >     >
> >        >     >
> >        >     >
> >        >
> >        >
>
>
>

Reply via email to