Hi, I have implemented some of this and configured some predicates and I prefer this slight modification of the ? syntax:
transforms: t2 transforms.t2?type: org.apache.kafka.connect.transforms.predicates.TopicNameMatch transforms.t2?negate: true transforms.t2?pattern: my-prefix-.* transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key transforms.t2.field: c1 So, "transforms.<alias>." introduces the config for the transformation, while "transforms.<alias>?" introduces the config for the predicate. The risk of a clash now is that someone has used a ? in the alias name, which is unlikely. Also, there's a slight typo in the Java definition of the Predicate interface. It should be: public interface Predicate<R extends ConnectRecord<R>> extends Configurable, Closeable Looks like discussion has died down so I wonder whether it's time to begin a vote. Cheers, Andrew On 06/05/2020, 08:57, "Andrew Schofield" <andrew_schofi...@live.com> wrote: > Hi Tom, > I think that either proposal is sufficiently readable and they both have > elements of > cryptic syntax. I remember talking to an engineering director once who > said he didn't hire > "curly braces people" any more. This is an interface for curly braces > people whichever > way we go here. > I slightly prefer the predicates as a top-level concept rather than as a > feature > of the If transform. I also don't like the namespacing behaviour of the If > transform which > I guess works like variable declaration so each scope has its own > namespace of transform > aliases. I expect the following is valid but ill-advised. > transforms: t1 > transforms.t1.type: If > transforms.t1.!test.type: TopicNameMatch > transforms.t1.!test.pattern: my-prefix-.* > transforms.t1.then: t1 > transforms.t1.then.t1.type: ExtractField$Key > transforms.t1.then.t1.field: c1 > I would love to see conditional application of transforms in SMT in Kafka > 2.6. > Let's see what other people think and get a consensus view on how to spell > it. > Cheers, > Andrew > On 05/05/2020, 15:20, "Tom Bentley" <tbent...@redhat.com> wrote: > Hi, > Again, reusing the existing numbering... > 3) I had exactly the same thought. The reason I didn't already rename > it > was because : > * Filter is not exactly wrong (by default it just filters out all the > messages) > * Filter is a useful hint about how it's intended to be used. > But I don't have a strong opinion, so I'm happy to rename it if people > think that Drop is better. > 4) Putting the negation on the use site–rather than the declaration > site–seems like a good idea, since it makes it easier to get > 'else'-like > behaviour. > But, playing devil's advocate, I wondered how all this compares with > the > original proposal, so I went to the trouble of writing out a few > examples > to compare the two > Example of a basic 'if' > Current proposal: > transforms: t2 > transforms.t2.?predicate: !has-my-prefix > transforms.t2.type: ExtractField$Key > transforms.t2.field: c1 > ?predicates: has-my-prefix > ?predicates.has-my-prefix.type: TopicNameMatch > ?predicates.has-my-prefix.pattern: my-prefix-.* > Original (or originalish, see notes below): > transforms: t1 > transforms.t1.type: If > transforms.t1.!test.type: TopicNameMatch > transforms.t1.!test.pattern: my-prefix-.* > transforms.t1.then: t2 > transforms.t1.then.t2.type: ExtractField$Key > transforms.t1.then.t2.field: c1 > Example of 'if/else': > Current: > transforms: t2,t3 > transforms.t2.?predicate: !has-my-prefix > transforms.t2.type: ExtractField$Key > transforms.t2.field: c1 > transforms.t3.?predicate: has-my-prefix > transforms.t3.type: ExtractField$Key > transforms.t3.field: c2 > ?predicates: has-my-prefix > ?predicates.has-my-prefix.type: TopicNameMatch > ?predicates.has-my-prefix.pattern: my-prefix-.* > Original: > transforms: t1 > transforms.t1.type: If > transforms.t1.!test.type: TopicNameMatch > transforms.t1.!test.pattern: my-prefix-.* > transforms.t1.then: t2 > transforms.t1.then.t2.type: ExtractField$Key > transforms.t1.then.t2.field: c1 > transforms.t1.else: t3 > transforms.t1.else.t3.type: ExtractField$Key > transforms.t1.else.t3.field: c2 > Example of guarding a >1 SMT > Current: > transforms: t2,t3 > transforms.t2.?predicate: !has-my-prefix > transforms.t2.type: ExtractField$Key > transforms.t2.field: c1 > transforms.t3.?predicate: !has-my-prefix > transforms.t3.type: ExtractField$Value > transforms.t3.field: c2 > ?predicates: has-my-prefix > ?predicates.has-my-prefix.type: TopicNameMatch > ?predicates.has-my-prefix.pattern: my-prefix-.* > Original: > transforms: t1 > transforms.t1.type: If > transforms.t1.!test.type: TopicNameMatch > transforms.t1.!test.pattern: my-prefix-.* > transforms.t1.then: t2,t3 > transforms.t1.then.t2.type: ExtractField$Key > transforms.t1.then.t2.field: c1 > transforms.t1.then.t3.type: ExtractField$Value > transforms.t1.then.t3.field: c2 > Notes: > * I'm assuming we would negate predicates in the Current proposal by > prefixing the predicate name with ! > * I'm using "test" rather than "predicate" in the Original because it > looks > a bit nicer having the same number of letters as "then" and "else". > * I'm negating the test in the Original by using "!test" rather than > "test". > Those notes aside, I'd make the following assertions: > 1. The Original examples require the same number of keys (or one less > line > in the last example). > 2. The Original examples are more guessable if you'd never seen this > KIP > before. The SMT name (If) and its config parameter names ('test' and > 'then' > and maybe also 'else') all give quite strong clues about what's > happening. > I'd also argue that using !test is less cryptic than '?predicate', > since > the purpose of the '!' fits with its use in most programming > languages. The > purpose of the '?' in '?predicate' is to reduce the chance of a naming > collision, which is not something which people could guess. > 3. The Original examples cannot have config parameters naming conflicts > with either Connectors or Transformations. > Although I don't think it's a strong argument I would also add: > 4. Because the original syntax supports nesting of 'If' SMTs it's > actually > more powerful. > So, I know people really didn't like the original suggestion, but if > you > accept these assertions then the current proposal is worse. Can anyone > point to some objective ways in which the current proposal is better > than > the original one? > If people really think the current proposal is better then I'm happy > to go > with that, call a vote and see if it has any traction with people > outside > this discussion. Ultimately I just want to get support for this feature > into Kafka, however it's configured. > Kind regards, > Tom > On Mon, May 4, 2020 at 12:35 PM Andrew Schofield > <andrew_schofi...@live.com> > wrote: > > Hi Tom, > > Referring back to Chris's points. > > > > 1) Exception handling - yes, agreed. This KIP can just say the > exception > > handling for > > predicates is the same as transformations. > > > > 2) Worker plugins - agree too. > > > > 3) Filter SMT - If the Filter SMT is just becoming a transformation > that > > drops any records that > > make it through the predicate, it's probably actually a Drop > > transformation. If it's > > not guarded with a predicate, it drops every record, but with a > predicate > > it just > > drops records that match. > > > > 4) Syntax - I quite like the idea of a separate namespace for > predicates, > > just like for transformations. > > I particularly like how you could have 2 transformations referencing > the > > same predicate and config. > > > > There is the scope for collision with connector configuration, but my > > hunch is that > > "predicates" is less risky that simply "predicate" since it would > only > > clash with a connector > > that happened to have multiple predicates. > > > > But I think there is a non-zero chance that an existing > transformation > > might have used > > "predicate" as a config. I have actually written a filter > transformation > > myself. It didn't use > > "predicate" but it almost did. That's why I suggested "?predicate" > which I > > know is a bit > > cryptic, but it's also not going to clash. > > > > Here are two serving suggestions: > > > > A) "?predicate" to name the predicate alias to use > > transforms: t2 > > transforms.t2.?predicate: has-my-prefix > > transforms.t2.type:ExtractField$Key > > transforms.t2.field: c1 > > > > B) "?" to name the predicate alias to use > > transforms: t2 > > transforms.t2.?: has-my-prefix > > transforms.t2.type:ExtractField$Key > > transforms.t2.field: c1 > > > > > > One more question. > > > > Do people think the negation ought to be in the predicate config or > in the > > referring transformation > > config? We might want to make: > > > > predicate(a)=>transform(T1) > > predicate(negate(a))=>transform(T2) > > > > easier than having to use two separate predicate aliases with > identical > > configuration, except for > > the negation. > > > > Thanks, > > Andrew > > > > On 04/05/2020, 10:22, "Tom Bentley" <tbent...@redhat.com> wrote: > > > > Hi Chris, > > > > Thanks for the feedback. On points 1 and 2 I agree. > > > > 3. This is a great point. I'm happy to add this to the KIP too, > but > > note it > > removes the need to add validate(Map<String, String>) to both > > Transformation and Predicate. That doesn't mean we shouldn't > still do > > it, > > just that the change is not motivated by the features being > added in > > the > > KIP. So I've removed validate(Map<String, String>) for now. > > > > 4. Your proposal would further reduce the chance of > Transformation > > config > > conflict (because just the 'predicate' parameter might > conflict), but > > it > > also adds the chance of conflict with Connector parameters (all > the > > 'predicates.*' parameters). Connector implementations are far > more > > numerous > > than Transformation implementations, so the risk is > proportionately > > greater. As you say we could try to reduce the chances of > conflict > > (e.g. > > '?predicate' parameter on transformations and '?predicates' > prefix on > > connectors), but I'm still left with uneasy feelings. I'd like > to hear > > about what others think before I change this one. > > > > Cheers, > > > > Tom > > > > > > On Mon, May 4, 2020 at 4:57 AM Christopher Egerton < > > chr...@confluent.io> > > wrote: > > > > > Hi all, > > > > > > This has certainly taken quite a turn! I wish we could get > this done > > > without adding another pluggable interface but if the benefit > is > > that now > > > any SMT--new or pre-existing--can be conditionally applied, it > seems > > like > > > it might be worth it. > > > > > > As far as the proposed design goes, some thoughts: > > > > > > 1. There isn't a clear reason why exceptions thrown from > predicates > > should > > > be treated differently from exceptions thrown by > transformations. We > > have > > > reasonable error-tolerance mechanisms in the Connect > framework; why > > > silently swallow errors, especially if the risk is publishing > > potentially > > > corrupted data to Kafka or an external system because an SMT > that > > should > > > have been applied wasn't? > > > > > > 2. It might be worth noting that the interface is a worker > plugin, > > and that > > > presumably it would be loaded in the same way as other worker > > plugins such > > > as converters, connectors, and REST extensions. This would > include > > aliasing > > > behavior that would allow users to specify predicates using > their > > simple > > > class names as long as no two predicate plugins with the same > simple > > name > > > were available on the worker. > > > > > > 3. Why would the Filter SMT explicitly take in a predicate in > its > > > configuration if predicates can be applied to all SMTs? Just > reading > > the > > > idea for a filter SMT, it seemed like the behavior would be > that the > > SMT > > > would take in no configuration parameters and just blindly drop > > everything > > > that it sees, but typical use would involve pairing it with a > > predicate. > > > > > > 4. The question mark syntax seems a little cryptic, and > combining the > > > configuration properties for the predicate and the > transformation in > > the > > > same namespace ("transforms.<name>.*) seems a little noisy. > What do > > you > > > think about allowing predicates to be configured in their own > > namespace, > > > and referenced by name in a single "predicate" (or maybe > > "?predicate" if we > > > really want to avoid risking backwards compatibility concerns) > > property? > > > This would also enable them to be more easily reused across > several > > SMTs. > > > > > > For example, you might configure a worker with these properties > > (assuming > > > plugin aliasing is supported for predicates in the same way > that it > > is for > > > transformations): > > > > > > transforms: t2 > > > transforms.t2.predicate: has-my-prefix > > > transforms.t2.type:ExtractField$Key > > > transforms.t2.field: c1 > > > > > > predicates: has-my-prefix > > > predicates.has-my-prefix.type: TopicNameMatch > > > predicates.has-my-prefix.negate: true > > > predicates.has-my-prefix.pattern: my-prefix-.* > > > > > > Excited to see how this is evolving and I think we're heading > towards > > > something really useful for the Connect framework. > > > > > > Cheers, > > > > > > Chris > > > > > > > > > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield < > > andrew_schofi...@live.com > > > > > > > wrote: > > > > > > > Hi Tom, > > > > Looks good to me. > > > > > > > > Thanks, > > > > Andrew > > > > > > > > On 01/05/2020, 16:24, "Tom Bentley" <tbent...@redhat.com> > wrote: > > > > > > > > Hi, > > > > > > > > I've updated the KIP to reflect recent discussions. > Please > > note the > > > > following: > > > > > > > > 1. I've described a HasHeaderKey predicate rather than > > > HeaderKeyMatches > > > > because at the moment Headers doesn't expose the whole > set of > > > headers, > > > > only > > > > those with a specific name. Exposing this would > significantly > > > increase > > > > the > > > > scope of this KIP but relatively little extra benefit. > > > > 2. I used org.apache.kafka.connect.transformer.predicates > > rather than > > > > org.apache.kafka.connect.predicates, since I thought it > better > > > > reflected > > > > the use of predicates within transforms. I'm flexible on > this > > one if > > > > someone has a good case for the original name. For > example the > > > original > > > > package name would be more appropriate if we were > expecting > > > Connectors > > > > to > > > > make use of Predicates somehow. > > > > 3. I've dropped the special case of not needing to > provide a > > FQN for > > > > predicate classes in that package, since this isn't > something > > > > supported by > > > > transformations themselves as far as I know. I like the > idea, > > but it > > > > seemed > > > > inconsistent to need it for transformations but not for > > predicates. > > > > 4. Chris, I've used your suggestions for a validate() > method > > for the > > > > extra > > > > configurability needed. I've also added this to > Predicate (even > > > though > > > > I > > > > don't need it) for consistency with Connector and > > Transformation. > > > > 5. For negating the result of a predicate I decided it > was > > clearer to > > > > just > > > > have a "negate" config parameter, supported by all > predicates, > > than > > > to > > > > reach for more punctuation. > > > > > > > > I'd be grateful for any more feedback people might have. > > > > > > > > Kind regards, > > > > > > > > Tom > > > > > > > > On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield < > > > > andrew_schofi...@live.com> > > > > wrote: > > > > > > > > > Hi Tom, > > > > > I think we should go for a consistent naming > convention for > > the > > > > > predicates, maybe so they > > > > > read nicely if you say "IF" first. Here's a suggestion > for > > > > predicates that > > > > > the KIP could introduce: > > > > > - TopicNameMatches > > > > > - HeaderKeyMatches > > > > > - RecordIsTombstone > > > > > > > > > > On naming, I also suggest using > > > > > org.apache.kafka.connect.predicates.Predicate > > > > > as the name for the interface. > > > > > > > > > > I hadn't settled on a preference for negation. I > suppose > > there are > > > > three > > > > > results > > > > > from evaluating a predicate - true, false or an > exception > > caught by > > > > Kafka > > > > > Connect. > > > > > I suggest the exception case is always a predicate > fail so > > the > > > > guarded > > > > > transformation > > > > > is skipped. Initially I'd preferred having the > predicate be > > > > configured to > > > > > return a > > > > > negated result to give a "not match" behaviour. > Thinking > > again > > > now, I > > > > > prefer something like: > > > > > > > > > > transforms: t1 > > > > > transforms.t1.?type: RecordIsTombstone > > > > > transforms.t1.type: > > > > org.apache.kafka.connect.transforms.InsertField$Value > > > > > transforms.t1.static.field: deleted > > > > > transforms.t1.static.value: true > > > > > > > > > > The "?" means the transform guarded by the predicate > runs in > > the > > > > Predicate > > > > > returns true. > > > > > > > > > > transforms: t2 > > > > > transforms.t2.?!type: > > > > org.apache.kafka.connect.predicates.TopicNameMatch > > > > > transforms.t2.?regex: my-prefix-.* > > > > > transforms.t2.type: > > > > org.apache.kafka.connect.transforms.ExtractField$Key > > > > > transforms.t2.field: c1 > > > > > > > > > > The "?!" means the transform guarded by the predicate > runs > > if the > > > > Predicate > > > > > returns false. > > > > > > > > > > I think adding a Filter SMT that uses a Predicate is a > nice > > idea. > > > > > > > > > > Cheers, > > > > > Andrew > > > > > > > > > > > > > > >