Hi Jorge,

I really appreciate the effort you've made to simplify the syntax and
feature set of a JSONPath-based approach as much as possible. I'm still
hesitant to continue with it, though.

1. The syntax is much less friendly. Just compare "top.mid.bottom" to
"$['top']['mid']['bottom']"... not everyone uses JSONPath or even JSON, and
the learning curve for the former is going to be steeper. The examples in
the new KIP draft you published demonstrate this pretty well, and this is
without diving into the details of what escape syntax would look like.

2. The three new major features that this syntax adds (array accesses, deep
scans, and multi-value paths) could all be added pretty easily to the dot
notation syntax without introducing brackets and dollar signs. Array
accesses can be described using the same syntax as struct/map field access,
deep scans can be described using '*', and multi-value paths can be
described by referencing the name of a field that's expected to have
children. These are all top-of-the-head ideas and can probably be refined,
but hopefully they demonstrate that we can keep the syntax simple without
sacrificing features. Of course, the question of leaving room for future
features might arise... given that these are the out-of-the-box SMTs that
are likely to be the first that many people encounter, I'd err on the side
of simplicity and a gentle learning curve; if people need to get more out
of transforms, the option to implement their own is still there. If we can
address 95% of use cases with something easy to use, it's not worth making
the feature harder for everyone to use just to accommodate the remaining 5%.

3. The advantage of leveraging an existing syntax is twofold: users who are
already familiar with that syntax don't need to learn a new syntax, and
maintainers of the syntax get to leverage existing libraries and
documentation for the syntax. With the current proposal, reusing libraries
is off the table, which means that we're going to have to parse this
ourselves (including all the escape syntax edge cases), and we won't be
able to automatically leverage new features added to that syntax. And given
how stripped-down the syntax is in comparison to full JSONPath, there's
still going to be a learning curve for users who are already familiar with
it, and we'll still have to document how Connect's variant of JSONPath
works either instead of or in addition to just linking to a well-maintained
third-party docs site.

On the topic of "field.path" and "include.path" vs. "field.style", I
actually think that a single property per SMT is cleaner and simpler.
Allowing users to mix and match styles within the same SMT config seems
like a recipe for confusion, and with a single property that dictates field
syntax behavior, we leave the door open to change the default at a later
date. We could even fully remove support for plain field notation at some
point and still be able to retain the simple property names of "field",
"include", etc. instead of forcing people to use "field.path" and the like.
That said, the term "field.style" and the permitted values might be a
little ambiguous. One alternative, though a little heavy-handed, is to
change it to "field.syntax.version" with permitted values of "V1" (default,
equivalent to "field.style = plain") and "V2" (equivalent to "field.style =
nested"). This would leave us room in the future to make further changes to
the syntax without having to come up with new names, although it does
sacrifice a little bit in that the permitted values are no longer
self-describing.

Cheers,

Chris

On Sun, May 15, 2022 at 4:51 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thank you all for your feedback, and sorry for the long wait for a reply.
>
> I would like to explore the idea of JSONPath-inspired/subset notation a bit
> further:
>
> It will need to be a much-reduced version of JSONPath:
> - No full support for JsonPath therefore an additional dependency.
> - All paths must start with '$'
> - No functions support or other operators allowed.
> - JsonPath dotted and square-bracket notations can be supported to avoid
> escaping dots or other characters: `$a.b.c` and `$['a.b']['c']` - or we
> could only support the second one as it's more complete.
> - Add support for arrays with `[<integer>]`, e.g. `$a.[1].b`
> - Add support for multiple-value paths using array access `$a.*.b` or deep
> scan `$a..b`.
>     - Some SMTs that will benefit from this: `MaskField`, `Cast`,
> `ReplaceField`.
> - We could introduce a `path[s]` config under the current configurations to
> apply this feature so no compatibility issues are introduced: e.g.
> `field.path`, `fields.paths`, `exclude.path`.
>
> With these,  100, 101, 102, and 103 will be effectively solved.
>
> The main challenge that I see at the moment is that by being JSON
> path-like, there may be some edge cases that I can't foresee at the moment,
> that could make this hard to implement, test, and maintain in the long run.
>
> I'll appreciate your feedback on how this JSONPath-based alternative
> compares to the dotted notation initially proposed, and if it matches your
> feedback.
> I have drafted a copy of the KIP to change the examples to JSONPath
> approach and validate some differences:
> https://cwiki.apache.org/confluence/x/9BihD
>
> About 104. I agreed with Chris. We can handle this as part of a new KIP.
>
> Thanks,
> Jorge.
>
> On Sun, 24 Apr 2022 at 17:27, Chris Egerton <fearthecel...@gmail.com>
> wrote:
>
> > Hi Joshua,
> >
> > I have a few reservations about using JsonPath notation here.
> >
> > 1. There's likely to be a substantial performance penalty for converting
> > between the Kafka Connect format and something that a JsonPath library
> > would understand.
> >
> > 2. The complexity of the feature will be significantly higher. It will be
> > harder to test, document, and implement. There will be many more edge
> cases
> > to consider and support, and we'll be on the hook to handle any bugs or
> > inconsistencies that arise, either as a result of our use of the JsonPath
> > library we choose, or as a result of bugs in that library.
> >
> > 3. It's not clear that JsonPath is superior or even suitable for some of
> > the SMTs proposed here. What would be the advantages of JsonPath with the
> > InsertField or HoistField SMTs?
> >
> > I also don't think that adding dot notation is unfriendly to users; many
> > have proposed this type of syntax in the past, and it's frequently used
> in
> > informal discussions to refer to nested fields. If the proposed syntax
> was
> > not already intuitive then a case against deciding on our own might be
> more
> > convincing, but as things stand, simple dot notation is likely going to
> be
> > easier for users to understand than JsonPath syntax.
> >
> > Cheers,
> >
> > Chris
> >
> > On Fri, Apr 22, 2022 at 9:31 AM Joshua Grisham <grishamj...@gmail.com>
> > wrote:
> >
> > > Hello all!
> > >
> > > Sorry that I come a bit later to the party here, but I am the one who
> > wrote
> > > KIP-683 [1] for recursive support (just simply looping through all
> child
> > > non-primitive structures for the same matching name(s)) which is a
> > slightly
> > > different way to try and solve a similar requirement -- unfortunately
> at
> > > the time the dev community was not quite as active and then I also got
> > busy
> > > with work and just life in general so wasn't able to follow up or push
> > it.
> > >
> > > I do think it is a very good idea to have some kind of path-like
> > expression
> > > to be able to specifically address a nested field, as I can see that
> the
> > > simple "recursive" case could potentially result in unwanted or
> > unexpected
> > > behavior, plus there is the potential to introduce a bit of a
> performance
> > > hit to always loop through everything in cases where the schemas/values
> > > might be quite large.
> > >
> > > One thing I wanted to ask: instead of creating a new "path parser"
> > > including some bespoke or "borrowed" syntax, why not just use something
> > > that already exists? Specifically here I am thinking about JsonPath (
> > > https://github.com/json-path/JsonPath)
> > >
> > > There is already quite nice support in JsonPath for handling special
> > > characters in field names, for handling different non-primitive types
> > > (arrays etc), for handling multiple levels of nesting, etc etc.  Would
> it
> > > be possible to instead to re-think this and maybe have some kind of
> > > JsonPath-based Schema selector / updater and/or JsonPath-based Value
> > > selector / updater? Conceptually this feels like it makes sense to me,
> as
> > > from the top of my head it would be quite a natural fit to map a Json
> > data
> > > structure to the Connect API data structure (and you could potentially
> > even
> > > try to leverage the existing Json-to-Connect serializer/deserializer to
> > > help out with this even in a more "out of the box"-feeling kind of
> way).
> > >
> > > Maybe also as Tom mentioned, this part (in my example, this
> > JsonPath-based
> > > "thing") could even be a generic API that could be used by any SMT,
> > > including used in custom ones built by the community.  Then I think to
> > use
> > > a completely separate config property somehow related to "path" (as Tom
> > > also mentioned) would also make a lot of sense here as well. This way,
> if
> > > you select based on "path" then this JsonPath-based API would be used,
> > > otherwise it could use something similar to the existing get-field
> based
> > > approach (which I guess could also be refactored into some kind of
> > utility
> > > / API as well if it made sense?)
> > >
> > > And with that in mind, if this was the kind of direction to go, then a
> > > "recursive" capability like I pitched in KIP-683 would also become
> > > unnecessary because you could easily write a JsonPath expression like
> > > "$..someRecuriveField" and it would do the same thing (on top of
> anything
> > > else you would want to do that is already supported by JsonPath). Then
> we
> > > could also kill that older KIP and do a bit of clean-up :)
> > >
> > > [1] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-683%3A+Add+recursive+support+to+Connect+Cast+and+ReplaceField+transforms%2C+and+support+for+casting+complex+types+to+either+a+native+or+JSON+string
> > >
> > > Just some extra food for thought. All-in-all I think this is a super
> > great
> > > initiative!
> > >
> > > Best,
> > >
> > > Joshua
> > >
> > >
> > > Den fre 22 apr. 2022 kl 14:50 skrev Chris Egerton <
> > fearthecel...@gmail.com
> > > >:
> > >
> > > > Hi Tom,
> > > >
> > > > Thanks for taking a look at this, and for your thoughtful comments.
> > I'll
> > > > leave it up to Jorge to address most of your comments but I wanted to
> > > share
> > > > a couple quick thoughts I had regarding 103 and 104.
> > > >
> > > > 103. Like you, I was envisioning a possible syntax for array access
> > that
> > > > used classic C-style brackets; e.g., `arr[index]`. However, I wonder
> if
> > > we
> > > > could keep things simple and use the same syntax that we're proposing
> > for
> > > > nested field access? In other words, instead of `arr[index]`, you'd
> > write
> > > > `arr.index`. It'd save us and users the headache of reserving
> > characters
> > > > now that would need to be escaped even if their unescaped brethren
> > aren't
> > > > used for anything, and also avoid the question of what exactly we
> > should
> > > do
> > > > when we see a config that uses reserved characters that aren't yet
> > > > supported (throwing an exception seems pretty unfriendly for new
> > users).
> > > >
> > > > 104. This would probably be useful, but it would come with some nasty
> > > > compatibility questions that would need to be addressed if we'd want
> > SMTs
> > > > that leverage this new API to be viable for older versions of
> Connect.
> > If
> > > > we package and distribute this feature as a library (either via an
> > > entirely
> > > > new artifact, or as part of the existing connect-transforms or
> > > connect-api
> > > > artifacts), then we'd have to either sidestep the existing plugin
> > > isolation
> > > > logic [1] that basically makes it impossible for Connect plugins to
> > ship
> > > > their own versions of Connect artifacts, or issue a big warning to
> > people
> > > > that any SMT that uses this API won't work with any older versions of
> > > > Connect. There's also some other features we might want to add in an
> > > > SMT-utils library such as the existing, internal, utils that Connect
> > uses
> > > > right now [2]. It may be worth exploring this in a separate KIP of
> its
> > > own.
> > > >
> > > > [1] -
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/d480c4aa6e513e36050d8e067931de2270525d18/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/PluginUtils.java#L46-L143
> > > >
> > > > [2] -
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/tree/d480c4aa6e513e36050d8e067931de2270525d18/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/util
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Fri, Apr 22, 2022 at 6:55 AM Tom Bentley <tbent...@redhat.com>
> > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the KIP, especially for the examples which are
> > super-clear.
> > > > >
> > > > > 100. The name `field.style` isn't so clear for something like
> > > > ReplaceField:
> > > > > it's not so obvious that field.style applies to `include` and
> > > `exclude`.
> > > > >
> > > > > 101. The permitted values for `field.style` don't seem terribly
> > > intuitive
> > > > > (to me anyway): the meaning of `plain` isn't very guessable. Why
> not
> > > > > `top-level` or `root` instead? Also `nested` could be misconstrued
> to
> > > > mean
> > > > > nested-but-not-top-level, so perhaps `recursive` or `cascading`
> might
> > > be
> > > > > better?
> > > > >
> > > > > 102. I'm torn on whether making the interpretation of existing
> > configs
> > > > like
> > > > > `field` be dependent on `field.style` is a good idea. I can see
> that
> > > it's
> > > > > the simplest thing to do, but it just feels a bit odd that
> sometimes
> > > the
> > > > > `field` would actually be a path and have different escaping rules.
> > An
> > > > > alternative would be to come up with a parallel set of config names
> > > (e.g.
> > > > > as well as "field" an SMT might support "path") which were defined
> to
> > > > > always take paths, thus avoiding the changeable interpretation of
> the
> > > > > existing configs. The SMT's #configure() would need to throw in the
> > > case
> > > > > that both configs were given. I can see that that would be more
> work
> > in
> > > > > implementation, but it feels cleaner.
> > > > >
> > > > > 103. I think in order to allow for supporting arrays in a later KIP
> > > > (which
> > > > > certainly seems like it could be useful), we'd want to specify the
> > > syntax
> > > > > now, even if it wasn't implemented under this KIP. That's because I
> > > don't
> > > > > think you can't exclude the possibility that characters such as `[`
> > and
> > > > `]`
> > > > > appear in field names. So you'd have a compatibility problem if
> > people
> > > > > started using the features of this KIP to access such fields, only
> > for
> > > > > those characters to change their meaning under a later KIP.
> > > > >
> > > > > 104. I also wonder whether making paths into a public Java API, for
> > use
> > > > by
> > > > > 3rd party SMTs, would be valuable.
> > > > >
> > > > > Thanks again,
> > > > >
> > > > > Tom
> > > > >
> > > > >
> > > > >
> > > > > On Wed, 20 Apr 2022 at 17:53, Chris Egerton <
> fearthecel...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > 💯 Thanks Jorge, LGTM!
> > > > > >
> > > > > > On Wed, Apr 20, 2022, 12:40 Jorge Esteban Quilcate Otoya <
> > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >
> > > > > > > Thank you, Chris! Not possible without your feedback.
> > > > > > >
> > > > > > > On Tue, 19 Apr 2022 at 23:04, Chris Egerton <
> > > fearthecel...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jorge,
> > > > > > > >
> > > > > > > > Thank you for sticking through this. I have one small remark
> > and
> > > > one
> > > > > > > small
> > > > > > > > clarification; assuming you agree with me on them then I'm
> > ready
> > > to
> > > > > > vote
> > > > > > > on
> > > > > > > > the KIP.
> > > > > > > >
> > > > > > > > 1. InsertField: The "field.on.missing.parent" and
> > > > > > > "field.on.existing.field"
> > > > > > > > docs both mention a permitted value of "ingore"; this should
> be
> > > > > > "ignore",
> > > > > > > > right?
> > > > > > > >
> > > > > > >
> > > > > > > Of course, one more typo :)
> > > > > > >
> > > > > > >
> > > > > > > > 2. InsertField: The examples are still missing the
> > "field.style"
> > > > > > property
> > > > > > > > from the configurations. They should all include the property
> > > > > > > > "transforms.smt1.field.style": "nested", correct?
> > > > > > > >
> > > > > > >
> > > > > > > Yes, it is there. I think I know what you mean now, seems that
> > > > > Confluence
> > > > > > > is putting everything in one line when it's in separate lines
> in
> > > the
> > > > > > > editor.
> > > > > > > Hopefully, it's fixed now.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks again for working through this, and congratulations
> on a
> > > > > > > > well-written KIP!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Chris
> > > > > > > >
> > > > > > > > On Tue, Apr 19, 2022 at 2:06 PM Jorge Esteban Quilcate Otoya
> <
> > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thank you, Chris! I apply these improvements to the KIP,
> let
> > me
> > > > > know
> > > > > > > how
> > > > > > > > it
> > > > > > > > > looks now.
> > > > > > > > >
> > > > > > > > > On Mon, 11 Apr 2022 at 23:43, Chris Egerton <
> > > > > fearthecel...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jorge,
> > > > > > > > > >
> > > > > > > > > > Wow, those examples are great! A few more remarks, but I
> > > think
> > > > > > we're
> > > > > > > > > > getting close:
> > > > > > > > > >
> > > > > > > > > > 1. The examples differ across SMTs with the name of the
> > > > > > > > newly-introduced
> > > > > > > > > > style property; some of them use "field.style", and some
> > use
> > > > > > > > > > "fields.style". I think for consistency's sake we should
> > > stick
> > > > > with
> > > > > > > > just
> > > > > > > > > > "field.style"; otherwise it could be painful for users to
> > > have
> > > > to
> > > > > > > > > remember
> > > > > > > > > > which to use.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Great catch. Agree, I fixed the config names to
> > `field.style`.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. Some of the examples are off:
> > > > > > > > > > - TimestampConverter: the input in the second example
> > ("when
> > > > > field
> > > > > > > > names
> > > > > > > > > > include dots") doesn't contain a field with a dotted name
> > > > > > > > > > - ValueToKey: the config in the third example ("when
> field
> > > > names
> > > > > > > > include
> > > > > > > > > > dots") should probably use "parent..child.k2" as the
> > > > > > > > > > "transforms.smt1.fields" property
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Fixed. Thanks!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3. RE changes to InsertField:
> > > > > > > > > > - The InsertField SMT should also come with the new
> > > > "field.style"
> > > > > > > > > property
> > > > > > > > > > in order to preserve backwards compatibility, right? I
> > don't
> > > > see
> > > > > it
> > > > > > > > > > included in the example configs for that one, just want
> to
> > > make
> > > > > > sure
> > > > > > > > > > - I don't know of any cases where we use snake_case for
> > > > property
> > > > > > > names
> > > > > > > > in
> > > > > > > > > > Kafka; we should probably use "on.missing.parent" and
> > > > > > > > "on.existing.field"
> > > > > > > > > > as the new property names for InsertField.
> > > > > > > > > > - Why is the "on_existing_field" (or "on.existing.field")
> > > > > property
> > > > > > > only
> > > > > > > > > > applied when the field style is nested? Couldn't it be
> > useful
> > > > for
> > > > > > > > > > non-nested fields as well?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Great points! I have applied these suggestions to the KIP.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Chris
> > > > > > > > > >
> > > > > > > > > > On Sat, Apr 9, 2022 at 12:40 PM Jorge Esteban Quilcate
> > Otoya
> > > <
> > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Again, great feedback Chris. Much appreciated.
> > > > > > > > > > > Added my comments below:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 5 Apr 2022 at 20:22, Chris Egerton <
> > > > > > > fearthecel...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > >
> > > > > > > > > > > > Looking good! I have a few comments left but all but
> > one
> > > or
> > > > > two
> > > > > > > are
> > > > > > > > > > > minor.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. The motivation section states "This KIP is aimed
> to
> > > > > include
> > > > > > > > > support
> > > > > > > > > > > for
> > > > > > > > > > > > nested structures on the existing SMTs... and to
> > include
> > > > the
> > > > > > > > > > abstractions
> > > > > > > > > > > > to reuse this in future SMTs". A good implementation
> of
> > > > this
> > > > > > KIP
> > > > > > > > will
> > > > > > > > > > > > definitely isolate reusable logic into a separate
> > > > abstraction
> > > > > > > that
> > > > > > > > > can
> > > > > > > > > > be
> > > > > > > > > > > > easily pulled in to the SMTs we want to add nested
> > field
> > > > > > support
> > > > > > > > to,
> > > > > > > > > > but
> > > > > > > > > > > > unless we plan on making this kind of abstraction
> > > publicly
> > > > > > > > available
> > > > > > > > > as
> > > > > > > > > > > > some kind of utility method or class that external
> SMT
> > > > > > developers
> > > > > > > > can
> > > > > > > > > > > > leverage, we can probably leave this part out as it's
> > > more
> > > > of
> > > > > > an
> > > > > > > > > > > > implementation detail.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Make sense, will leave this out of the KIP.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2. The Cast example is a little misleading, isn't it?
> > It
> > > > > > > > demonstrates
> > > > > > > > > > the
> > > > > > > > > > > > escape syntax for fields with dot literals in their
> > > names,
> > > > > but
> > > > > > it
> > > > > > > > > > doesn't
> > > > > > > > > > > > demonstrate a way to actually use the Cast (or any
> > other)
> > > > SMT
> > > > > > to
> > > > > > > > > > access a
> > > > > > > > > > > > nested field in a record, which is the whole point of
> > the
> > > > > KIP.
> > > > > > I
> > > > > > > > like
> > > > > > > > > > the
> > > > > > > > > > > > example of escape syntax but we should probably also
> > add
> > > > one
> > > > > > for
> > > > > > > > > nested
> > > > > > > > > > > > field access.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Agree. I have added examples to each SMT to be more
> clear
> > > > about
> > > > > > how
> > > > > > > > it
> > > > > > > > > > > affects each function
> > > > > > > > > > > .
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 3. With the InsertField SMT, I'm wondering what the
> > > > specific
> > > > > > > > behavior
> > > > > > > > > > > will
> > > > > > > > > > > > be when some or all of the "middle layer" nested
> fields
> > > are
> > > > > > > > missing.
> > > > > > > > > > For
> > > > > > > > > > > > example, if I have a record with a value of { "k1":
> > "v1 }
> > > > > and I
> > > > > > > > apply
> > > > > > > > > > > > InsertField with topic.field = n1.n2.n3.topic, what
> > will
> > > > > > happen?
> > > > > > > > Will
> > > > > > > > > > the
> > > > > > > > > > > > updated value become { "k1": "v1", "n1": { "n2": {
> > "n3":
> > > > > > "topic"
> > > > > > > > }}},
> > > > > > > > > > > will
> > > > > > > > > > > > an exception be thrown, or something else? This seems
> > > > > analogous
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > command line mkdir command, which (at least on some
> > > > operating
> > > > > > > > > systems)
> > > > > > > > > > > > fails by default if you try to create a new nested
> > > > directory
> > > > > > > where
> > > > > > > > > > > anything
> > > > > > > > > > > > but the last element in the path doesn't exist, but
> can
> > > be
> > > > > > > invoked
> > > > > > > > > > with a
> > > > > > > > > > > > flag that instructs it to go ahead and create all
> > levels
> > > of
> > > > > > > nested
> > > > > > > > > > > > directory instead. I'm leaning on the side of "just
> > > create
> > > > > > > > > everything"
> > > > > > > > > > > but
> > > > > > > > > > > > would be interested in your thoughts, and either way,
> > we
> > > > > should
> > > > > > > > > > probably
> > > > > > > > > > > > make sure the intended behavior is well-defined
> before
> > > > > voting.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This is an interesting case, thanks for catching this!
> > > > > > > > > > > The default behavior I see is to create parents if they
> > are
> > > > > > missing
> > > > > > > > and
> > > > > > > > > > > overwrite fields
> > > > > > > > > > > if they already exist.
> > > > > > > > > > > I'm planning to include the following two flags if
> there
> > > is a
> > > > > > need
> > > > > > > to
> > > > > > > > > > > overwrite this behavior:
> > > > > > > > > > > - `on_missing_parent` = (CREATE|IGNORE), default=CREATE
> > > > > > > > > > > - `on_existing_field` = (OVERWRITE|IGNORE),
> > > default=OVERWRITE
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 4. Similarly, what will the behavior be if any of the
> > > field
> > > > > > > > elements
> > > > > > > > > > > > specified with InsertField already exist in the
> record
> > > > value?
> > > > > > > Will
> > > > > > > > we
> > > > > > > > > > > just
> > > > > > > > > > > > overwrite them? What's the behavior of InsertField
> > today
> > > > > under
> > > > > > > > > similar
> > > > > > > > > > > > circumstances?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The current behavior is to overwrite the value.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > >
> > > > > > > > > > > > Chris
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 31, 2022 at 4:15 PM Jorge Esteban
> Quilcate
> > > > Otoya
> > > > > <
> > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks, Chris! Much appreciated all the feedback
> > here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. You nailed it setting the design goal here: "it
> > > > > shouldn't
> > > > > > be
> > > > > > > > > > > > impossible
> > > > > > > > > > > > > to use this new feature for any field name, no
> matter
> > > how
> > > > > > > > > convoluted.
> > > > > > > > > > > > It's
> > > > > > > > > > > > > fine if edge cases introduce difficulty (such as
> > > > > > less-readable
> > > > > > > > > > > > > configurations), but it's not fine if they can't be
> > > > > addressed
> > > > > > > at
> > > > > > > > > > all."
> > > > > > > > > > > > > Back to the previous proposals (using only dots as
> > > > > > separators)
> > > > > > > we
> > > > > > > > > > have
> > > > > > > > > > > 2
> > > > > > > > > > > > > alternatives:
> > > > > > > > > > > > > 1. escaping with backslashes
> > > > > > > > > > > > > 2. escaping with dots itself
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'll lean for alternative 2, as you proposed
> before.
> > > > Feels
> > > > > to
> > > > > > > me
> > > > > > > > > that
> > > > > > > > > > > > > backslashes have more potential to lead to
> confusion
> > in
> > > > > JSON
> > > > > > > > > configs,
> > > > > > > > > > > and
> > > > > > > > > > > > > CSV seems like a good precedent to use the same
> > > character
> > > > > to
> > > > > > > > escape
> > > > > > > > > > > > itself.
> > > > > > > > > > > > > KIP is updated to reflect this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Thanks! I'll add an example, and stick with the
> > > > current
> > > > > > > > approach
> > > > > > > > > > > > > defining the style per individual transform
> > > > configuration.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. Yes, thanks! KIP updated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4. Of course. KIP updated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 28 Mar 2022 at 21:59, Chris Egerton <
> > > > > > > > > fearthecel...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for addressing my comments; the KIP looks
> > > > > up-to-date
> > > > > > > and
> > > > > > > > > > > pretty
> > > > > > > > > > > > > > readable now, and the rejected alternatives
> section
> > > > does
> > > > > a
> > > > > > > > great
> > > > > > > > > > job
> > > > > > > > > > > of
> > > > > > > > > > > > > > outlining the discussion so far and providing
> > context
> > > > for
> > > > > > > > anyone
> > > > > > > > > > else
> > > > > > > > > > > > who
> > > > > > > > > > > > > > might want to join in.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Thoughts on choice of delimiter:
> > > > > > > > > > > > > > - I like the optimization for simple cases, but I
> > > think
> > > > > the
> > > > > > > new
> > > > > > > > > > > > proposal
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > a little too restrictive. What if there's a field
> > > whose
> > > > > > name
> > > > > > > > > > contains
> > > > > > > > > > > > all
> > > > > > > > > > > > > > of the permitted options (currently just ".",
> ",",
> > > and
> > > > > > "/")?
> > > > > > > > > > > > > > - If we expand the set of permitted delimiters to
> > > allow
> > > > > for
> > > > > > > any
> > > > > > > > > > > > > > single-character string, configuration complexity
> > > will
> > > > > > > increase
> > > > > > > > > and
> > > > > > > > > > > > > > readability may decrease
> > > > > > > > > > > > > > - Also worth pointing out that there is some
> > > convention
> > > > > for
> > > > > > > > > > doubling
> > > > > > > > > > > a
> > > > > > > > > > > > > > delimiter character as an escape mechanism with
> > > formats
> > > > > > like
> > > > > > > > CSV
> > > > > > > > > > [1]
> > > > > > > > > > > > > > - Overall I think we may be approaching the
> > > saturation
> > > > > > point
> > > > > > > > for
> > > > > > > > > > > > > productive
> > > > > > > > > > > > > > discussion on delimiter syntax so I don't want to
> > > spend
> > > > > too
> > > > > > > > much
> > > > > > > > > > more
> > > > > > > > > > > > of
> > > > > > > > > > > > > > your time on it. I think the one point I'd like
> to
> > > > leave
> > > > > > for
> > > > > > > > now
> > > > > > > > > is
> > > > > > > > > > > > that
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > shouldn't be impossible to use this new feature
> for
> > > any
> > > > > > field
> > > > > > > > > name,
> > > > > > > > > > > no
> > > > > > > > > > > > > > matter how convoluted. It's fine if edge cases
> > > > introduce
> > > > > > > > > difficulty
> > > > > > > > > > > > (such
> > > > > > > > > > > > > > as less-readable configurations), but it's not
> fine
> > > if
> > > > > they
> > > > > > > > can't
> > > > > > > > > > be
> > > > > > > > > > > > > > addressed at all.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2.
> > > > > > > > > > > > > > The configuration style where you define
> > > > > > > > "transforms.field.style"
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > connector config, and then this applies to all
> SMTs
> > > for
> > > > > the
> > > > > > > > > > > connector,
> > > > > > > > > > > > is
> > > > > > > > > > > > > > very interesting. However, it doesn't follow
> > > convention
> > > > > for
> > > > > > > > > > existing
> > > > > > > > > > > > > SMTs.
> > > > > > > > > > > > > > Right now, if you want to configure an SMT, you
> > > define
> > > > > its
> > > > > > > name
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > connector config (for example, "transforms":
> > "smt1"),
> > > > and
> > > > > > > then
> > > > > > > > > > define
> > > > > > > > > > > > all
> > > > > > > > > > > > > > of the properties for that SMT in the connector
> > > config
> > > > > > using
> > > > > > > a
> > > > > > > > > > > > > namespacing
> > > > > > > > > > > > > > mechanism specific to that SMT (for example,
> > > > > > > > > > "transforms.smt1.prop1":
> > > > > > > > > > > > > > "val1"). That SMT then sees only the properties
> > > defined
> > > > > in
> > > > > > > that
> > > > > > > > > > > > > namespace,
> > > > > > > > > > > > > > with the prefix stripped (for example, "prop1":
> > > "val1")
> > > > > in
> > > > > > > its
> > > > > > > > > > > > configure
> > > > > > > > > > > > > > [2] [3] method.
> > > > > > > > > > > > > > If we want to continue to follow this convention,
> > > then
> > > > > > > instead
> > > > > > > > of
> > > > > > > > > > > > > > specifying "transforms.field.style" in a
> connector
> > > > > config,
> > > > > > we
> > > > > > > > > would
> > > > > > > > > > > > > expect
> > > > > > > > > > > > > > users to configure
> "transforms.<name>.field.style",
> > > for
> > > > > > each
> > > > > > > > SMT
> > > > > > > > > > that
> > > > > > > > > > > > > they
> > > > > > > > > > > > > > want to configure a field style for. This would
> > > require
> > > > > > more
> > > > > > > > work
> > > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > > > part of the user, but would be simpler to reason
> > > about
> > > > > and
> > > > > > > > easier
> > > > > > > > > > to
> > > > > > > > > > > > > > implement.
> > > > > > > > > > > > > > If we want to explore an alternative where users
> > can
> > > > > > specify
> > > > > > > > > global
> > > > > > > > > > > > > > properties that apply to all transforms in a
> > > connector
> > > > > > > config,
> > > > > > > > > then
> > > > > > > > > > > the
> > > > > > > > > > > > > > semantics for this need to be defined in the KIP.
> > > This
> > > > > > would
> > > > > > > > have
> > > > > > > > > > to
> > > > > > > > > > > > > > include whether this will apply only for the
> > special
> > > > case
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > > > > "field.style" and possibly "field.separator"
> > > properties
> > > > > or
> > > > > > if
> > > > > > > > it
> > > > > > > > > > > would
> > > > > > > > > > > > be
> > > > > > > > > > > > > > available more generally for other properties,
> > > whether
> > > > it
> > > > > > > will
> > > > > > > > > > apply
> > > > > > > > > > > > only
> > > > > > > > > > > > > > for the SMTs outlined in the KIP or if the
> > > > "field.style"
> > > > > > and
> > > > > > > > > > possibly
> > > > > > > > > > > > > > "field.separator" properties would also be passed
> > > into
> > > > > > custom
> > > > > > > > > SMTs
> > > > > > > > > > so
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > they could choose to act on them if applicable,
> how
> > > > edge
> > > > > > > cases
> > > > > > > > > like
> > > > > > > > > > > > > having
> > > > > > > > > > > > > > an SMT named "field" in your connector config
> would
> > > be
> > > > > > > handled,
> > > > > > > > > > etc.
> > > > > > > > > > > > > > Either way, it might help to have an example in
> the
> > > KIP
> > > > > > > > outlining
> > > > > > > > > > how
> > > > > > > > > > > > one
> > > > > > > > > > > > > > of the to-be-augmented SMTs can be configured
> with
> > > this
> > > > > new
> > > > > > > > > feature
> > > > > > > > > > > > and a
> > > > > > > > > > > > > > before/after of how a record value would be
> > > transformed
> > > > > > with
> > > > > > > > that
> > > > > > > > > > > > > > configuration.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. The docstring for the "transforms.field.style"
> > > > > property
> > > > > > > > > mentions
> > > > > > > > > > > > that
> > > > > > > > > > > > > > the permitted values are "plain" and "nested",
> but
> > > then
> > > > > > > > describes
> > > > > > > > > > > > > behavior
> > > > > > > > > > > > > > with a value of "root". Should that be "plain"
> > > instead?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. The docstring for the
> > "transforms.field.separator"
> > > > > > > property
> > > > > > > > > > > > > exclusively
> > > > > > > > > > > > > > mentions structs, but the feature is intended to
> > work
> > > > > with
> > > > > > > maps
> > > > > > > > > as
> > > > > > > > > > > > well.
> > > > > > > > > > > > > > Can we update it to reflect this?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > References:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] - https://stackoverflow.com/a/17808731
> > > > > > > > > > > > > > [2] -
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/connect/api/src/main/java/org/apache/kafka/connect/transforms/Transformation.java#L30
> > > > > > > > > > > > > > [3] -
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/clients/src/main/java/org/apache/kafka/common/Configurable.java#L26-L29
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Chris
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 28, 2022 at 1:32 PM Jorge Esteban
> > > Quilcate
> > > > > > Otoya
> > > > > > > <
> > > > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks, Chris!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. I'd argue "this..field.child" could be
> harder
> > to
> > > > > grasp
> > > > > > > > than
> > > > > > > > > > > > > > > "this.field/child" + separator: "/".
> > > > > > > > > > > > > > > Even though this represents additional
> > information,
> > > > it
> > > > > > > > follows
> > > > > > > > > a
> > > > > > > > > > > > > similar
> > > > > > > > > > > > > > > approach as the "Flatten#delimeter"
> > configuration.
> > > > > > > > > > > > > > > I want to give the separator approach another
> > try,
> > > > so I
> > > > > > > have
> > > > > > > > > > > updated
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > KIP with the separator proposal, sticking to
> > only 2
> > > > > > > > > alternatives
> > > > > > > > > > > that
> > > > > > > > > > > > > > > should hopefully cover most scenarios.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. Agree. KIP has been updated with this
> > > improvement.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. You're right. I have updated this section
> > > > > accordingly.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 4. Good catch! I've replaced it with
> > `DropHeaders`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Jorge.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, 9 Mar 2022 at 21:33, Chris Egerton <
> > > > > > > > > > > fearthecel...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Looking good! Got a few more thoughts.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. Sorry to revisit this, but I think we may
> > want
> > > > to
> > > > > > > adopt
> > > > > > > > a
> > > > > > > > > > > > slightly
> > > > > > > > > > > > > > > > different escape syntax style. Backslashes
> are
> > > > great,
> > > > > > but
> > > > > > > > > since
> > > > > > > > > > > > > they're
> > > > > > > > > > > > > > > > already used by JSON, using them as an escape
> > > > > sequence
> > > > > > in
> > > > > > > > > field
> > > > > > > > > > > > > > notation
> > > > > > > > > > > > > > > > would also lead to some pretty ugly connector
> > > > > configs.
> > > > > > > > Anyone
> > > > > > > > > > > who's
> > > > > > > > > > > > > had
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > write regular expressions with backslashes in
> > > Java
> > > > is
> > > > > > > > > probably
> > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > familiar with this:
> > > > > > > > > > "this\\\\.is\\\\.not\\\\.very\\\\.readable".
> > > > > > > > > > > > What
> > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > you think about using the dot character to
> > escape
> > > > > > itself?
> > > > > > > > In
> > > > > > > > > > > other
> > > > > > > > > > > > > > words,
> > > > > > > > > > > > > > > > to access a single field named "this.field",
> > > > instead
> > > > > of
> > > > > > > > using
> > > > > > > > > > the
> > > > > > > > > > > > > > syntax
> > > > > > > > > > > > > > > > "this\.field" (which in JSON would have to be
> > > > > expressed
> > > > > > > as
> > > > > > > > > > > > > > > "this\\.field"),
> > > > > > > > > > > > > > > > we could use "this..field", and for a single
> > > field
> > > > > > named
> > > > > > > > > > > > > "this\field",
> > > > > > > > > > > > > > > > instead of using the syntax "this\\field"
> (or,
> > in
> > > > > JSON,
> > > > > > > > > > > > > > "this\\\\field"),
> > > > > > > > > > > > > > > > we could use "this\field" (or, in JSON,
> > > > > "this\\field").
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. Could you flesh out the details on the new
> > > > > > > "field.style"
> > > > > > > > > > > > property,
> > > > > > > > > > > > > > > > including the type, default value,
> importance,
> > > and
> > > > a
> > > > > > > > > > preliminary
> > > > > > > > > > > > > > > docstring?
> > > > > > > > > > > > > > > > See
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors#KIP618:ExactlyOnceSupportforSourceConnectors-Newproperties
> > > > > > > > > > > > > > > > for an example.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. Is the "Compatibility, Deprecation, and
> > > > Migration
> > > > > > > Plan"
> > > > > > > > > > > section
> > > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > accurate after the latest update? Seems like
> > it's
> > > > > still
> > > > > > > > > written
> > > > > > > > > > > > with
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > assumption that nested field syntax will be
> > > > hardcoded
> > > > > > or
> > > > > > > > > > opt-in,
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > IIUC
> > > > > > > > > > > > > > > > isn't the case anymore.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. Nit: The "These SMTs do not require nested
> > > > > structure
> > > > > > > > > > support"
> > > > > > > > > > > > > > section
> > > > > > > > > > > > > > > > mentions a "Drop" SMT. I think this may be
> > > > referring
> > > > > to
> > > > > > > the
> > > > > > > > > > > > Confluent
> > > > > > > > > > > > > > > Drop
> > > > > > > > > > > > > > > > SMT, which isn't a part of Apache Kafka.
> Should
> > > we
> > > > > drop
> > > > > > > > (heh)
> > > > > > > > > > > that
> > > > > > > > > > > > > SMT
> > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > the list? Or perhaps just replace it with
> > > > > > "DropHeaders",
> > > > > > > > > which
> > > > > > > > > > is
> > > > > > > > > > > > > > > currently
> > > > > > > > > > > > > > > > missing from the list and shouldn't require
> any
> > > > > > > > nested-field
> > > > > > > > > > > > related
> > > > > > > > > > > > > > > > updates?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Chris
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 28, 2022 at 2:12 PM Jorge Esteban
> > > > > Quilcate
> > > > > > > > Otoya
> > > > > > > > > <
> > > > > > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you, Chris! and sorry for the delayed
> > > > > response.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Please, find my comments below:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, 14 Feb 2022 at 17:34, Chris Egerton
> > > > > > > > > > > > > > > <chr...@confluent.io.invalid
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the KIP! I'd love to see
> support
> > > for
> > > > > > > nested
> > > > > > > > > > fields
> > > > > > > > > > > > > added
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > out-of-the-box SMTs provided with
> Connect.
> > > Here
> > > > > are
> > > > > > > my
> > > > > > > > > > > initial
> > > > > > > > > > > > > > > > thoughts:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. I agree that there's a case to be made
> > for
> > > > > > > expanding
> > > > > > > > > > > > > HoistField
> > > > > > > > > > > > > > > > with a
> > > > > > > > > > > > > > > > > > new config property for identifying a
> > nested,
> > > > > > > > > to-be-hoisted
> > > > > > > > > > > > > field,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > example in the KIP doesn't really
> > demonstrate
> > > > why
> > > > > > > this
> > > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > > > > > > > valuable. I
> > > > > > > > > > > > > > > > > > think it'd be helpful to expand the
> example
> > > to
> > > > > add
> > > > > > > > other
> > > > > > > > > > > fields
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > order
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > show how adding nested field support
> > enables
> > > > > users
> > > > > > to
> > > > > > > > > > hoist a
> > > > > > > > > > > > > > nested
> > > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > without dropping other fields from the
> > value.
> > > > > Maybe
> > > > > > > > > > something
> > > > > > > > > > > > > like
> > > > > > > > > > > > > > > > this:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >     source = nested.val
> > > > > > > > > > > > > > > > > >     field = line
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >     value (before):
> > > > > > > > > > > > > > > > > >         {
> > > > > > > > > > > > > > > > > >             "nested": {
> > > > > > > > > > > > > > > > > >                 "val": 42,
> > > > > > > > > > > > > > > > > >                 "other val": 96
> > > > > > > > > > > > > > > > > >             }
> > > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >     value (after):
> > > > > > > > > > > > > > > > > >         {
> > > > > > > > > > > > > > > > > >             "nested": {
> > > > > > > > > > > > > > > > > >                 "line": {
> > > > > > > > > > > > > > > > > >                     "val": 42,
> > > > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > > >                 "other val": 96
> > > > > > > > > > > > > > > > > >             }
> > > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. Nit: I think "source" is a little
> > strange
> > > > for
> > > > > > the
> > > > > > > > new
> > > > > > > > > > > > > HoistField
> > > > > > > > > > > > > > > > > > property name. Maybe "hoisted" or
> > > > "hoisted.field"
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > descriptive?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > About 1. and 2.:
> > > > > > > > > > > > > > > > > Agree. The example for this SMT is updated
> > and
> > > > have
> > > > > > > added
> > > > > > > > > the
> > > > > > > > > > > > > > `hoisted`
> > > > > > > > > > > > > > > > > configuration.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3. Is there a reasonable use case for
> > > expanding
> > > > > > > Flatten
> > > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > > > able
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > flatten specific fields? My understanding
> > is
> > > > that
> > > > > > > it's
> > > > > > > > > > mostly
> > > > > > > > > > > > > > useful
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > writing to systems like databases that
> > don't
> > > > > > support
> > > > > > > > > nested
> > > > > > > > > > > > > values
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > require everything to be a flat list of
> > > > key-value
> > > > > > > > pairs.
> > > > > > > > > > > Being
> > > > > > > > > > > > > able
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > flatten a nested field wouldn't provide
> any
> > > > > > advantage
> > > > > > > > for
> > > > > > > > > > > that
> > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > > > > Are there other cases where it would?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 4. I don't think we should
> unconditionally
> > > > change
> > > > > > the
> > > > > > > > > > default
> > > > > > > > > > > > > > > delimiter
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > Flatten. It's a backwards-incompatible,
> > > > breaking
> > > > > > > change
> > > > > > > > > > that
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > headaches for users. It might be
> reasonable
> > > to
> > > > > > change
> > > > > > > > the
> > > > > > > > > > > > default
> > > > > > > > > > > > > > > value
> > > > > > > > > > > > > > > > > > dynamically based on whether the user has
> > > > > > specified a
> > > > > > > > > value
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > "field"
> > > > > > > > > > > > > > > > > > property, but considering the motivation
> > for
> > > > > > changing
> > > > > > > > the
> > > > > > > > > > > > default
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > it creates conflicts with the
> > > to-be-introduced
> > > > > > nested
> > > > > > > > > field
> > > > > > > > > > > > > syntax
> > > > > > > > > > > > > > > > (which
> > > > > > > > > > > > > > > > > > could arise with downstream SMTs
> regardless
> > > of
> > > > > > > whether
> > > > > > > > > the
> > > > > > > > > > > user
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > explicitly configured Flatten with the
> > > "field"
> > > > > > > > > property), I
> > > > > > > > > > > > don't
> > > > > > > > > > > > > > > know
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > this would be too useful either. I have
> > some
> > > > > > thoughts
> > > > > > > > > below
> > > > > > > > > > > on
> > > > > > > > > > > > > how
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > handle possible conflicts between names
> > with
> > > > dots
> > > > > > in
> > > > > > > > > their
> > > > > > > > > > > > names
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > dotted
> > > > > > > > > > > > > > > > > > syntax for nested field references that
> > > should
> > > > > > > > hopefully
> > > > > > > > > > make
> > > > > > > > > > > > > > either
> > > > > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > > unnecessary.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Fair enough. With the support for nested
> > fields
> > > > in
> > > > > > > other
> > > > > > > > > > SMTs,
> > > > > > > > > > > > > > Flatten
> > > > > > > > > > > > > > > > > could stay as it is.
> > > > > > > > > > > > > > > > > This removes the need for (4) changing
> > Flatten
> > > > > config
> > > > > > > as
> > > > > > > > > > well.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 5. I think it's fine to expand
> ExtractField
> > > to
> > > > > > > support
> > > > > > > > > > nested
> > > > > > > > > > > > > > > notation,
> > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > it might be worth noting in the rejected
> > > > > > alternatives
> > > > > > > > > > section
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > isn't strictly necessary since you can
> > > replace
> > > > > any
> > > > > > > > single
> > > > > > > > > > > > > > invocation
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > that SMT that uses nested field notation
> > with
> > > > > > > multiple
> > > > > > > > > > > > > invocations
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > that use non-nested notation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Agree. Adding it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 6. Nit: "RegerRouter" should be
> > "RegexRouter"
> > > > in
> > > > > > the
> > > > > > > > list
> > > > > > > > > > of
> > > > > > > > > > > > SMTs
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > > > not require nested structure support.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Ack. Fixing it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 7. It may be rare for dots in field names
> > to
> > > > > occur
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > wild
> > > > > > > > > > > > > > > > (although
> > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > wouldn't be so certain of this), but
> unless
> > > we
> > > > > want
> > > > > > > to
> > > > > > > > > > > inflict
> > > > > > > > > > > > > > > > headaches
> > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > users of Flatten, I think we're going to
> > have
> > > > to
> > > > > > > think
> > > > > > > > > > about
> > > > > > > > > > > > > > > conflicts
> > > > > > > > > > > > > > > > > > between dotted notation and non-nested
> > fields
> > > > > whose
> > > > > > > > names
> > > > > > > > > > > > contain
> > > > > > > > > > > > > > > > dots. I
> > > > > > > > > > > > > > > > > > don't think this is actually such a bad
> > > thing,
> > > > > > > though.
> > > > > > > > I
> > > > > > > > > > > agree
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > dotted
> > > > > > > > > > > > > > > > > > notation is intuitive and pretty
> > commonplace
> > > > (in
> > > > > > > tools
> > > > > > > > > like
> > > > > > > > > > > jq,
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > example), so I'd like it if we could
> stick
> > to
> > > > it.
> > > > > > > What
> > > > > > > > > > about
> > > > > > > > > > > > > > > > introducing
> > > > > > > > > > > > > > > > > > escape syntax, using a backslash? That
> way,
> > > > users
> > > > > > > could
> > > > > > > > > > > > > > disambiguate
> > > > > > > > > > > > > > > > > > between "this.field" (which would refer
> to
> > > the
> > > > > > nested
> > > > > > > > > field
> > > > > > > > > > > > > "field"
> > > > > > > > > > > > > > > > under
> > > > > > > > > > > > > > > > > > the top-level "this" field), and
> > > "this\.field"
> > > > > > (which
> > > > > > > > > would
> > > > > > > > > > > > refer
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > field named "this.field"). Like with most
> > > > > languages
> > > > > > > > that
> > > > > > > > > > use
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > backslash
> > > > > > > > > > > > > > > > > > for escape sequences, it could also be
> used
> > > to
> > > > > > escape
> > > > > > > > > > itself,
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > event
> > > > > > > > > > > > > > > > > > that a field name contains a backslash. I
> > > think
> > > > > > this
> > > > > > > is
> > > > > > > > > > more
> > > > > > > > > > > > > > > intuitive
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > simpler than, e.g., adding a new config
> > > > property
> > > > > to
> > > > > > > > > toggle
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > delimiter
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > be used when parsing nested field
> > references.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I like this approach. Adding to the KIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 8. I don't think we can unconditionally
> > turn
> > > > this
> > > > > > > > feature
> > > > > > > > > > on.
> > > > > > > > > > > > The
> > > > > > > > > > > > > > > risk
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > breaking existing pipelines (especially
> > ones
> > > > that
> > > > > > > > > involve,
> > > > > > > > > > > for
> > > > > > > > > > > > > > > > example, a
> > > > > > > > > > > > > > > > > > combination of the Flatten and Cast SMTs)
> > is
> > > > > pretty
> > > > > > > > > high. I
> > > > > > > > > > > > think
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > should be an opt-in feature, at least
> until
> > > the
> > > > > > next
> > > > > > > > > major
> > > > > > > > > > > > > release.
> > > > > > > > > > > > > > > One
> > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > we could accomplish this is by
> introducing
> > a
> > > > new
> > > > > > > > > > > "field.style"
> > > > > > > > > > > > > > (name
> > > > > > > > > > > > > > > > > > obviously subject to change) property
> with
> > > > values
> > > > > > of
> > > > > > > > > > "plain"
> > > > > > > > > > > > > > > (default)
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > "nested". If set to "plain" then the
> > current
> > > > > > > non-nested
> > > > > > > > > > > > behavior
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > used,
> > > > > > > > > > > > > > > > > > and if set to "nested", then the proposed
> > > > nested
> > > > > > > > behavior
> > > > > > > > > > is.
> > > > > > > > > > > > We
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > consider updating the default value to
> > > "nested"
> > > > > in
> > > > > > a
> > > > > > > > > future
> > > > > > > > > > > > major
> > > > > > > > > > > > > > > > release
> > > > > > > > > > > > > > > > > > (or even codify that plan in this KIP, if
> > > > there's
> > > > > > > > enough
> > > > > > > > > > > > support
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > it).
> > > > > > > > > > > > > > > > > > This would also leave the door open for
> > > adding
> > > > > > > > recursive
> > > > > > > > > > > > support
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > SMTs
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the future by adding a permitted value of
> > > > > > > "recursive".
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 9. One of the linked tickets in the
> > > > "Motivation"
> > > > > > > > section,
> > > > > > > > > > > > > > > > > >
> > > > > https://issues.apache.org/jira/browse/KAFKA-10640,
> > > > > > > has
> > > > > > > > > an
> > > > > > > > > > > open
> > > > > > > > > > > > > PR
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > > > > that propose adding recursive support to
> > some
> > > > > SMTs.
> > > > > > > > Have
> > > > > > > > > > you
> > > > > > > > > > > > > > > considered
> > > > > > > > > > > > > > > > > > this type of functionality for your KIP?
> Or
> > > is
> > > > > your
> > > > > > > aim
> > > > > > > > > to
> > > > > > > > > > > > stick
> > > > > > > > > > > > > > > solely
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > nested field support?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I like the `field.style` configuration flag
> > > > > approach.
> > > > > > > > > > > > > > > > > Thanks for pointing out the `recursive`
> > > approach.
> > > > > > Will
> > > > > > > > add
> > > > > > > > > > > > `nested`
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > moment, let's check the demand for
> > `recursive`
> > > to
> > > > > > > > consider
> > > > > > > > > it
> > > > > > > > > > > as
> > > > > > > > > > > > > part
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > this or another KIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have added the following on the KIP:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > Future KIPs could extend this support for:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Recursive notation: name a field and
> apply
> > it
> > > > to
> > > > > > all
> > > > > > > > > fields
> > > > > > > > > > > > > across
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > schema matching that name.
> > > > > > > > > > > > > > > > > - Access to arrays: Adding `[]` notation to
> > > > > represent
> > > > > > > > > access
> > > > > > > > > > to
> > > > > > > > > > > > > > arrays
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > applying SMTs to fields within an array.
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Chris
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Feb 8, 2022 at 1:23 PM Jorge
> > Esteban
> > > > > > Quilcate
> > > > > > > > > > Otoya <
> > > > > > > > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Dev team,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'd like to start a new discussion
> thread
> > > on
> > > > > > Kafka
> > > > > > > > > > Connect
> > > > > > > > > > > > > > KIP-821:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This KIP is aimed to include support
> for
> > > > nested
> > > > > > > > > > structures
> > > > > > > > > > > on
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > > > > > > SMTs — where this make sense.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > Jorge.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to