Hi Tom,

Thanks for taking a look at this, and for your thoughtful comments. I'll
leave it up to Jorge to address most of your comments but I wanted to share
a couple quick thoughts I had regarding 103 and 104.

103. Like you, I was envisioning a possible syntax for array access that
used classic C-style brackets; e.g., `arr[index]`. However, I wonder if we
could keep things simple and use the same syntax that we're proposing for
nested field access? In other words, instead of `arr[index]`, you'd write
`arr.index`. It'd save us and users the headache of reserving characters
now that would need to be escaped even if their unescaped brethren aren't
used for anything, and also avoid the question of what exactly we should do
when we see a config that uses reserved characters that aren't yet
supported (throwing an exception seems pretty unfriendly for new users).

104. This would probably be useful, but it would come with some nasty
compatibility questions that would need to be addressed if we'd want SMTs
that leverage this new API to be viable for older versions of Connect. If
we package and distribute this feature as a library (either via an entirely
new artifact, or as part of the existing connect-transforms or connect-api
artifacts), then we'd have to either sidestep the existing plugin isolation
logic [1] that basically makes it impossible for Connect plugins to ship
their own versions of Connect artifacts, or issue a big warning to people
that any SMT that uses this API won't work with any older versions of
Connect. There's also some other features we might want to add in an
SMT-utils library such as the existing, internal, utils that Connect uses
right now [2]. It may be worth exploring this in a separate KIP of its own.

[1] -
https://github.com/apache/kafka/blob/d480c4aa6e513e36050d8e067931de2270525d18/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/PluginUtils.java#L46-L143

[2] -
https://github.com/apache/kafka/tree/d480c4aa6e513e36050d8e067931de2270525d18/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/util

Cheers,

Chris

On Fri, Apr 22, 2022 at 6:55 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Jorge,
>
> Thanks for the KIP, especially for the examples which are super-clear.
>
> 100. The name `field.style` isn't so clear for something like ReplaceField:
> it's not so obvious that field.style applies to `include` and `exclude`.
>
> 101. The permitted values for `field.style` don't seem terribly intuitive
> (to me anyway): the meaning of `plain` isn't very guessable. Why not
> `top-level` or `root` instead? Also `nested` could be misconstrued to mean
> nested-but-not-top-level, so perhaps `recursive` or `cascading` might be
> better?
>
> 102. I'm torn on whether making the interpretation of existing configs like
> `field` be dependent on `field.style` is a good idea. I can see that it's
> the simplest thing to do, but it just feels a bit odd that sometimes the
> `field` would actually be a path and have different escaping rules. An
> alternative would be to come up with a parallel set of config names (e.g.
> as well as "field" an SMT might support "path") which were defined to
> always take paths, thus avoiding the changeable interpretation of the
> existing configs. The SMT's #configure() would need to throw in the case
> that both configs were given. I can see that that would be more work in
> implementation, but it feels cleaner.
>
> 103. I think in order to allow for supporting arrays in a later KIP (which
> certainly seems like it could be useful), we'd want to specify the syntax
> now, even if it wasn't implemented under this KIP. That's because I don't
> think you can't exclude the possibility that characters such as `[` and `]`
> appear in field names. So you'd have a compatibility problem if people
> started using the features of this KIP to access such fields, only for
> those characters to change their meaning under a later KIP.
>
> 104. I also wonder whether making paths into a public Java API, for use by
> 3rd party SMTs, would be valuable.
>
> Thanks again,
>
> Tom
>
>
>
> On Wed, 20 Apr 2022 at 17:53, Chris Egerton <fearthecel...@gmail.com>
> wrote:
>
> > 💯 Thanks Jorge, LGTM!
> >
> > On Wed, Apr 20, 2022, 12:40 Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Thank you, Chris! Not possible without your feedback.
> > >
> > > On Tue, 19 Apr 2022 at 23:04, Chris Egerton <fearthecel...@gmail.com>
> > > wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > Thank you for sticking through this. I have one small remark and one
> > > small
> > > > clarification; assuming you agree with me on them then I'm ready to
> > vote
> > > on
> > > > the KIP.
> > > >
> > > > 1. InsertField: The "field.on.missing.parent" and
> > > "field.on.existing.field"
> > > > docs both mention a permitted value of "ingore"; this should be
> > "ignore",
> > > > right?
> > > >
> > >
> > > Of course, one more typo :)
> > >
> > >
> > > > 2. InsertField: The examples are still missing the "field.style"
> > property
> > > > from the configurations. They should all include the property
> > > > "transforms.smt1.field.style": "nested", correct?
> > > >
> > >
> > > Yes, it is there. I think I know what you mean now, seems that
> Confluence
> > > is putting everything in one line when it's in separate lines in the
> > > editor.
> > > Hopefully, it's fixed now.
> > >
> > >
> > > >
> > > > Thanks again for working through this, and congratulations on a
> > > > well-written KIP!
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Tue, Apr 19, 2022 at 2:06 PM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Thank you, Chris! I apply these improvements to the KIP, let me
> know
> > > how
> > > > it
> > > > > looks now.
> > > > >
> > > > > On Mon, 11 Apr 2022 at 23:43, Chris Egerton <
> fearthecel...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jorge,
> > > > > >
> > > > > > Wow, those examples are great! A few more remarks, but I think
> > we're
> > > > > > getting close:
> > > > > >
> > > > > > 1. The examples differ across SMTs with the name of the
> > > > newly-introduced
> > > > > > style property; some of them use "field.style", and some use
> > > > > > "fields.style". I think for consistency's sake we should stick
> with
> > > > just
> > > > > > "field.style"; otherwise it could be painful for users to have to
> > > > > remember
> > > > > > which to use.
> > > > > >
> > > > >
> > > > > Great catch. Agree, I fixed the config names to `field.style`.
> > > > >
> > > > >
> > > > > >
> > > > > > 2. Some of the examples are off:
> > > > > > - TimestampConverter: the input in the second example ("when
> field
> > > > names
> > > > > > include dots") doesn't contain a field with a dotted name
> > > > > > - ValueToKey: the config in the third example ("when field names
> > > > include
> > > > > > dots") should probably use "parent..child.k2" as the
> > > > > > "transforms.smt1.fields" property
> > > > > >
> > > > >
> > > > > Fixed. Thanks!
> > > > >
> > > > >
> > > > > >
> > > > > > 3. RE changes to InsertField:
> > > > > > - The InsertField SMT should also come with the new "field.style"
> > > > > property
> > > > > > in order to preserve backwards compatibility, right? I don't see
> it
> > > > > > included in the example configs for that one, just want to make
> > sure
> > > > > > - I don't know of any cases where we use snake_case for property
> > > names
> > > > in
> > > > > > Kafka; we should probably use "on.missing.parent" and
> > > > "on.existing.field"
> > > > > > as the new property names for InsertField.
> > > > > > - Why is the "on_existing_field" (or "on.existing.field")
> property
> > > only
> > > > > > applied when the field style is nested? Couldn't it be useful for
> > > > > > non-nested fields as well?
> > > > > >
> > > > >
> > > > > Great points! I have applied these suggestions to the KIP.
> > > > >
> > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Sat, Apr 9, 2022 at 12:40 PM Jorge Esteban Quilcate Otoya <
> > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >
> > > > > > > Again, great feedback Chris. Much appreciated.
> > > > > > > Added my comments below:
> > > > > > >
> > > > > > > On Tue, 5 Apr 2022 at 20:22, Chris Egerton <
> > > fearthecel...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jorge,
> > > > > > > >
> > > > > > > > Looking good! I have a few comments left but all but one or
> two
> > > are
> > > > > > > minor.
> > > > > > > >
> > > > > > > > 1. The motivation section states "This KIP is aimed to
> include
> > > > > support
> > > > > > > for
> > > > > > > > nested structures on the existing SMTs... and to include the
> > > > > > abstractions
> > > > > > > > to reuse this in future SMTs". A good implementation of this
> > KIP
> > > > will
> > > > > > > > definitely isolate reusable logic into a separate abstraction
> > > that
> > > > > can
> > > > > > be
> > > > > > > > easily pulled in to the SMTs we want to add nested field
> > support
> > > > to,
> > > > > > but
> > > > > > > > unless we plan on making this kind of abstraction publicly
> > > > available
> > > > > as
> > > > > > > > some kind of utility method or class that external SMT
> > developers
> > > > can
> > > > > > > > leverage, we can probably leave this part out as it's more of
> > an
> > > > > > > > implementation detail.
> > > > > > > >
> > > > > > >
> > > > > > > Make sense, will leave this out of the KIP.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 2. The Cast example is a little misleading, isn't it? It
> > > > demonstrates
> > > > > > the
> > > > > > > > escape syntax for fields with dot literals in their names,
> but
> > it
> > > > > > doesn't
> > > > > > > > demonstrate a way to actually use the Cast (or any other) SMT
> > to
> > > > > > access a
> > > > > > > > nested field in a record, which is the whole point of the
> KIP.
> > I
> > > > like
> > > > > > the
> > > > > > > > example of escape syntax but we should probably also add one
> > for
> > > > > nested
> > > > > > > > field access.
> > > > > > > >
> > > > > > >
> > > > > > > Agree. I have added examples to each SMT to be more clear about
> > how
> > > > it
> > > > > > > affects each function
> > > > > > > .
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 3. With the InsertField SMT, I'm wondering what the specific
> > > > behavior
> > > > > > > will
> > > > > > > > be when some or all of the "middle layer" nested fields are
> > > > missing.
> > > > > > For
> > > > > > > > example, if I have a record with a value of { "k1": "v1 }
> and I
> > > > apply
> > > > > > > > InsertField with topic.field = n1.n2.n3.topic, what will
> > happen?
> > > > Will
> > > > > > the
> > > > > > > > updated value become { "k1": "v1", "n1": { "n2": { "n3":
> > "topic"
> > > > }}},
> > > > > > > will
> > > > > > > > an exception be thrown, or something else? This seems
> analogous
> > > to
> > > > > the
> > > > > > > > command line mkdir command, which (at least on some operating
> > > > > systems)
> > > > > > > > fails by default if you try to create a new nested directory
> > > where
> > > > > > > anything
> > > > > > > > but the last element in the path doesn't exist, but can be
> > > invoked
> > > > > > with a
> > > > > > > > flag that instructs it to go ahead and create all levels of
> > > nested
> > > > > > > > directory instead. I'm leaning on the side of "just create
> > > > > everything"
> > > > > > > but
> > > > > > > > would be interested in your thoughts, and either way, we
> should
> > > > > > probably
> > > > > > > > make sure the intended behavior is well-defined before
> voting.
> > > > > > > >
> > > > > > >
> > > > > > > This is an interesting case, thanks for catching this!
> > > > > > > The default behavior I see is to create parents if they are
> > missing
> > > > and
> > > > > > > overwrite fields
> > > > > > > if they already exist.
> > > > > > > I'm planning to include the following two flags if there is a
> > need
> > > to
> > > > > > > overwrite this behavior:
> > > > > > > - `on_missing_parent` = (CREATE|IGNORE), default=CREATE
> > > > > > > - `on_existing_field` = (OVERWRITE|IGNORE), default=OVERWRITE
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 4. Similarly, what will the behavior be if any of the field
> > > > elements
> > > > > > > > specified with InsertField already exist in the record value?
> > > Will
> > > > we
> > > > > > > just
> > > > > > > > overwrite them? What's the behavior of InsertField today
> under
> > > > > similar
> > > > > > > > circumstances?
> > > > > > > >
> > > > > > >
> > > > > > > The current behavior is to overwrite the value.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Chris
> > > > > > > >
> > > > > > > > On Thu, Mar 31, 2022 at 4:15 PM Jorge Esteban Quilcate Otoya
> <
> > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks, Chris! Much appreciated all the feedback here.
> > > > > > > > >
> > > > > > > > > 1. You nailed it setting the design goal here: "it
> shouldn't
> > be
> > > > > > > > impossible
> > > > > > > > > to use this new feature for any field name, no matter how
> > > > > convoluted.
> > > > > > > > It's
> > > > > > > > > fine if edge cases introduce difficulty (such as
> > less-readable
> > > > > > > > > configurations), but it's not fine if they can't be
> addressed
> > > at
> > > > > > all."
> > > > > > > > > Back to the previous proposals (using only dots as
> > separators)
> > > we
> > > > > > have
> > > > > > > 2
> > > > > > > > > alternatives:
> > > > > > > > > 1. escaping with backslashes
> > > > > > > > > 2. escaping with dots itself
> > > > > > > > >
> > > > > > > > > I'll lean for alternative 2, as you proposed before. Feels
> to
> > > me
> > > > > that
> > > > > > > > > backslashes have more potential to lead to confusion in
> JSON
> > > > > configs,
> > > > > > > and
> > > > > > > > > CSV seems like a good precedent to use the same character
> to
> > > > escape
> > > > > > > > itself.
> > > > > > > > > KIP is updated to reflect this.
> > > > > > > > >
> > > > > > > > > 2. Thanks! I'll add an example, and stick with the current
> > > > approach
> > > > > > > > > defining the style per individual transform configuration.
> > > > > > > > >
> > > > > > > > > 3. Yes, thanks! KIP updated.
> > > > > > > > >
> > > > > > > > > 4. Of course. KIP updated.
> > > > > > > > >
> > > > > > > > > On Mon, 28 Mar 2022 at 21:59, Chris Egerton <
> > > > > fearthecel...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jorge,
> > > > > > > > > >
> > > > > > > > > > Thanks for addressing my comments; the KIP looks
> up-to-date
> > > and
> > > > > > > pretty
> > > > > > > > > > readable now, and the rejected alternatives section does
> a
> > > > great
> > > > > > job
> > > > > > > of
> > > > > > > > > > outlining the discussion so far and providing context for
> > > > anyone
> > > > > > else
> > > > > > > > who
> > > > > > > > > > might want to join in.
> > > > > > > > > >
> > > > > > > > > > 1. Thoughts on choice of delimiter:
> > > > > > > > > > - I like the optimization for simple cases, but I think
> the
> > > new
> > > > > > > > proposal
> > > > > > > > > is
> > > > > > > > > > a little too restrictive. What if there's a field whose
> > name
> > > > > > contains
> > > > > > > > all
> > > > > > > > > > of the permitted options (currently just ".", ",", and
> > "/")?
> > > > > > > > > > - If we expand the set of permitted delimiters to allow
> for
> > > any
> > > > > > > > > > single-character string, configuration complexity will
> > > increase
> > > > > and
> > > > > > > > > > readability may decrease
> > > > > > > > > > - Also worth pointing out that there is some convention
> for
> > > > > > doubling
> > > > > > > a
> > > > > > > > > > delimiter character as an escape mechanism with formats
> > like
> > > > CSV
> > > > > > [1]
> > > > > > > > > > - Overall I think we may be approaching the saturation
> > point
> > > > for
> > > > > > > > > productive
> > > > > > > > > > discussion on delimiter syntax so I don't want to spend
> too
> > > > much
> > > > > > more
> > > > > > > > of
> > > > > > > > > > your time on it. I think the one point I'd like to leave
> > for
> > > > now
> > > > > is
> > > > > > > > that
> > > > > > > > > it
> > > > > > > > > > shouldn't be impossible to use this new feature for any
> > field
> > > > > name,
> > > > > > > no
> > > > > > > > > > matter how convoluted. It's fine if edge cases introduce
> > > > > difficulty
> > > > > > > > (such
> > > > > > > > > > as less-readable configurations), but it's not fine if
> they
> > > > can't
> > > > > > be
> > > > > > > > > > addressed at all.
> > > > > > > > > >
> > > > > > > > > > 2.
> > > > > > > > > > The configuration style where you define
> > > > "transforms.field.style"
> > > > > > in
> > > > > > > > the
> > > > > > > > > > connector config, and then this applies to all SMTs for
> the
> > > > > > > connector,
> > > > > > > > is
> > > > > > > > > > very interesting. However, it doesn't follow convention
> for
> > > > > > existing
> > > > > > > > > SMTs.
> > > > > > > > > > Right now, if you want to configure an SMT, you define
> its
> > > name
> > > > > in
> > > > > > > the
> > > > > > > > > > connector config (for example, "transforms": "smt1"), and
> > > then
> > > > > > define
> > > > > > > > all
> > > > > > > > > > of the properties for that SMT in the connector config
> > using
> > > a
> > > > > > > > > namespacing
> > > > > > > > > > mechanism specific to that SMT (for example,
> > > > > > "transforms.smt1.prop1":
> > > > > > > > > > "val1"). That SMT then sees only the properties defined
> in
> > > that
> > > > > > > > > namespace,
> > > > > > > > > > with the prefix stripped (for example, "prop1": "val1")
> in
> > > its
> > > > > > > > configure
> > > > > > > > > > [2] [3] method.
> > > > > > > > > > If we want to continue to follow this convention, then
> > > instead
> > > > of
> > > > > > > > > > specifying "transforms.field.style" in a connector
> config,
> > we
> > > > > would
> > > > > > > > > expect
> > > > > > > > > > users to configure "transforms.<name>.field.style", for
> > each
> > > > SMT
> > > > > > that
> > > > > > > > > they
> > > > > > > > > > want to configure a field style for. This would require
> > more
> > > > work
> > > > > > on
> > > > > > > > the
> > > > > > > > > > part of the user, but would be simpler to reason about
> and
> > > > easier
> > > > > > to
> > > > > > > > > > implement.
> > > > > > > > > > If we want to explore an alternative where users can
> > specify
> > > > > global
> > > > > > > > > > properties that apply to all transforms in a connector
> > > config,
> > > > > then
> > > > > > > the
> > > > > > > > > > semantics for this need to be defined in the KIP. This
> > would
> > > > have
> > > > > > to
> > > > > > > > > > include whether this will apply only for the special case
> > of
> > > > the
> > > > > > > > > > "field.style" and possibly "field.separator" properties
> or
> > if
> > > > it
> > > > > > > would
> > > > > > > > be
> > > > > > > > > > available more generally for other properties, whether it
> > > will
> > > > > > apply
> > > > > > > > only
> > > > > > > > > > for the SMTs outlined in the KIP or if the "field.style"
> > and
> > > > > > possibly
> > > > > > > > > > "field.separator" properties would also be passed into
> > custom
> > > > > SMTs
> > > > > > so
> > > > > > > > > that
> > > > > > > > > > they could choose to act on them if applicable, how edge
> > > cases
> > > > > like
> > > > > > > > > having
> > > > > > > > > > an SMT named "field" in your connector config would be
> > > handled,
> > > > > > etc.
> > > > > > > > > > Either way, it might help to have an example in the KIP
> > > > outlining
> > > > > > how
> > > > > > > > one
> > > > > > > > > > of the to-be-augmented SMTs can be configured with this
> new
> > > > > feature
> > > > > > > > and a
> > > > > > > > > > before/after of how a record value would be transformed
> > with
> > > > that
> > > > > > > > > > configuration.
> > > > > > > > > >
> > > > > > > > > > 3. The docstring for the "transforms.field.style"
> property
> > > > > mentions
> > > > > > > > that
> > > > > > > > > > the permitted values are "plain" and "nested", but then
> > > > describes
> > > > > > > > > behavior
> > > > > > > > > > with a value of "root". Should that be "plain" instead?
> > > > > > > > > >
> > > > > > > > > > 4. The docstring for the "transforms.field.separator"
> > > property
> > > > > > > > > exclusively
> > > > > > > > > > mentions structs, but the feature is intended to work
> with
> > > maps
> > > > > as
> > > > > > > > well.
> > > > > > > > > > Can we update it to reflect this?
> > > > > > > > > >
> > > > > > > > > > References:
> > > > > > > > > >
> > > > > > > > > > [1] - https://stackoverflow.com/a/17808731
> > > > > > > > > > [2] -
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/connect/api/src/main/java/org/apache/kafka/connect/transforms/Transformation.java#L30
> > > > > > > > > > [3] -
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/clients/src/main/java/org/apache/kafka/common/Configurable.java#L26-L29
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Chris
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 28, 2022 at 1:32 PM Jorge Esteban Quilcate
> > Otoya
> > > <
> > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks, Chris!
> > > > > > > > > > >
> > > > > > > > > > > 1. I'd argue "this..field.child" could be harder to
> grasp
> > > > than
> > > > > > > > > > > "this.field/child" + separator: "/".
> > > > > > > > > > > Even though this represents additional information, it
> > > > follows
> > > > > a
> > > > > > > > > similar
> > > > > > > > > > > approach as the "Flatten#delimeter" configuration.
> > > > > > > > > > > I want to give the separator approach another try, so I
> > > have
> > > > > > > updated
> > > > > > > > > the
> > > > > > > > > > > KIP with the separator proposal, sticking to only 2
> > > > > alternatives
> > > > > > > that
> > > > > > > > > > > should hopefully cover most scenarios.
> > > > > > > > > > >
> > > > > > > > > > > 2. Agree. KIP has been updated with this improvement.
> > > > > > > > > > >
> > > > > > > > > > > 3. You're right. I have updated this section
> accordingly.
> > > > > > > > > > >
> > > > > > > > > > > 4. Good catch! I've replaced it with `DropHeaders`.
> > > > > > > > > > >
> > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Jorge.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 9 Mar 2022 at 21:33, Chris Egerton <
> > > > > > > fearthecel...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > >
> > > > > > > > > > > > Looking good! Got a few more thoughts.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Sorry to revisit this, but I think we may want to
> > > adopt
> > > > a
> > > > > > > > slightly
> > > > > > > > > > > > different escape syntax style. Backslashes are great,
> > but
> > > > > since
> > > > > > > > > they're
> > > > > > > > > > > > already used by JSON, using them as an escape
> sequence
> > in
> > > > > field
> > > > > > > > > > notation
> > > > > > > > > > > > would also lead to some pretty ugly connector
> configs.
> > > > Anyone
> > > > > > > who's
> > > > > > > > > had
> > > > > > > > > > > to
> > > > > > > > > > > > write regular expressions with backslashes in Java is
> > > > > probably
> > > > > > > > > already
> > > > > > > > > > > > familiar with this:
> > > > > > "this\\\\.is\\\\.not\\\\.very\\\\.readable".
> > > > > > > > What
> > > > > > > > > > do
> > > > > > > > > > > > you think about using the dot character to escape
> > itself?
> > > > In
> > > > > > > other
> > > > > > > > > > words,
> > > > > > > > > > > > to access a single field named "this.field", instead
> of
> > > > using
> > > > > > the
> > > > > > > > > > syntax
> > > > > > > > > > > > "this\.field" (which in JSON would have to be
> expressed
> > > as
> > > > > > > > > > > "this\\.field"),
> > > > > > > > > > > > we could use "this..field", and for a single field
> > named
> > > > > > > > > "this\field",
> > > > > > > > > > > > instead of using the syntax "this\\field" (or, in
> JSON,
> > > > > > > > > > "this\\\\field"),
> > > > > > > > > > > > we could use "this\field" (or, in JSON,
> "this\\field").
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Could you flesh out the details on the new
> > > "field.style"
> > > > > > > > property,
> > > > > > > > > > > > including the type, default value, importance, and a
> > > > > > preliminary
> > > > > > > > > > > docstring?
> > > > > > > > > > > > See
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors#KIP618:ExactlyOnceSupportforSourceConnectors-Newproperties
> > > > > > > > > > > > for an example.
> > > > > > > > > > > >
> > > > > > > > > > > > 3. Is the "Compatibility, Deprecation, and Migration
> > > Plan"
> > > > > > > section
> > > > > > > > > > still
> > > > > > > > > > > > accurate after the latest update? Seems like it's
> still
> > > > > written
> > > > > > > > with
> > > > > > > > > > the
> > > > > > > > > > > > assumption that nested field syntax will be hardcoded
> > or
> > > > > > opt-in,
> > > > > > > > > which
> > > > > > > > > > > IIUC
> > > > > > > > > > > > isn't the case anymore.
> > > > > > > > > > > >
> > > > > > > > > > > > 4. Nit: The "These SMTs do not require nested
> structure
> > > > > > support"
> > > > > > > > > > section
> > > > > > > > > > > > mentions a "Drop" SMT. I think this may be referring
> to
> > > the
> > > > > > > > Confluent
> > > > > > > > > > > Drop
> > > > > > > > > > > > SMT, which isn't a part of Apache Kafka. Should we
> drop
> > > > (heh)
> > > > > > > that
> > > > > > > > > SMT
> > > > > > > > > > > from
> > > > > > > > > > > > the list? Or perhaps just replace it with
> > "DropHeaders",
> > > > > which
> > > > > > is
> > > > > > > > > > > currently
> > > > > > > > > > > > missing from the list and shouldn't require any
> > > > nested-field
> > > > > > > > related
> > > > > > > > > > > > updates?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > >
> > > > > > > > > > > > Chris
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Feb 28, 2022 at 2:12 PM Jorge Esteban
> Quilcate
> > > > Otoya
> > > > > <
> > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you, Chris! and sorry for the delayed
> response.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please, find my comments below:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 14 Feb 2022 at 17:34, Chris Egerton
> > > > > > > > > > > <chr...@confluent.io.invalid
> > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Jorge,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP! I'd love to see support for
> > > nested
> > > > > > fields
> > > > > > > > > added
> > > > > > > > > > > to
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > out-of-the-box SMTs provided with Connect. Here
> are
> > > my
> > > > > > > initial
> > > > > > > > > > > > thoughts:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. I agree that there's a case to be made for
> > > expanding
> > > > > > > > > HoistField
> > > > > > > > > > > > with a
> > > > > > > > > > > > > > new config property for identifying a nested,
> > > > > to-be-hoisted
> > > > > > > > > field,
> > > > > > > > > > > but
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > example in the KIP doesn't really demonstrate why
> > > this
> > > > > > would
> > > > > > > be
> > > > > > > > > > > > > valuable. I
> > > > > > > > > > > > > > think it'd be helpful to expand the example to
> add
> > > > other
> > > > > > > fields
> > > > > > > > > in
> > > > > > > > > > > > order
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > show how adding nested field support enables
> users
> > to
> > > > > > hoist a
> > > > > > > > > > nested
> > > > > > > > > > > > > field
> > > > > > > > > > > > > > without dropping other fields from the value.
> Maybe
> > > > > > something
> > > > > > > > > like
> > > > > > > > > > > > this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     source = nested.val
> > > > > > > > > > > > > >     field = line
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     value (before):
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >             "nested": {
> > > > > > > > > > > > > >                 "val": 42,
> > > > > > > > > > > > > >                 "other val": 96
> > > > > > > > > > > > > >             }
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     value (after):
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >             "nested": {
> > > > > > > > > > > > > >                 "line": {
> > > > > > > > > > > > > >                     "val": 42,
> > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > >                 "other val": 96
> > > > > > > > > > > > > >             }
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. Nit: I think "source" is a little strange for
> > the
> > > > new
> > > > > > > > > HoistField
> > > > > > > > > > > > > > property name. Maybe "hoisted" or "hoisted.field"
> > > would
> > > > > be
> > > > > > > more
> > > > > > > > > > > > > > descriptive?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > About 1. and 2.:
> > > > > > > > > > > > > Agree. The example for this SMT is updated and have
> > > added
> > > > > the
> > > > > > > > > > `hoisted`
> > > > > > > > > > > > > configuration.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. Is there a reasonable use case for expanding
> > > Flatten
> > > > > to
> > > > > > be
> > > > > > > > > able
> > > > > > > > > > to
> > > > > > > > > > > > > > flatten specific fields? My understanding is that
> > > it's
> > > > > > mostly
> > > > > > > > > > useful
> > > > > > > > > > > > for
> > > > > > > > > > > > > > writing to systems like databases that don't
> > support
> > > > > nested
> > > > > > > > > values
> > > > > > > > > > > and
> > > > > > > > > > > > > > require everything to be a flat list of key-value
> > > > pairs.
> > > > > > > Being
> > > > > > > > > able
> > > > > > > > > > > to
> > > > > > > > > > > > > > flatten a nested field wouldn't provide any
> > advantage
> > > > for
> > > > > > > that
> > > > > > > > > use
> > > > > > > > > > > > case.
> > > > > > > > > > > > > > Are there other cases where it would?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. I don't think we should unconditionally change
> > the
> > > > > > default
> > > > > > > > > > > delimiter
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > Flatten. It's a backwards-incompatible, breaking
> > > change
> > > > > > that
> > > > > > > > > could
> > > > > > > > > > > > cause
> > > > > > > > > > > > > > headaches for users. It might be reasonable to
> > change
> > > > the
> > > > > > > > default
> > > > > > > > > > > value
> > > > > > > > > > > > > > dynamically based on whether the user has
> > specified a
> > > > > value
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > > > > "field"
> > > > > > > > > > > > > > property, but considering the motivation for
> > changing
> > > > the
> > > > > > > > default
> > > > > > > > > > is
> > > > > > > > > > > > that
> > > > > > > > > > > > > > it creates conflicts with the to-be-introduced
> > nested
> > > > > field
> > > > > > > > > syntax
> > > > > > > > > > > > (which
> > > > > > > > > > > > > > could arise with downstream SMTs regardless of
> > > whether
> > > > > the
> > > > > > > user
> > > > > > > > > has
> > > > > > > > > > > > > > explicitly configured Flatten with the "field"
> > > > > property), I
> > > > > > > > don't
> > > > > > > > > > > know
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > this would be too useful either. I have some
> > thoughts
> > > > > below
> > > > > > > on
> > > > > > > > > how
> > > > > > > > > > to
> > > > > > > > > > > > > > handle possible conflicts between names with dots
> > in
> > > > > their
> > > > > > > > names
> > > > > > > > > > and
> > > > > > > > > > > > > dotted
> > > > > > > > > > > > > > syntax for nested field references that should
> > > > hopefully
> > > > > > make
> > > > > > > > > > either
> > > > > > > > > > > > > change
> > > > > > > > > > > > > > unnecessary.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Fair enough. With the support for nested fields in
> > > other
> > > > > > SMTs,
> > > > > > > > > > Flatten
> > > > > > > > > > > > > could stay as it is.
> > > > > > > > > > > > > This removes the need for (4) changing Flatten
> config
> > > as
> > > > > > well.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 5. I think it's fine to expand ExtractField to
> > > support
> > > > > > nested
> > > > > > > > > > > notation,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > it might be worth noting in the rejected
> > alternatives
> > > > > > section
> > > > > > > > > that
> > > > > > > > > > > this
> > > > > > > > > > > > > > isn't strictly necessary since you can replace
> any
> > > > single
> > > > > > > > > > invocation
> > > > > > > > > > > of
> > > > > > > > > > > > > > that SMT that uses nested field notation with
> > > multiple
> > > > > > > > > invocations
> > > > > > > > > > of
> > > > > > > > > > > > it
> > > > > > > > > > > > > > that use non-nested notation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Agree. Adding it.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 6. Nit: "RegerRouter" should be "RegexRouter" in
> > the
> > > > list
> > > > > > of
> > > > > > > > SMTs
> > > > > > > > > > > that
> > > > > > > > > > > > do
> > > > > > > > > > > > > > not require nested structure support.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Ack. Fixing it.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 7. It may be rare for dots in field names to
> occur
> > in
> > > > the
> > > > > > > wild
> > > > > > > > > > > > (although
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > wouldn't be so certain of this), but unless we
> want
> > > to
> > > > > > > inflict
> > > > > > > > > > > > headaches
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > users of Flatten, I think we're going to have to
> > > think
> > > > > > about
> > > > > > > > > > > conflicts
> > > > > > > > > > > > > > between dotted notation and non-nested fields
> whose
> > > > names
> > > > > > > > contain
> > > > > > > > > > > > dots. I
> > > > > > > > > > > > > > don't think this is actually such a bad thing,
> > > though.
> > > > I
> > > > > > > agree
> > > > > > > > > that
> > > > > > > > > > > > > dotted
> > > > > > > > > > > > > > notation is intuitive and pretty commonplace (in
> > > tools
> > > > > like
> > > > > > > jq,
> > > > > > > > > for
> > > > > > > > > > > > > > example), so I'd like it if we could stick to it.
> > > What
> > > > > > about
> > > > > > > > > > > > introducing
> > > > > > > > > > > > > > escape syntax, using a backslash? That way, users
> > > could
> > > > > > > > > > disambiguate
> > > > > > > > > > > > > > between "this.field" (which would refer to the
> > nested
> > > > > field
> > > > > > > > > "field"
> > > > > > > > > > > > under
> > > > > > > > > > > > > > the top-level "this" field), and "this\.field"
> > (which
> > > > > would
> > > > > > > > refer
> > > > > > > > > > to
> > > > > > > > > > > > the
> > > > > > > > > > > > > > field named "this.field"). Like with most
> languages
> > > > that
> > > > > > use
> > > > > > > > the
> > > > > > > > > > > > > backslash
> > > > > > > > > > > > > > for escape sequences, it could also be used to
> > escape
> > > > > > itself,
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > event
> > > > > > > > > > > > > > that a field name contains a backslash. I think
> > this
> > > is
> > > > > > more
> > > > > > > > > > > intuitive
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > simpler than, e.g., adding a new config property
> to
> > > > > toggle
> > > > > > > the
> > > > > > > > > > > > delimiter
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > be used when parsing nested field references.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I like this approach. Adding to the KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 8. I don't think we can unconditionally turn this
> > > > feature
> > > > > > on.
> > > > > > > > The
> > > > > > > > > > > risk
> > > > > > > > > > > > of
> > > > > > > > > > > > > > breaking existing pipelines (especially ones that
> > > > > involve,
> > > > > > > for
> > > > > > > > > > > > example, a
> > > > > > > > > > > > > > combination of the Flatten and Cast SMTs) is
> pretty
> > > > > high. I
> > > > > > > > think
> > > > > > > > > > > this
> > > > > > > > > > > > > > should be an opt-in feature, at least until the
> > next
> > > > > major
> > > > > > > > > release.
> > > > > > > > > > > One
> > > > > > > > > > > > > way
> > > > > > > > > > > > > > we could accomplish this is by introducing a new
> > > > > > > "field.style"
> > > > > > > > > > (name
> > > > > > > > > > > > > > obviously subject to change) property with values
> > of
> > > > > > "plain"
> > > > > > > > > > > (default)
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > "nested". If set to "plain" then the current
> > > non-nested
> > > > > > > > behavior
> > > > > > > > > is
> > > > > > > > > > > > used,
> > > > > > > > > > > > > > and if set to "nested", then the proposed nested
> > > > behavior
> > > > > > is.
> > > > > > > > We
> > > > > > > > > > can
> > > > > > > > > > > > > > consider updating the default value to "nested"
> in
> > a
> > > > > future
> > > > > > > > major
> > > > > > > > > > > > release
> > > > > > > > > > > > > > (or even codify that plan in this KIP, if there's
> > > > enough
> > > > > > > > support
> > > > > > > > > > for
> > > > > > > > > > > > it).
> > > > > > > > > > > > > > This would also leave the door open for adding
> > > > recursive
> > > > > > > > support
> > > > > > > > > to
> > > > > > > > > > > > SMTs
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > the future by adding a permitted value of
> > > "recursive".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 9. One of the linked tickets in the "Motivation"
> > > > section,
> > > > > > > > > > > > > >
> https://issues.apache.org/jira/browse/KAFKA-10640,
> > > has
> > > > > an
> > > > > > > open
> > > > > > > > > PR
> > > > > > > > > > > and
> > > > > > > > > > > > > KIP
> > > > > > > > > > > > > > that propose adding recursive support to some
> SMTs.
> > > > Have
> > > > > > you
> > > > > > > > > > > considered
> > > > > > > > > > > > > > this type of functionality for your KIP? Or is
> your
> > > aim
> > > > > to
> > > > > > > > stick
> > > > > > > > > > > solely
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > nested field support?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I like the `field.style` configuration flag
> approach.
> > > > > > > > > > > > > Thanks for pointing out the `recursive` approach.
> > Will
> > > > add
> > > > > > > > `nested`
> > > > > > > > > > at
> > > > > > > > > > > > the
> > > > > > > > > > > > > moment, let's check the demand for `recursive` to
> > > > consider
> > > > > it
> > > > > > > as
> > > > > > > > > part
> > > > > > > > > > > of
> > > > > > > > > > > > > this or another KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have added the following on the KIP:
> > > > > > > > > > > > >
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > Future KIPs could extend this support for:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Recursive notation: name a field and apply it to
> > all
> > > > > fields
> > > > > > > > > across
> > > > > > > > > > > the
> > > > > > > > > > > > > schema matching that name.
> > > > > > > > > > > > > - Access to arrays: Adding `[]` notation to
> represent
> > > > > access
> > > > > > to
> > > > > > > > > > arrays
> > > > > > > > > > > > and
> > > > > > > > > > > > > applying SMTs to fields within an array.
> > > > > > > > > > > > > ```
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Chris
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban
> > Quilcate
> > > > > > Otoya <
> > > > > > > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Dev team,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to start a new discussion thread on
> > Kafka
> > > > > > Connect
> > > > > > > > > > KIP-821:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This KIP is aimed to include support for nested
> > > > > > structures
> > > > > > > on
> > > > > > > > > the
> > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > > SMTs — where this make sense.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > Jorge.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to