Thanks, Chris!

1. I'd argue "this..field.child" could be harder to grasp than
"this.field/child" + separator: "/".
Even though this represents additional information, it follows a similar
approach as the "Flatten#delimeter" configuration.
I want to give the separator approach another try, so I have updated the
KIP with the separator proposal, sticking to only 2 alternatives that
should hopefully cover most scenarios.

2. Agree. KIP has been updated with this improvement.

3. You're right. I have updated this section accordingly.

4. Good catch! I've replaced it with `DropHeaders`.

Looking forward to your feedback.

Thanks,
Jorge.

On Wed, 9 Mar 2022 at 21:33, Chris Egerton <fearthecel...@gmail.com> wrote:

> Hi Jorge,
>
> Looking good! Got a few more thoughts.
>
> 1. Sorry to revisit this, but I think we may want to adopt a slightly
> different escape syntax style. Backslashes are great, but since they're
> already used by JSON, using them as an escape sequence in field notation
> would also lead to some pretty ugly connector configs. Anyone who's had to
> write regular expressions with backslashes in Java is probably already
> familiar with this: "this\\\\.is\\\\.not\\\\.very\\\\.readable". What do
> you think about using the dot character to escape itself? In other words,
> to access a single field named "this.field", instead of using the syntax
> "this\.field" (which in JSON would have to be expressed as "this\\.field"),
> we could use "this..field", and for a single field named "this\field",
> instead of using the syntax "this\\field" (or, in JSON, "this\\\\field"),
> we could use "this\field" (or, in JSON, "this\\field").
>
> 2. Could you flesh out the details on the new "field.style" property,
> including the type, default value, importance, and a preliminary docstring?
> See
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors#KIP618:ExactlyOnceSupportforSourceConnectors-Newproperties
> for an example.
>
> 3. Is the "Compatibility, Deprecation, and Migration Plan" section still
> accurate after the latest update? Seems like it's still written with the
> assumption that nested field syntax will be hardcoded or opt-in, which IIUC
> isn't the case anymore.
>
> 4. Nit: The "These SMTs do not require nested structure support" section
> mentions a "Drop" SMT. I think this may be referring to the Confluent Drop
> SMT, which isn't a part of Apache Kafka. Should we drop (heh) that SMT from
> the list? Or perhaps just replace it with "DropHeaders", which is currently
> missing from the list and shouldn't require any nested-field related
> updates?
>

> Cheers,
>
> Chris
>
> On Mon, Feb 28, 2022 at 2:12 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thank you, Chris! and sorry for the delayed response.
> >
> > Please, find my comments below:
> >
> > On Mon, 14 Feb 2022 at 17:34, Chris Egerton <chr...@confluent.io.invalid
> >
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the KIP! I'd love to see support for nested fields added to
> > the
> > > out-of-the-box SMTs provided with Connect. Here are my initial
> thoughts:
> > >
> > > 1. I agree that there's a case to be made for expanding HoistField
> with a
> > > new config property for identifying a nested, to-be-hoisted field, but
> > the
> > > example in the KIP doesn't really demonstrate why this would be
> > valuable. I
> > > think it'd be helpful to expand the example to add other fields in
> order
> > to
> > > show how adding nested field support enables users to hoist a nested
> > field
> > > without dropping other fields from the value. Maybe something like
> this:
> > >
> > >     source = nested.val
> > >     field = line
> > >
> > >     value (before):
> > >         {
> > >             "nested": {
> > >                 "val": 42,
> > >                 "other val": 96
> > >             }
> > >         }
> > >
> > >     value (after):
> > >         {
> > >             "nested": {
> > >                 "line": {
> > >                     "val": 42,
> > >                 }
> > >                 "other val": 96
> > >             }
> > >         }
> > >
> > > 2. Nit: I think "source" is a little strange for the new HoistField
> > > property name. Maybe "hoisted" or "hoisted.field" would be more
> > > descriptive?
> > >
> > >
> > About 1. and 2.:
> > Agree. The example for this SMT is updated and have added the `hoisted`
> > configuration.
> >
> >
> > > 3. Is there a reasonable use case for expanding Flatten to be able to
> > > flatten specific fields? My understanding is that it's mostly useful
> for
> > > writing to systems like databases that don't support nested values and
> > > require everything to be a flat list of key-value pairs. Being able to
> > > flatten a nested field wouldn't provide any advantage for that use
> case.
> > > Are there other cases where it would?
> > >
> > > 4. I don't think we should unconditionally change the default delimiter
> > for
> > > Flatten. It's a backwards-incompatible, breaking change that could
> cause
> > > headaches for users. It might be reasonable to change the default value
> > > dynamically based on whether the user has specified a value for the
> > "field"
> > > property, but considering the motivation for changing the default is
> that
> > > it creates conflicts with the to-be-introduced nested field syntax
> (which
> > > could arise with downstream SMTs regardless of whether the user has
> > > explicitly configured Flatten with the "field" property), I don't know
> > that
> > > this would be too useful either. I have some thoughts below on how to
> > > handle possible conflicts between names with dots in their names and
> > dotted
> > > syntax for nested field references that should hopefully make either
> > change
> > > unnecessary.
> > >
> > >
> > Fair enough. With the support for nested fields in other SMTs, Flatten
> > could stay as it is.
> > This removes the need for (4) changing Flatten config as well.
> >
> >
> > > 5. I think it's fine to expand ExtractField to support nested notation,
> > but
> > > it might be worth noting in the rejected alternatives section that this
> > > isn't strictly necessary since you can replace any single invocation of
> > > that SMT that uses nested field notation with multiple invocations of
> it
> > > that use non-nested notation.
> > >
> >
> > Agree. Adding it.
> >
> >
> > >
> > > 6. Nit: "RegerRouter" should be "RegexRouter" in the list of SMTs that
> do
> > > not require nested structure support.
> > >
> > >
> > Ack. Fixing it.
> >
> >
> > > 7. It may be rare for dots in field names to occur in the wild
> (although
> > I
> > > wouldn't be so certain of this), but unless we want to inflict
> headaches
> > on
> > > users of Flatten, I think we're going to have to think about conflicts
> > > between dotted notation and non-nested fields whose names contain
> dots. I
> > > don't think this is actually such a bad thing, though. I agree that
> > dotted
> > > notation is intuitive and pretty commonplace (in tools like jq, for
> > > example), so I'd like it if we could stick to it. What about
> introducing
> > > escape syntax, using a backslash? That way, users could disambiguate
> > > between "this.field" (which would refer to the nested field "field"
> under
> > > the top-level "this" field), and "this\.field" (which would refer to
> the
> > > field named "this.field"). Like with most languages that use the
> > backslash
> > > for escape sequences, it could also be used to escape itself, in the
> > event
> > > that a field name contains a backslash. I think this is more intuitive
> > and
> > > simpler than, e.g., adding a new config property to toggle the
> delimiter
> > to
> > > be used when parsing nested field references.
> > >
> >
> > I like this approach. Adding to the KIP.
> >
> >
> > >
> > > 8. I don't think we can unconditionally turn this feature on. The risk
> of
> > > breaking existing pipelines (especially ones that involve, for
> example, a
> > > combination of the Flatten and Cast SMTs) is pretty high. I think this
> > > should be an opt-in feature, at least until the next major release. One
> > way
> > > we could accomplish this is by introducing a new "field.style" (name
> > > obviously subject to change) property with values of "plain" (default)
> > and
> > > "nested". If set to "plain" then the current non-nested behavior is
> used,
> > > and if set to "nested", then the proposed nested behavior is. We can
> > > consider updating the default value to "nested" in a future major
> release
> > > (or even codify that plan in this KIP, if there's enough support for
> it).
> > > This would also leave the door open for adding recursive support to
> SMTs
> > in
> > > the future by adding a permitted value of "recursive".
> > >
> > > 9. One of the linked tickets in the "Motivation" section,
> > > https://issues.apache.org/jira/browse/KAFKA-10640, has an open PR and
> > KIP
> > > that propose adding recursive support to some SMTs. Have you considered
> > > this type of functionality for your KIP? Or is your aim to stick solely
> > to
> > > nested field support?
> > >
> >
> > I like the `field.style` configuration flag approach.
> > Thanks for pointing out the `recursive` approach. Will add `nested` at
> the
> > moment, let's check the demand for `recursive` to consider it as part of
> > this or another KIP.
> >
> > I have added the following on the KIP:
> >
> > ```
> > Future KIPs could extend this support for:
> >
> > - Recursive notation: name a field and apply it to all fields across the
> > schema matching that name.
> > - Access to arrays: Adding `[]` notation to represent access to arrays
> and
> > applying SMTs to fields within an array.
> > ```
> >
> >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Hi Dev team,
> > > >
> > > > I'd like to start a new discussion thread on Kafka Connect KIP-821:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > >
> > > > This KIP is aimed to include support for nested structures on the
> > > existing
> > > > SMTs — where this make sense.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Regards,
> > > > Jorge.
> > > >
> > >
> >
>

Reply via email to