Thanks, Chris! 1. I'd argue "this..field.child" could be harder to grasp than "this.field/child" + separator: "/". Even though this represents additional information, it follows a similar approach as the "Flatten#delimeter" configuration. I want to give the separator approach another try, so I have updated the KIP with the separator proposal, sticking to only 2 alternatives that should hopefully cover most scenarios.
2. Agree. KIP has been updated with this improvement. 3. You're right. I have updated this section accordingly. 4. Good catch! I've replaced it with `DropHeaders`. Looking forward to your feedback. Thanks, Jorge. On Wed, 9 Mar 2022 at 21:33, Chris Egerton <fearthecel...@gmail.com> wrote: > Hi Jorge, > > Looking good! Got a few more thoughts. > > 1. Sorry to revisit this, but I think we may want to adopt a slightly > different escape syntax style. Backslashes are great, but since they're > already used by JSON, using them as an escape sequence in field notation > would also lead to some pretty ugly connector configs. Anyone who's had to > write regular expressions with backslashes in Java is probably already > familiar with this: "this\\\\.is\\\\.not\\\\.very\\\\.readable". What do > you think about using the dot character to escape itself? In other words, > to access a single field named "this.field", instead of using the syntax > "this\.field" (which in JSON would have to be expressed as "this\\.field"), > we could use "this..field", and for a single field named "this\field", > instead of using the syntax "this\\field" (or, in JSON, "this\\\\field"), > we could use "this\field" (or, in JSON, "this\\field"). > > 2. Could you flesh out the details on the new "field.style" property, > including the type, default value, importance, and a preliminary docstring? > See > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors#KIP618:ExactlyOnceSupportforSourceConnectors-Newproperties > for an example. > > 3. Is the "Compatibility, Deprecation, and Migration Plan" section still > accurate after the latest update? Seems like it's still written with the > assumption that nested field syntax will be hardcoded or opt-in, which IIUC > isn't the case anymore. > > 4. Nit: The "These SMTs do not require nested structure support" section > mentions a "Drop" SMT. I think this may be referring to the Confluent Drop > SMT, which isn't a part of Apache Kafka. Should we drop (heh) that SMT from > the list? Or perhaps just replace it with "DropHeaders", which is currently > missing from the list and shouldn't require any nested-field related > updates? > > Cheers, > > Chris > > On Mon, Feb 28, 2022 at 2:12 PM Jorge Esteban Quilcate Otoya < > quilcate.jo...@gmail.com> wrote: > > > Thank you, Chris! and sorry for the delayed response. > > > > Please, find my comments below: > > > > On Mon, 14 Feb 2022 at 17:34, Chris Egerton <chr...@confluent.io.invalid > > > > wrote: > > > > > Hi Jorge, > > > > > > Thanks for the KIP! I'd love to see support for nested fields added to > > the > > > out-of-the-box SMTs provided with Connect. Here are my initial > thoughts: > > > > > > 1. I agree that there's a case to be made for expanding HoistField > with a > > > new config property for identifying a nested, to-be-hoisted field, but > > the > > > example in the KIP doesn't really demonstrate why this would be > > valuable. I > > > think it'd be helpful to expand the example to add other fields in > order > > to > > > show how adding nested field support enables users to hoist a nested > > field > > > without dropping other fields from the value. Maybe something like > this: > > > > > > source = nested.val > > > field = line > > > > > > value (before): > > > { > > > "nested": { > > > "val": 42, > > > "other val": 96 > > > } > > > } > > > > > > value (after): > > > { > > > "nested": { > > > "line": { > > > "val": 42, > > > } > > > "other val": 96 > > > } > > > } > > > > > > 2. Nit: I think "source" is a little strange for the new HoistField > > > property name. Maybe "hoisted" or "hoisted.field" would be more > > > descriptive? > > > > > > > > About 1. and 2.: > > Agree. The example for this SMT is updated and have added the `hoisted` > > configuration. > > > > > > > 3. Is there a reasonable use case for expanding Flatten to be able to > > > flatten specific fields? My understanding is that it's mostly useful > for > > > writing to systems like databases that don't support nested values and > > > require everything to be a flat list of key-value pairs. Being able to > > > flatten a nested field wouldn't provide any advantage for that use > case. > > > Are there other cases where it would? > > > > > > 4. I don't think we should unconditionally change the default delimiter > > for > > > Flatten. It's a backwards-incompatible, breaking change that could > cause > > > headaches for users. It might be reasonable to change the default value > > > dynamically based on whether the user has specified a value for the > > "field" > > > property, but considering the motivation for changing the default is > that > > > it creates conflicts with the to-be-introduced nested field syntax > (which > > > could arise with downstream SMTs regardless of whether the user has > > > explicitly configured Flatten with the "field" property), I don't know > > that > > > this would be too useful either. I have some thoughts below on how to > > > handle possible conflicts between names with dots in their names and > > dotted > > > syntax for nested field references that should hopefully make either > > change > > > unnecessary. > > > > > > > > Fair enough. With the support for nested fields in other SMTs, Flatten > > could stay as it is. > > This removes the need for (4) changing Flatten config as well. > > > > > > > 5. I think it's fine to expand ExtractField to support nested notation, > > but > > > it might be worth noting in the rejected alternatives section that this > > > isn't strictly necessary since you can replace any single invocation of > > > that SMT that uses nested field notation with multiple invocations of > it > > > that use non-nested notation. > > > > > > > Agree. Adding it. > > > > > > > > > > 6. Nit: "RegerRouter" should be "RegexRouter" in the list of SMTs that > do > > > not require nested structure support. > > > > > > > > Ack. Fixing it. > > > > > > > 7. It may be rare for dots in field names to occur in the wild > (although > > I > > > wouldn't be so certain of this), but unless we want to inflict > headaches > > on > > > users of Flatten, I think we're going to have to think about conflicts > > > between dotted notation and non-nested fields whose names contain > dots. I > > > don't think this is actually such a bad thing, though. I agree that > > dotted > > > notation is intuitive and pretty commonplace (in tools like jq, for > > > example), so I'd like it if we could stick to it. What about > introducing > > > escape syntax, using a backslash? That way, users could disambiguate > > > between "this.field" (which would refer to the nested field "field" > under > > > the top-level "this" field), and "this\.field" (which would refer to > the > > > field named "this.field"). Like with most languages that use the > > backslash > > > for escape sequences, it could also be used to escape itself, in the > > event > > > that a field name contains a backslash. I think this is more intuitive > > and > > > simpler than, e.g., adding a new config property to toggle the > delimiter > > to > > > be used when parsing nested field references. > > > > > > > I like this approach. Adding to the KIP. > > > > > > > > > > 8. I don't think we can unconditionally turn this feature on. The risk > of > > > breaking existing pipelines (especially ones that involve, for > example, a > > > combination of the Flatten and Cast SMTs) is pretty high. I think this > > > should be an opt-in feature, at least until the next major release. One > > way > > > we could accomplish this is by introducing a new "field.style" (name > > > obviously subject to change) property with values of "plain" (default) > > and > > > "nested". If set to "plain" then the current non-nested behavior is > used, > > > and if set to "nested", then the proposed nested behavior is. We can > > > consider updating the default value to "nested" in a future major > release > > > (or even codify that plan in this KIP, if there's enough support for > it). > > > This would also leave the door open for adding recursive support to > SMTs > > in > > > the future by adding a permitted value of "recursive". > > > > > > 9. One of the linked tickets in the "Motivation" section, > > > https://issues.apache.org/jira/browse/KAFKA-10640, has an open PR and > > KIP > > > that propose adding recursive support to some SMTs. Have you considered > > > this type of functionality for your KIP? Or is your aim to stick solely > > to > > > nested field support? > > > > > > > I like the `field.style` configuration flag approach. > > Thanks for pointing out the `recursive` approach. Will add `nested` at > the > > moment, let's check the demand for `recursive` to consider it as part of > > this or another KIP. > > > > I have added the following on the KIP: > > > > ``` > > Future KIPs could extend this support for: > > > > - Recursive notation: name a field and apply it to all fields across the > > schema matching that name. > > - Access to arrays: Adding `[]` notation to represent access to arrays > and > > applying SMTs to fields within an array. > > ``` > > > > > > > > > > Cheers, > > > > > > Chris > > > > > > On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban Quilcate Otoya < > > > quilcate.jo...@gmail.com> wrote: > > > > > > > Hi Dev team, > > > > > > > > I'd like to start a new discussion thread on Kafka Connect KIP-821: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures > > > > > > > > This KIP is aimed to include support for nested structures on the > > > existing > > > > SMTs — where this make sense. > > > > > > > > Looking forward to your feedback. > > > > > > > > Regards, > > > > Jorge. > > > > > > > > > >