Hi Chris

I updated KIP document
https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
)
about discussion and changes before,

and commited some code according to your review.
https://github.com/apache/kafka/pull/12219/commits/1541a538eaff676441f125c400a7807d17f2e138

I understand you mean the simplicity is more important with SMT ( because
it is Simple Message Trasform.. )
so I removed about "message" structured format input support

and also move GroupRegexValidator to inner private class.

thanks, always

whsoul

2022년 6월 14일 (화) 오전 11:30, Chris Egerton <fearthecel...@gmail.com>님이 작성:

> Hi whsoul,
>
> Would you mind updating the KIP document (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
> )
> with all of these changes? We tend to discuss and vote on what's included
> in the Wiki as our source of truth, as opposed to pull requests.
>
> RE 4:
> > it seems the chained transform with extractField SMT + plainText value
> ParseStructByRegex make the same result with struct value
> ParseSturctByRegex, but it will drop collector meta data during
> extractField ( I think.. )
> This is exactly right--using ExtractField + ParseStructByRegex will replace
> the record key/value with the parsed struct, and drop everything else. Do
> you think it'd increase the implementation complexity significantly to add
> support for users to specify a field to operate on, which would preserve
> the rest of the record key/value as-is? I'm not so sure that everyone is
> going to want to drop all other metadata from their messages, but if
> there's something that makes this particularly difficult to implement (as
> compared to the other SMTs that are already included out of the box with
> Kafka Connect), then we could probably leave it out for now.
>
> RE 6:
> It's a tricky distinction, but what I mean is that, although we might add a
> class named GroupRegexValidator and use that class in our SMT library,
> unless it's part of the public interface we're trying to change, it's an
> implementation detail and we don't have to call it out in the KIP. The
> advantage to leaving it out is that it makes the KIP more concise and
> therefore easier to review, you can choose to rename, remove, decompose,
> etc. the class in your PR without having to worry about sticking to the
> plan in the KIP that everyone reviewed and voted on. It's a minor detail
> though, I'm noting it here more because it may be useful when writing
> future KIPs than because it's necessary to adhere to strictly in this one.
>
> Cheers,
>
> Chris
>
> On Fri, Jun 10, 2022 at 8:13 AM gyejun choi <whsou...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > I applied some code fix according your second reviews.
> >
> >
> https://github.com/apache/kafka/pull/12219/commits/f673ea2eae0d907502e44c0ecd53b616386627bf
> >
> >
> > 1. [applied] changed name as ParseStructByRegex
> >
> > 2. [applied] throw DataException, when a line that the SMT sees doesn't
> > match the regex...
> > originally, it will be skipped if no data match with regex,
> > but change code to throw DataException according to your review
> >
> > 3. [already applied]
> > I already delete code about desc ":{TYPE}" with commit below
> >
> >
> https://github.com/apache/kafka/pull/12219/commits/534d995b3e6371c37443eb72eee03884cb23c85d
> >
> > 4. [need discuss]
> > In my use case about log data collection,
> > I configured the pipeline below
> > nginx => filebeat => kafka => kafka connect es connector => es
> >
> > filebeat ( or most other log collector ) usually send value as struct (
> not
> > plaintext ) with collector meta data,
> > and the key name as "message" ( in case filebeat )
> >
> > I think there are more use cases log message wrapped struct value than
> > plain text value,
> > and it seems the chained transform with extractField SMT + plainText
> value
> > ParseStructByRegex make the same result with struct value
> > ParseSturctByRegex,
> > but it will drop collector meta data during extractField ( I think.. )
> > and also in almost case, users will use ParseStructByRegex SMT with
> > extractField SMT
> >
> > 5. [applied] throw DataException, when there are difference between group
> > size and mapping names size
> >
> >
> > 6. [ question ]
> > you mean, use RegexValidator class already exist?
> > without group "(.*)" pattern check, it will not provide early detection
> > about regex config mistake,
> > If you think it is enough as runtime DataException detection?
> >
> > always thanks.
> >
> > whsoul
> >
> > 2022년 6월 8일 (수) 오전 9:46, Chris Egerton <fearthecel...@gmail.com>님이 작성:
> >
> > > Hi whsoul,
> > >
> > > Thanks for the updates. I have a few more thoughts but this is looking
> > > pretty good:
> > >
> > > 1. The name "ToStructByRegexTransform" is a little unwieldy. What do
> you
> > > think about something shorter like ParseStruct, ParseRegex, or
> > > ParseStructByRegex?
> > >
> > > 2. What happens if a line that the SMT sees doesn't match the regex
> > > supplied by the user? Will it throw an exception, silently skip the
> > message
> > > and return it as-is, allow the user to configure it to do either,
> > something
> > > else entirely? (I'd personally lean towards just throwing an exception
> > > since users can configure regexes to be lenient via the optional
> > > quantifier, i.e. "?")
> > >
> > > 3. The description for the "regex" property still includes the "( with
> > > :{TYPE} )" snippet; should that be removed?
> > >
> > > 4. Is it worth adding support to this SMT to operate on an individual
> > field
> > > of a message? I.e., you specify a "field" of "log_line" and the SMT
> > expects
> > > to see a struct or a map with a field/key of "log_line" and parses that
> > > instead of the entire message. If so, it might be worth specifying that
> > > this property would follow any precedents set by KIP-821 [1] so that
> > nested
> > > fields could be accessed instead of just top-level fields.
> > >
> > > 5. What happens if the user specifies more names in the "mapping"
> > property
> > > than there are groups in the "regex" property?
> > >
> > > 6. (Nit) I don't think the GroupRegexValidator class needs to be called
> > out
> > > as part of the changes to public interface if it's just going to be
> used
> > by
> > > this transform.
> > >
> > > [1] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thank you for your positive feed back,
> > > > And sorry about late reply ( I didn’t recognize your reply email… TT
> )
> > > >
> > > > And I update KIP and PR with your review
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
> > > >
> > > > 1. typecast function removed
> > > > 2. struct.field removed
> > > > 3. Proposed Interface changed
> > > >
> > > >
> > > > I will wait for you second feed back
> > > >
> > > > Thanks~
> > > >
> > > > On 2021/07/15 15:57:14 Chris Egerton wrote:
> > > > > Hi whsoul,
> > > > >
> > > > > Thanks for the KIP. The two use cases you identified seem quite
> > > > appealing,
> > > > > especially the first one: parsing structured log messages.
> > > > >
> > > > > Here are my initial thoughts on the proposed design:
> > > > >
> > > > > 1. I wonder if it's necessary to include support for type casting
> > with
> > > > this
> > > > > SMT. We already have a Cast SMT (
> > > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java
> > > > )
> > > > > that can parse multiple fields of a structured record value with
> > > > differing
> > > > > types. Would it be enough for your new SMT to only produce string
> > > values
> > > > > for its structured data, and then allow users to perform casting
> > logic
> > > > > using the Cast SMT afterward?
> > > > >
> > > > > 2. It seems like the "struct.field" property is similar; based on
> the
> > > > > examples, it looks like when the SMT is configured with a value for
> > > that
> > > > > property, it will first pull out a field from a structured record
> > value
> > > > > (for example, it would pull out the value "
> > > > > https://kafka.apache.org/documentation/#connect"; from a map of
> > > {"url": "
> > > > > https://kafka.apache.org/documentation/#connect"}), then parse
> that
> > > > field's
> > > > > value, and replace the entire record value (or key) with the result
> > of
> > > > the
> > > > > parsing stage. It seems like this could be accomplished using the
> > > > > ExtractField SMT (
> > > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java
> > > > )
> > > > > as a preliminary step before passing it to your new SMT. Is this
> > > correct?
> > > > > And if so, could we simplify the interface for your SMT by removing
> > the
> > > > > "struct.field" property in favor of the existing ExtractField SMT?
> > > > >
> > > > > 3. (Nit) I believe the property names in the table beneath the
> > > "Proposed
> > > > > Interfaces" section should have the "transforms.RegexTransform."
> > prefix
> > > > > removed, since users will be able to configure the SMT with any
> name,
> > > not
> > > > > just "RegexTransform". This keeps in line with similar KIPs such as
> > > > > KIP-437, which added a new property to the MaskField SMT (
> > > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
> > > > > ).
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com>
> wrote:
> > > > >
> > > > > > is there someone who any feed back about this?
> > > > > >
> > > > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I've opened KIP-678 which is intended to provide a new SMT in
> > Kafka
> > > > > > Connect.
> > > > > > > I'd be grateful for any
> > > > > > > feedback:
> > > > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==
> > > > > > >
> > > > > > > thanks,
> > > > > > >
> > > > > > > whsoul
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to