Hi Chris,

I applied some code fix according your second reviews.
https://github.com/apache/kafka/pull/12219/commits/f673ea2eae0d907502e44c0ecd53b616386627bf


1. [applied] changed name as ParseStructByRegex

2. [applied] throw DataException, when a line that the SMT sees doesn't
match the regex...
originally, it will be skipped if no data match with regex,
but change code to throw DataException according to your review

3. [already applied]
I already delete code about desc ":{TYPE}" with commit below
https://github.com/apache/kafka/pull/12219/commits/534d995b3e6371c37443eb72eee03884cb23c85d

4. [need discuss]
In my use case about log data collection,
I configured the pipeline below
nginx => filebeat => kafka => kafka connect es connector => es

filebeat ( or most other log collector ) usually send value as struct ( not
plaintext ) with collector meta data,
and the key name as "message" ( in case filebeat )

I think there are more use cases log message wrapped struct value than
plain text value,
and it seems the chained transform with extractField SMT + plainText value
ParseStructByRegex make the same result with struct value
ParseSturctByRegex,
but it will drop collector meta data during extractField ( I think.. )
and also in almost case, users will use ParseStructByRegex SMT with
extractField SMT

5. [applied] throw DataException, when there are difference between group
size and mapping names size


6. [ question ]
you mean, use RegexValidator class already exist?
without group "(.*)" pattern check, it will not provide early detection
about regex config mistake,
If you think it is enough as runtime DataException detection?

always thanks.

whsoul

2022년 6월 8일 (수) 오전 9:46, Chris Egerton <fearthecel...@gmail.com>님이 작성:

> Hi whsoul,
>
> Thanks for the updates. I have a few more thoughts but this is looking
> pretty good:
>
> 1. The name "ToStructByRegexTransform" is a little unwieldy. What do you
> think about something shorter like ParseStruct, ParseRegex, or
> ParseStructByRegex?
>
> 2. What happens if a line that the SMT sees doesn't match the regex
> supplied by the user? Will it throw an exception, silently skip the message
> and return it as-is, allow the user to configure it to do either, something
> else entirely? (I'd personally lean towards just throwing an exception
> since users can configure regexes to be lenient via the optional
> quantifier, i.e. "?")
>
> 3. The description for the "regex" property still includes the "( with
> :{TYPE} )" snippet; should that be removed?
>
> 4. Is it worth adding support to this SMT to operate on an individual field
> of a message? I.e., you specify a "field" of "log_line" and the SMT expects
> to see a struct or a map with a field/key of "log_line" and parses that
> instead of the entire message. If so, it might be worth specifying that
> this property would follow any precedents set by KIP-821 [1] so that nested
> fields could be accessed instead of just top-level fields.
>
> 5. What happens if the user specifies more names in the "mapping" property
> than there are groups in the "regex" property?
>
> 6. (Nit) I don't think the GroupRegexValidator class needs to be called out
> as part of the changes to public interface if it's just going to be used by
> this transform.
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
>
> Cheers,
>
> Chris
>
> On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thank you for your positive feed back,
> > And sorry about late reply ( I didn’t recognize your reply email… TT )
> >
> > And I update KIP and PR with your review
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
> >
> > 1. typecast function removed
> > 2. struct.field removed
> > 3. Proposed Interface changed
> >
> >
> > I will wait for you second feed back
> >
> > Thanks~
> >
> > On 2021/07/15 15:57:14 Chris Egerton wrote:
> > > Hi whsoul,
> > >
> > > Thanks for the KIP. The two use cases you identified seem quite
> > appealing,
> > > especially the first one: parsing structured log messages.
> > >
> > > Here are my initial thoughts on the proposed design:
> > >
> > > 1. I wonder if it's necessary to include support for type casting with
> > this
> > > SMT. We already have a Cast SMT (
> > >
> >
> >
> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java
> > )
> > > that can parse multiple fields of a structured record value with
> > differing
> > > types. Would it be enough for your new SMT to only produce string
> values
> > > for its structured data, and then allow users to perform casting logic
> > > using the Cast SMT afterward?
> > >
> > > 2. It seems like the "struct.field" property is similar; based on the
> > > examples, it looks like when the SMT is configured with a value for
> that
> > > property, it will first pull out a field from a structured record value
> > > (for example, it would pull out the value "
> > > https://kafka.apache.org/documentation/#connect"; from a map of
> {"url": "
> > > https://kafka.apache.org/documentation/#connect"}), then parse that
> > field's
> > > value, and replace the entire record value (or key) with the result of
> > the
> > > parsing stage. It seems like this could be accomplished using the
> > > ExtractField SMT (
> > >
> >
> >
> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java
> > )
> > > as a preliminary step before passing it to your new SMT. Is this
> correct?
> > > And if so, could we simplify the interface for your SMT by removing the
> > > "struct.field" property in favor of the existing ExtractField SMT?
> > >
> > > 3. (Nit) I believe the property names in the table beneath the
> "Proposed
> > > Interfaces" section should have the "transforms.RegexTransform." prefix
> > > removed, since users will be able to configure the SMT with any name,
> not
> > > just "RegexTransform". This keeps in line with similar KIPs such as
> > > KIP-437, which added a new property to the MaskField SMT (
> > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
> > > ).
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com> wrote:
> > >
> > > > is there someone who any feed back about this?
> > > >
> > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성:
> > > >
> > > > > Hi,
> > > > >
> > > > > I've opened KIP-678 which is intended to provide a new SMT in Kafka
> > > > Connect.
> > > > > I'd be grateful for any
> > > > > feedback:
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==
> > > > >
> > > > > thanks,
> > > > >
> > > > > whsoul
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to