Hello Chris, Long time no feed back about this,
do I have any work left to do to proceed with discuss and vote? I will waiting for your guide, thanks, always whsoul 2022년 6월 14일 (화) 오후 9:29, gyejun choi <whsou...@gmail.com>님이 작성: > Hi Chris > > I updated KIP document > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex > ) > about discussion and changes before, > > and commited some code according to your review. > > https://github.com/apache/kafka/pull/12219/commits/1541a538eaff676441f125c400a7807d17f2e138 > > I understand you mean the simplicity is more important with SMT ( because > it is Simple Message Trasform.. ) > so I removed about "message" structured format input support > > and also move GroupRegexValidator to inner private class. > > thanks, always > > whsoul > > 2022년 6월 14일 (화) 오전 11:30, Chris Egerton <fearthecel...@gmail.com>님이 작성: > >> Hi whsoul, >> >> Would you mind updating the KIP document ( >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex >> ) >> with all of these changes? We tend to discuss and vote on what's included >> in the Wiki as our source of truth, as opposed to pull requests. >> >> RE 4: >> > it seems the chained transform with extractField SMT + plainText value >> ParseStructByRegex make the same result with struct value >> ParseSturctByRegex, but it will drop collector meta data during >> extractField ( I think.. ) >> This is exactly right--using ExtractField + ParseStructByRegex will >> replace >> the record key/value with the parsed struct, and drop everything else. Do >> you think it'd increase the implementation complexity significantly to add >> support for users to specify a field to operate on, which would preserve >> the rest of the record key/value as-is? I'm not so sure that everyone is >> going to want to drop all other metadata from their messages, but if >> there's something that makes this particularly difficult to implement (as >> compared to the other SMTs that are already included out of the box with >> Kafka Connect), then we could probably leave it out for now. >> >> RE 6: >> It's a tricky distinction, but what I mean is that, although we might add >> a >> class named GroupRegexValidator and use that class in our SMT library, >> unless it's part of the public interface we're trying to change, it's an >> implementation detail and we don't have to call it out in the KIP. The >> advantage to leaving it out is that it makes the KIP more concise and >> therefore easier to review, you can choose to rename, remove, decompose, >> etc. the class in your PR without having to worry about sticking to the >> plan in the KIP that everyone reviewed and voted on. It's a minor detail >> though, I'm noting it here more because it may be useful when writing >> future KIPs than because it's necessary to adhere to strictly in this one. >> >> Cheers, >> >> Chris >> >> On Fri, Jun 10, 2022 at 8:13 AM gyejun choi <whsou...@gmail.com> wrote: >> >> > Hi Chris, >> > >> > I applied some code fix according your second reviews. >> > >> > >> https://github.com/apache/kafka/pull/12219/commits/f673ea2eae0d907502e44c0ecd53b616386627bf >> > >> > >> > 1. [applied] changed name as ParseStructByRegex >> > >> > 2. [applied] throw DataException, when a line that the SMT sees doesn't >> > match the regex... >> > originally, it will be skipped if no data match with regex, >> > but change code to throw DataException according to your review >> > >> > 3. [already applied] >> > I already delete code about desc ":{TYPE}" with commit below >> > >> > >> https://github.com/apache/kafka/pull/12219/commits/534d995b3e6371c37443eb72eee03884cb23c85d >> > >> > 4. [need discuss] >> > In my use case about log data collection, >> > I configured the pipeline below >> > nginx => filebeat => kafka => kafka connect es connector => es >> > >> > filebeat ( or most other log collector ) usually send value as struct ( >> not >> > plaintext ) with collector meta data, >> > and the key name as "message" ( in case filebeat ) >> > >> > I think there are more use cases log message wrapped struct value than >> > plain text value, >> > and it seems the chained transform with extractField SMT + plainText >> value >> > ParseStructByRegex make the same result with struct value >> > ParseSturctByRegex, >> > but it will drop collector meta data during extractField ( I think.. ) >> > and also in almost case, users will use ParseStructByRegex SMT with >> > extractField SMT >> > >> > 5. [applied] throw DataException, when there are difference between >> group >> > size and mapping names size >> > >> > >> > 6. [ question ] >> > you mean, use RegexValidator class already exist? >> > without group "(.*)" pattern check, it will not provide early detection >> > about regex config mistake, >> > If you think it is enough as runtime DataException detection? >> > >> > always thanks. >> > >> > whsoul >> > >> > 2022년 6월 8일 (수) 오전 9:46, Chris Egerton <fearthecel...@gmail.com>님이 작성: >> > >> > > Hi whsoul, >> > > >> > > Thanks for the updates. I have a few more thoughts but this is looking >> > > pretty good: >> > > >> > > 1. The name "ToStructByRegexTransform" is a little unwieldy. What do >> you >> > > think about something shorter like ParseStruct, ParseRegex, or >> > > ParseStructByRegex? >> > > >> > > 2. What happens if a line that the SMT sees doesn't match the regex >> > > supplied by the user? Will it throw an exception, silently skip the >> > message >> > > and return it as-is, allow the user to configure it to do either, >> > something >> > > else entirely? (I'd personally lean towards just throwing an exception >> > > since users can configure regexes to be lenient via the optional >> > > quantifier, i.e. "?") >> > > >> > > 3. The description for the "regex" property still includes the "( with >> > > :{TYPE} )" snippet; should that be removed? >> > > >> > > 4. Is it worth adding support to this SMT to operate on an individual >> > field >> > > of a message? I.e., you specify a "field" of "log_line" and the SMT >> > expects >> > > to see a struct or a map with a field/key of "log_line" and parses >> that >> > > instead of the entire message. If so, it might be worth specifying >> that >> > > this property would follow any precedents set by KIP-821 [1] so that >> > nested >> > > fields could be accessed instead of just top-level fields. >> > > >> > > 5. What happens if the user specifies more names in the "mapping" >> > property >> > > than there are groups in the "regex" property? >> > > >> > > 6. (Nit) I don't think the GroupRegexValidator class needs to be >> called >> > out >> > > as part of the changes to public interface if it's just going to be >> used >> > by >> > > this transform. >> > > >> > > [1] - >> > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures >> > > >> > > Cheers, >> > > >> > > Chris >> > > >> > > On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com> >> wrote: >> > > >> > > > Hi Chris, >> > > > >> > > > Thank you for your positive feed back, >> > > > And sorry about late reply ( I didn’t recognize your reply email… >> TT ) >> > > > >> > > > And I update KIP and PR with your review >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex >> > > > >> > > > 1. typecast function removed >> > > > 2. struct.field removed >> > > > 3. Proposed Interface changed >> > > > >> > > > >> > > > I will wait for you second feed back >> > > > >> > > > Thanks~ >> > > > >> > > > On 2021/07/15 15:57:14 Chris Egerton wrote: >> > > > > Hi whsoul, >> > > > > >> > > > > Thanks for the KIP. The two use cases you identified seem quite >> > > > appealing, >> > > > > especially the first one: parsing structured log messages. >> > > > > >> > > > > Here are my initial thoughts on the proposed design: >> > > > > >> > > > > 1. I wonder if it's necessary to include support for type casting >> > with >> > > > this >> > > > > SMT. We already have a Cast SMT ( >> > > > > >> > > > >> > > > >> > > >> > >> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java >> > > > ) >> > > > > that can parse multiple fields of a structured record value with >> > > > differing >> > > > > types. Would it be enough for your new SMT to only produce string >> > > values >> > > > > for its structured data, and then allow users to perform casting >> > logic >> > > > > using the Cast SMT afterward? >> > > > > >> > > > > 2. It seems like the "struct.field" property is similar; based on >> the >> > > > > examples, it looks like when the SMT is configured with a value >> for >> > > that >> > > > > property, it will first pull out a field from a structured record >> > value >> > > > > (for example, it would pull out the value " >> > > > > https://kafka.apache.org/documentation/#connect" from a map of >> > > {"url": " >> > > > > https://kafka.apache.org/documentation/#connect"}), then parse >> that >> > > > field's >> > > > > value, and replace the entire record value (or key) with the >> result >> > of >> > > > the >> > > > > parsing stage. It seems like this could be accomplished using the >> > > > > ExtractField SMT ( >> > > > > >> > > > >> > > > >> > > >> > >> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java >> > > > ) >> > > > > as a preliminary step before passing it to your new SMT. Is this >> > > correct? >> > > > > And if so, could we simplify the interface for your SMT by >> removing >> > the >> > > > > "struct.field" property in favor of the existing ExtractField SMT? >> > > > > >> > > > > 3. (Nit) I believe the property names in the table beneath the >> > > "Proposed >> > > > > Interfaces" section should have the "transforms.RegexTransform." >> > prefix >> > > > > removed, since users will be able to configure the SMT with any >> name, >> > > not >> > > > > just "RegexTransform". This keeps in line with similar KIPs such >> as >> > > > > KIP-437, which added a new property to the MaskField SMT ( >> > > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT >> > > > > ). >> > > > > >> > > > > Cheers, >> > > > > >> > > > > Chris >> > > > > >> > > > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com> >> wrote: >> > > > > >> > > > > > is there someone who any feed back about this? >> > > > > > >> > > > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성: >> > > > > > >> > > > > > > Hi, >> > > > > > > >> > > > > > > I've opened KIP-678 which is intended to provide a new SMT in >> > Kafka >> > > > > > Connect. >> > > > > > > I'd be grateful for any >> > > > > > > feedback: >> > > > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex== >> > > > > > > >> > > > > > > thanks, >> > > > > > > >> > > > > > > whsoul >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >