whsoul created KAFKA-9436:
-----------------------------

             Summary: New Kafka Connect SMT for plainText => Struct(or Map)
                 Key: KAFKA-9436
                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
            Reporter: whsoul


I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load them into document databases such as MongoDB, Elasticsearch, etc.

For example, take this plain-text Apache access log line:
{code:java}
"111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
{code}
With the SMT connector configuration below, a regular expression can easily transform each plain-text line into Struct (or Map) data:

{code:java}
"transforms": "TimestampTopic, RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
"transforms.RegexTransform.struct.field": "message",
"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
{code}
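
The issue does not show the transform's code, so the following is only a rough, stdlib-only sketch of the parsing step the configuration above implies (it is not the code in the PR). It matches the regex against one log line and assigns capture groups, in order, to the names listed in `mapping`. The class and member names (`RegexToMap`, `parse`, `APACHE_REGEX`, `APACHE_MAPPING`, `SAMPLE_LINE`) are hypothetical, and treating a `:NUMBER` suffix as "parse as a Long when the value is numeric" and returning `null` for non-matching lines are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the ToStructByRegexTransform implementation from the PR.
final class RegexToMap {

    // Same regex and mapping as the connector config above.
    static final String APACHE_REGEX =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";
    static final String APACHE_MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,"
        + "Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";
    static final String SAMPLE_LINE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Matches the whole line and maps capture group i+1 to the i-th mapping name.
    // Returns null when the line does not match (assumed behavior).
    static Map<String, Object> parse(String regex, String mapping, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = mapping.split(",");
        if (fields.length != m.groupCount()) {
            throw new IllegalArgumentException("mapping has " + fields.length
                + " names but the regex has " + m.groupCount() + " groups");
        }
        // LinkedHashMap keeps fields in the same order as the mapping.
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            // "Ms:NUMBER" -> name "Ms", type hint "NUMBER" (assumed semantics).
            String[] nameAndType = fields[i].trim().split(":", 2);
            String value = m.group(i + 1);
            Object converted = value;
            boolean wantsNumber = nameAndType.length == 2 && "NUMBER".equals(nameAndType[1]);
            if (wantsNumber && value != null && value.matches("-?\\d+")) {
                converted = Long.parseLong(value);
            }
            result.put(nameAndType[0], converted);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(APACHE_REGEX, APACHE_MAPPING, SAMPLE_LINE));
    }
}
```

Running `main` prints the twelve parsed fields; `Ms` comes back as a `Long`, while `BytesSent` stays the string `"-"` because it has no `:NUMBER` hint. A real SMT would additionally build a Connect `Struct` with a schema (or a `Map` for schemaless records) and decide how to handle lines that fail to match.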

I have a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)