David Burgos. Isban (Banco Santander) created FLUME-2800:
------------------------------------------------------------

             Summary: Multiline log events for Taildir Source
                 Key: FLUME-2800
                 URL: https://issues.apache.org/jira/browse/FLUME-2800
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.6.0, v1.7.0
            Reporter: David Burgos. Isban (Banco Santander)
            Priority: Minor


This is a proposed implementation for handling multiline log messages in the 
new tailing source (FLUME-2498).
It is based on the idea in FLUME-2779, MultiLine Deserializer for Spooling 
Directory Source.

Configuration:
* multiLineRegex: Regular expression used to detect multiline log messages 
(grok expressions are permitted)
* grokDictionaryDir: Directory containing custom grok dictionaries
* maxNumberLines: Maximum number of lines per event in a multiline log message. 
Default: 100. Lines beyond this limit are never transferred to the sink.
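
The grouping behaviour described by multiLineRegex and maxNumberLines could 
look roughly like the sketch below. It is not taken from the attached patch: 
the class and method names are hypothetical, it uses java.util.regex rather 
than joni to stay dependency-free, and it assumes the regex marks the first 
line of each event, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flume or of the attached patch.
class MultilineGrouper {

    // Groups raw lines into events. ASSUMPTION: a line matching startOfEvent
    // opens a new event; every other line continues the current one.
    static List<String> group(List<String> lines, Pattern startOfEvent,
                              int maxNumberLines) {
        List<String> events = new ArrayList<String>();
        StringBuilder current = null;
        int linesInEvent = 0;

        for (String line : lines) {
            if (current == null || startOfEvent.matcher(line).find()) {
                // Flush the previous event (if any) and start a new one.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
                linesInEvent = 1;
            } else if (linesInEvent < maxNumberLines) {
                // Continuation line that still fits within the limit.
                current.append('\n').append(line);
                linesInEvent++;
            }
            // Continuation lines beyond maxNumberLines are dropped, mirroring
            // "never transferred to the sink".
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }
}
```

With a date-prefix regex and a limit of 2, a stack trace of three lines would 
be emitted as one event holding only its first two lines, and the next dated 
line would start a new event.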

For regular expressions, the patch uses the joni regex engine, which can be 
twice as fast as the Java regex engine and is more efficient, producing less 
object churn while scanning, because it operates natively on byte arrays.
https://github.com/jruby/joni

The patch also includes functionality for expanding grok expressions into a 
pure named regex (inspired by the Logstash interceptor).
By default, it loads the included built-in grok dictionaries with predefined 
patterns.
https://github.com/aicer/grok
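
As an illustration of what expanding a grok expression into a pure named regex 
means, here is a minimal sketch. It does not use the aicer/grok library or the 
code in the patch: the GrokExpander class, its expand method, and the 
dictionary contents in the usage note are hypothetical, and it relies on 
java.util.regex named groups, which need Java 7 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the aicer/grok API and not the attached patch.
class GrokExpander {

    // Matches %{NAME} or %{NAME:field}. Field names are limited to letters
    // and digits because Java named groups do not allow underscores.
    private static final Pattern REF =
        Pattern.compile("%\\{(\\w+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");

    // Replaces grok references with their dictionary definitions until none
    // remain, so definitions may themselves reference other patterns.
    static String expand(String expression, Map<String, String> dictionary) {
        String current = expression;
        for (int depth = 0; depth < 10; depth++) {
            Matcher m = REF.matcher(current);
            StringBuffer out = new StringBuffer();
            boolean found = false;
            while (m.find()) {
                found = true;
                String definition = dictionary.get(m.group(1));
                if (definition == null) {
                    throw new IllegalArgumentException(
                        "Unknown grok pattern: " + m.group(1));
                }
                String field = m.group(2);
                String replacement = (field != null)
                    ? "(?<" + field + ">" + definition + ")"  // named group
                    : "(?:" + definition + ")";               // plain group
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            if (!found) {
                return current;  // fully expanded
            }
            m.appendTail(out);
            current = out.toString();
        }
        throw new IllegalArgumentException("Grok patterns nested too deeply");
    }
}
```

For example, with a dictionary defining YEAR as \d{4}, DATE as 
%{YEAR}-\d{2}-\d{2} and LEVEL as [A-Z]+, the expression 
^%{DATE:date} %{LEVEL:level} expands to a regex whose named groups "date" and 
"level" capture the timestamp and the log level.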


The attached patch includes configuration documentation and unit tests.
Also attached is a complete port/patch for Flume 1.6 and Java 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
