[ https://issues.apache.org/jira/browse/FLUME-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780509#comment-13780509 ]

wolfgang hoschek commented on FLUME-1988:
-----------------------------------------

Splitting an input stream into events in a configurable and extensible way 
sounds like a good idea. 

An alternative approach would be to address this problem (and many similar
problems) by writing a MorphlineDeserializer that implements a
java.io.InputStream on top of the SpoolingDirectorySource and feeds that
InputStream into a configurable morphline, which in turn contains a
readMultiLine command. You could then easily replace readMultiLine with a
command that splits on a character sequence, and so on. There are many other
flavours of the same byte stream -> event splitting theme; with this approach,
individual commands can be composed in a morphline, which makes them more
powerful, flexible and reusable.

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#readMultiLine
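
To make the readMultiLine idea more concrete, here is a minimal Java sketch of
that kind of grouping: any line matching a "continuation" regex is appended to
the previous event instead of starting a new one. The class MultiLineGrouper
and the stack-trace regex are hypothetical and only approximate the behaviour
described in the reference guide; this is not the Morphlines implementation or
its API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of readMultiLine-style grouping; NOT the Morphlines code.
final class MultiLineGrouper {

    private final Pattern continuation;

    MultiLineGrouper(String continuationRegex) {
        this.continuation = Pattern.compile(continuationRegex);
    }

    /** Groups lines into events; a matching line continues the previous event. */
    List<String> group(Reader in) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (current != null && continuation.matcher(line).find()) {
                    // Continuation line: append it to the event being built.
                    current.append('\n').append(line);
                } else {
                    // A new event starts here: flush the previous one, if any.
                    if (current != null) {
                        events.add(current.toString());
                    }
                    current = new StringBuilder(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        String log = "INFO started\n"
                + "ERROR boom\n"
                + "java.lang.IllegalStateException: bad\n"
                + "\tat Foo.bar(Foo.java:1)\n"
                + "INFO done\n";
        // Indented "at ..." frames and "...Exception: ..." lines are continuations.
        MultiLineGrouper grouper = new MultiLineGrouper("^(\\s+at |\\S+Exception: )");
        for (String event : grouper.group(new StringReader(log))) {
            System.out.println("EVENT: " + event.replace("\n", " | "));
        }
    }
}
```

Swapping the regex (or the whole grouping step) changes the splitting policy
without touching the code that reads the stream, which is the kind of
composability argued for above.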

                
> Add Support for Additional Deserializers for SpoolingDirectorySource
> --------------------------------------------------------------------
>
>                 Key: FLUME-1988
>                 URL: https://issues.apache.org/jira/browse/FLUME-1988
>             Project: Flume
>          Issue Type: New Feature
>          Components: Docs, Sinks+Sources
>    Affects Versions: v1.4.0
>            Reporter: Israel Ekpo
>            Assignee: Israel Ekpo
>              Labels: serializers
>         Attachments: EventDeserializerType.java, 
> RegexDelimiterDeSerializer.java, ResettableTestStringInputStream.java, 
> TestRegexDelimiterDeSerializer.java
>
>
> There are certain use cases for SpoolingDirectorySource where the events in 
> the log file are not delimited with newline characters.
> Certain log files, such as those containing stack traces, XML documents or
> pretty-printed JSON strings, can contain multiple newline characters within
> each event.
> We can use alternative logic, such as specific characters, strings or
> regular expressions, to determine when an event is complete.
> Hence I am proposing the following new deserializers, based on
> org.apache.flume.serialization.LineDeserializer:
> # org.apache.flume.serialization.RegexDelimiterDeSerializer
> Allows the user to specify a regular expression that is a delimiter for 
> events within the log file
> # org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
> Allows the user to specify a comma-separated character sequence that is a 
> delimiter for events within the log file
> The user will specify each character as an integer ASCII code, and the
> resulting sequence will be used as the delimiter.
> For example, support for \r\n could be specified as 13,10
> A list of codes is available at http://www.asciitable.com/
> We will also need to update the user guide with examples of how to configure 
> and specify a custom deserializer.
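
The two proposed deserializers come down to two splitting strategies. The Java
sketch below illustrates both outside of Flume: splitByRegex uses
java.util.Scanner with a regular-expression delimiter, and splitOnSequence
parses an ASCII code list such as 13,10 and splits the raw bytes on that exact
sequence. The class and method names are hypothetical; this is not the
attached RegexDelimiterDeSerializer.java, and it does not implement Flume's
EventDeserializer interface or the mark/reset bookkeeping a real spooling
directory deserializer would need.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helpers illustrating the two proposed splitting strategies.
// NOT the attached RegexDelimiterDeSerializer and NOT a Flume API.
final class DelimiterSplitters {

    /** Regex delimiter: the text between two delimiter matches is one event. */
    static List<String> splitByRegex(InputStream in, String delimiterRegex) {
        List<String> events = new ArrayList<>();
        // Scanner reads the stream incrementally and skips delimiter matches.
        // It is deliberately not closed here: the caller owns the stream.
        Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name());
        scanner.useDelimiter(Pattern.compile(delimiterRegex));
        while (scanner.hasNext()) {
            events.add(scanner.next());
        }
        return events;
    }

    /** Parses a comma-separated list of ASCII codes, e.g. "13,10" for \r\n. */
    static byte[] parseAsciiCodes(String spec) {
        String[] parts = spec.split(",");
        byte[] delimiter = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            int code = Integer.parseInt(parts[i].trim());
            if (code < 0 || code > 127) {
                throw new IllegalArgumentException("Not an ASCII code: " + code);
            }
            delimiter[i] = (byte) code;
        }
        return delimiter;
    }

    /** Character-sequence delimiter: splits raw bytes on an exact sequence. */
    static List<String> splitOnSequence(InputStream in, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("Delimiter must not be empty");
        }
        List<String> events = new ArrayList<>();
        byte[] buf = new byte[256];
        int len = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (len == buf.length) {
                    buf = Arrays.copyOf(buf, len * 2); // grow the event buffer
                }
                buf[len++] = (byte) b;
                if (endsWith(buf, len, delimiter)) {
                    // Emit the bytes before the delimiter as one event.
                    events.add(new String(buf, 0, len - delimiter.length, StandardCharsets.UTF_8));
                    len = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (len > 0) {
            // Trailing data without a final delimiter still forms an event.
            events.add(new String(buf, 0, len, StandardCharsets.UTF_8));
        }
        return events;
    }

    /** True if the first len bytes of buf end with suffix. */
    private static boolean endsWith(byte[] buf, int len, byte[] suffix) {
        if (len < suffix.length) {
            return false;
        }
        int offset = len - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (buf[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] text = "line1\nline2\n\nline3".getBytes(StandardCharsets.UTF_8);
        // Blank lines separate events: yields "line1\nline2" and "line3".
        System.out.println(splitByRegex(new ByteArrayInputStream(text), "(\\r?\\n){2,}").size());
        byte[] crlfData = "a\nb\r\nc\r\n".getBytes(StandardCharsets.UTF_8);
        // Only \r\n separates events, so the bare \n stays inside "a\nb".
        System.out.println(splitOnSequence(new ByteArrayInputStream(crlfData), parseAsciiCodes("13,10")).size());
    }
}
```

Splitting UTF-8 data on byte values from the 0-127 range is safe, because those
values never occur inside a multi-byte UTF-8 sequence; this is one reason the
sketch rejects codes outside ASCII.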

--
This message is automatically generated by JIRA.
