[ 
https://issues.apache.org/jira/browse/METRON-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407606#comment-16407606
 ] 

ASF GitHub Bot commented on METRON-1496:
----------------------------------------

Github user kevin91nl commented on the issue:

    https://github.com/apache/metron/pull/969
  
    The package naming is adjusted in the latest commit.
    
    **Chainparsers versus Stellar**
    Here I will present some of the issues we encountered when writing the 
parsers. I am interested in your thoughts :-).
    
    _Keeping memory_
    In one of our to-be-written parsers, we need to read an external file, 
parse it and keep the parsed result in memory. As far as I can see, Stellar 
is stateless and cannot keep data in memory. It might be possible 
by using environment variables, but this approach might get messy.
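
To make the requirement concrete, here is a minimal sketch of a link that parses a lookup file once and then keeps the result in memory for every later message. The class name `FileLookupLink`, its methods and the `key=value` file format are assumptions made for illustration; they are not taken from the PR or from Metron.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical link, for illustration only; not a class from the PR or from Metron.
class FileLookupLink {
    private final Supplier<List<String>> lineSource;
    private Map<String, String> table; // parsed file, kept in memory after first use

    FileLookupLink(Supplier<List<String>> lineSource) {
        this.lineSource = lineSource;
    }

    // Convenience factory: read the lines from an external file.
    static FileLookupLink fromFile(Path path) {
        return new FileLookupLink(() -> {
            try {
                return Files.readAllLines(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Parse "key=value" lines on first use; later calls reuse the cached map.
    private synchronized Map<String, String> table() {
        if (table == null) {
            Map<String, String> parsed = new HashMap<>();
            for (String line : lineSource.get()) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    parsed.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
                }
            }
            table = parsed;
        }
        return table;
    }

    // Look up the value of inputField and, if found, store the result in outputField.
    Map<String, Object> apply(Map<String, Object> message, String inputField, String outputField) {
        Object key = message.get(inputField);
        String value = (key == null) ? null : table().get(key.toString());
        if (value != null) {
            message.put(outputField, value);
        }
        return message;
    }
}
```

The point of the sketch is only that the state lives in the link instance, so the file is read once per parser instance rather than once per message.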
    
    _Performance_
    A ChainParser does not need to interpret a language. We found that 
interpreting a language was a performance bottleneck in the parsers. In fact, 
we implemented a template engine in one of our links, but it hurt the 
throughput of the parsers, so we removed this mechanism.
    
    _Tests_
    There is no clear way to write end-to-end tests for parsers involving 
Stellar. With the ChainParser approach, the data files (input and expected 
output) and the parser config are kept in one place, so it is easy to add new 
tests for a parser.
    
    _Sharing configuration_
    When parsers are created using Stellar, there is no clear way to share 
the parser configuration files.
    
    _Post processing_
    Note that Stellar can still be used for post-processing the parser output, 
so parsers can still rely on all of Stellar's functionality.


> ChainLink Parser to reuse parser code at parserConfig level
> -----------------------------------------------------------
>
>                 Key: METRON-1496
>                 URL: https://issues.apache.org/jira/browse/METRON-1496
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Bas van de Lustgraaf
>            Priority: Minor
>
> During the development of some custom parsers we wrote a couple of classes / 
> functions to make it possible to reuse code and assemble parsers more quickly 
> at the Java coding level.
> We took this idea one step further and created the so-called ChainLinkParser.
> This parser gives users without any Java knowledge the opportunity to assemble 
> parsers at the parser configuration level.
> We would like to discuss the code and see if it can be submitted to the 
> project. We will create a PR during this week to submit the code for review 
> and discussion.
> Below you'll find an example of our Parser configuration for Suricata, which 
> is using our ChainParser. 
>  
> {noformat}
> {
>    "parserClassName":"nl.qsight.chainparser.ChainParser",
>    "sensorTopic":"suricata",
>    "readMetadata":true,
>    "mergeMetadata":true,
>    "numWorkers":3,
>    "numAckers":3,
>    "spoutParallelism":6,
>    "spoutNumTasks":6,
>    "parserParallelism":20,
>    "parserNumTasks":20,
>    "errorWriterParallelism":1,
>    "errorWriterNumTasks":1,
>    "spoutConfig":{
>       "spout.firstPollOffsetStrategy":"LATEST"
>    },
>    "stormConfig":{
>       "topology.max.spout.pending":2000
>    },
>    "parserConfig":{
>       "chain":[
>          "parse_json",
>          "parse_username",
>          "rename_fields",
>          "parse_datetime"
>       ],
>       "parsers":{
>          "parse_json":{
>             "class":"nl.qsight.links.io.JSONDecoderLink"
>          },
>          "parse_username":{
>             "class":"nl.qsight.links.io.RegexLink",
>             "pattern":"(?i)(user|username|log)[=:](\\w+)",
>             "selector":{
>                "username":"2"
>             },
>             "input":"{{payload_printable}}"
>          },
>          "rename_fields":{
>             "class":"nl.qsight.links.fields.RenameLink",
>             "rename":{
>                "proto":"protocol",
>                "dest_ip":"ip_dst_addr",
>                "src_ip":"ip_src_addr",
>                "dest_port":"ip_dst_port",
>                "src_port":"ip_src_port"
>             }
>          },
>          "parse_datetime":{
>             "class":"nl.qsight.links.io.TimestampLink",
>             "patterns":[
>                [
>                   "([0-9]{4})-([0-9]+)-([0-9]+)T([0-9]+):([0-9]+):([0-9]+).([0-9]+)([+-]{1}[0-9]{1,2}[:]?[0-9]{2})",
>                   "yyyy MM dd HH mm ss SSSSSS Z",
>                   "newest"
>                ]
>             ],
>             "input":"{{timestamp}}"
>          }
>       }
>    }
> }
> {noformat}
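
One way to read the configuration quoted above is as a pipeline: each name in `chain` refers to a link, and every link receives the message produced by the previous one. The sketch below illustrates that reading for two of the links (the regex step and the rename step). It is an assumption about how such a chain could behave, not the implementation from the PR; `ChainSketch`, `regexLink`, `renameLink` and `run` are hypothetical names, and the `{{...}}` input templates are simplified to plain field names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; these are not the classes from nl.qsight.chainparser.
class ChainSketch {
    // Link that applies a regex to inputField and stores one capture group in outputField.
    static UnaryOperator<Map<String, Object>> regexLink(
            String pattern, String inputField, int group, String outputField) {
        Pattern compiled = Pattern.compile(pattern); // compiled once, reused per message
        return message -> {
            Object input = message.get(inputField);
            if (input != null) {
                Matcher m = compiled.matcher(input.toString());
                if (m.find()) {
                    message.put(outputField, m.group(group));
                }
            }
            return message;
        };
    }

    // Link that renames fields according to an old-name to new-name mapping.
    static UnaryOperator<Map<String, Object>> renameLink(Map<String, String> rename) {
        return message -> {
            for (Map.Entry<String, String> e : rename.entrySet()) {
                if (message.containsKey(e.getKey())) {
                    message.put(e.getValue(), message.remove(e.getKey()));
                }
            }
            return message;
        };
    }

    // Run the links in order, feeding each link the output of the previous one.
    static Map<String, Object> run(
            List<UnaryOperator<Map<String, Object>>> chain, Map<String, Object> message) {
        Map<String, Object> current = message;
        for (UnaryOperator<Map<String, Object>> link : chain) {
            current = link.apply(current);
        }
        return current;
    }
}
```

With the `parse_username` pattern from the configuration, selecting group 2 of a payload containing `user=alice` would yield `alice` as the `username` field, and the rename step would then move `src_ip` to `ip_src_addr`.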



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
