[ https://issues.apache.org/jira/browse/METRON-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407606#comment-16407606 ]
ASF GitHub Bot commented on METRON-1496:
----------------------------------------

Github user kevin91nl commented on the issue:

    https://github.com/apache/metron/pull/969

The package naming is adjusted in the latest commit.

**ChainParsers versus Stellar**

Here I will present some of the issues we encountered when writing the parsers. I am interested in your thoughts :-).

_Keeping memory_

In one of our to-be-written parsers, we need to read an external file, parse it and keep the parsed result in memory. As far as I can see, Stellar is stateless and not capable of keeping data in memory. It might be possible by using environment variables, but this approach might get messy.

_Performance_

A ChainParser does not need to interpret a language. We found that interpreting a language was a performance bottleneck in the parsers. In fact, we implemented a template engine in one of our links, but it hurt the throughput of the parsers too much. Therefore, we removed this mechanism.

_Tests_

There is no clear way to write end-to-end tests for parsers involving Stellar. With the ChainParser approach, the data files (input and expected output) and the parser config are kept in one place, and it is easy to add new tests for a parser.

_Sharing configuration_

When parsers are created using Stellar, there is no clear way to share the parser configuration files.

_Post processing_

Note that Stellar can still be used for post-processing the parser output. Therefore, parsers can still rely on all the Stellar functionality.
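To make the chain idea concrete, here is a minimal sketch of how links could be applied in sequence, loosely mirroring the `parse_username` and `rename_fields` entries of the Suricata configuration in the issue description. Everything in it is an assumption for illustration: the `Link` interface, the class bodies, the `run` and `demoChain` helpers and the sample message are hypothetical and are not the actual `nl.qsight` classes from the pull request. JSON decoding and timestamp parsing are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChainSketch {

    /** Hypothetical link contract (not the real nl.qsight interface). */
    interface Link {
        Map<String, Object> apply(Map<String, Object> fields);
    }

    /** Copies one capture group of a regex match into a new field. */
    static class RegexLink implements Link {
        // The compiled pattern is state that lives in memory between messages.
        private final Pattern pattern;
        private final String inputField;
        private final String outputField;
        private final int group;

        RegexLink(String regex, String inputField, String outputField, int group) {
            this.pattern = Pattern.compile(regex);
            this.inputField = inputField;
            this.outputField = outputField;
            this.group = group;
        }

        @Override
        public Map<String, Object> apply(Map<String, Object> fields) {
            Map<String, Object> out = new LinkedHashMap<>(fields);
            Object raw = fields.get(inputField);
            if (raw != null) {
                Matcher m = pattern.matcher(raw.toString());
                if (m.find()) {
                    out.put(outputField, m.group(group));
                }
            }
            return out;
        }
    }

    /** Renames fields according to an old-name to new-name mapping. */
    static class RenameLink implements Link {
        private final Map<String, String> rename;

        RenameLink(Map<String, String> rename) {
            this.rename = rename;
        }

        @Override
        public Map<String, Object> apply(Map<String, Object> fields) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                // Fields without a mapping keep their original name.
                out.put(rename.getOrDefault(e.getKey(), e.getKey()), e.getValue());
            }
            return out;
        }
    }

    /** Feeds the output of each link into the next one, in chain order. */
    static Map<String, Object> run(List<Link> chain, Map<String, Object> message) {
        Map<String, Object> current = message;
        for (Link link : chain) {
            current = link.apply(current);
        }
        return current;
    }

    /** Two links using values taken from the Suricata example config. */
    static List<Link> demoChain() {
        Map<String, String> rename = new LinkedHashMap<>();
        rename.put("src_ip", "ip_src_addr");
        rename.put("dest_ip", "ip_dst_addr");
        return Arrays.<Link>asList(
                new RegexLink("(?i)(user|username|log)[=:](\\w+)",
                        "payload_printable", "username", 2),
                new RenameLink(rename));
    }

    public static void main(String[] args) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("src_ip", "10.0.0.1"); // made-up sample data
        message.put("payload_printable", "GET /login?user=alice");
        System.out.println(run(demoChain(), message));
    }
}
```

Because each link in this sketch is an ordinary Java object, it can hold state such as a compiled pattern or a parsed lookup file, and the chain runs without interpreting an expression language per message. These are the two properties the "Keeping memory" and "Performance" points rely on.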
> ChainLink Parser to reuse parser code at parserConfig level
> -----------------------------------------------------------
>
>                 Key: METRON-1496
>                 URL: https://issues.apache.org/jira/browse/METRON-1496
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Bas van de Lustgraaf
>            Priority: Minor
>
> During the development of some custom parsers we wrote a couple of classes /
> functions to make it possible to reuse code and assemble parsers more quickly
> at the Java coding level.
> We took this idea one step further and created the so-called ChainLinkParser.
> This parser gives users without any Java knowledge the opportunity to assemble
> parsers at the parser configuration level.
> We would like to discuss the code and see if it can be submitted to the
> project. We will create a PR during this week to submit the code for review
> and discussion.
> Below you'll find an example of our parser configuration for Suricata, which
> uses our ChainParser.
>
> {noformat}
> {
>   "parserClassName":"nl.qsight.chainparser.ChainParser",
>   "sensorTopic":"suricata",
>   "readMetadata":true,
>   "mergeMetadata":true,
>   "numWorkers":3,
>   "numAckers":3,
>   "spoutParallelism":6,
>   "spoutNumTasks":6,
>   "parserParallelism":20,
>   "parserNumTasks":20,
>   "errorWriterParallelism":1,
>   "errorWriterNumTasks":1,
>   "spoutConfig":{
>     "spout.firstPollOffsetStrategy":"LATEST"
>   },
>   "stormConfig":{
>     "topology.max.spout.pending":2000
>   },
>   "parserConfig":{
>     "chain":[
>       "parse_json",
>       "parse_username",
>       "rename_fields",
>       "parse_datetime"
>     ],
>     "parsers":{
>       "parse_json":{
>         "class":"nl.qsight.links.io.JSONDecoderLink"
>       },
>       "parse_username":{
>         "class":"nl.qsight.links.io.RegexLink",
>         "pattern":"(?i)(user|username|log)[=:](\\w+)",
>         "selector":{
>           "username":"2"
>         },
>         "input":"{{payload_printable}}"
>       },
>       "rename_fields":{
>         "class":"nl.qsight.links.fields.RenameLink",
>         "rename":{
>           "proto":"protocol",
>           "dest_ip":"ip_dst_addr",
>           "src_ip":"ip_src_addr",
>           "dest_port":"ip_dst_port",
>           "src_port":"ip_src_port"
>         }
>       },
>       "parse_datetime":{
>         "class":"nl.qsight.links.io.TimestampLink",
>         "patterns":[
>           [
>             "([0-9]{4})-([0-9]+)-([0-9]+)T([0-9]+):([0-9]+):([0-9]+).([0-9]+)([+-]{1}[0-9]{1,2}[:]?[0-9]{2})",
>             "yyyy MM dd HH mm ss SSSSSS Z",
>             "newest"
>           ]
>         ],
>         "input":"{{timestamp}}"
>       }
>     }
>   }
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)