+1 This looks like it would be a great contribution. It might be worth having a look at this in the context of REGEX_ROUTING, which does a similar thing, but requires a very large number of disparate sensor configs. In that context, I would say this provides a good means of handling things like server syslog sources in particular.
It would be great to see a JIRA and PR on this. Discussion of any configuration specifics is probably easier around some code. Also, it would be really interesting to hear any thoughts on how this performs compared with a complex Grok pattern, for instance, or with the approach taken in the default ASA parser, which is quite similar to this but more 'coded in'.

Simon

On Mon, 27 Aug 2018 at 11:28, <jskar...@gmail.com> wrote:

> Hello,
>
> We have implemented a general purpose regex parser for Metron that we are
> interested in contributing back to the community.
>
> While the Metron Grok parser provides some regex based capability today,
> the intention of this general purpose regex parser is to:
>
>    1. Allow for more advanced parsing scenarios (specifically, dealing with
>    multiple regex lines for devices that contain several log formats within
>    them)
>    2. Give users and developers of Metron additional options for parsing
>    3. With the new parser chaining and regex routing feature available in
>    Metron, this gives some additional flexibility to logically separate a
>    flow by:
>       1. Regex routing to segregate logs at a device level and handle
>       envelope unwrapping
>       2. This general purpose regex parser to parse an entire device type
>       that contains multiple log formats within the single device (for
>       example, RHEL logs)
>
> At a high level control flow is like this:
>
> 1. Identify the record type of the incoming raw message.
>
> 2. Find and apply the regular expression of corresponding record type to
> extract the fields (using named groups).
>
> 3. Apply the message header regex to extract the fields in the header part
> of the message (using named groups).
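
To make the discussion a bit more concrete, the three steps above could be sketched roughly as follows in Java. This is only a hypothetical sketch, not the contributed parser: the class and method names (RegexParserSketch, groupNames, extract, parse) are made up for illustration, the regexes are copied from the example config in the original message, and treating the full match of recordTypeRegex as the record type is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the three-step flow; NOT the contributed parser.
final class RegexParserSketch {

    // Regexes copied from the example config in the original message.
    static final String RECORD_TYPE_REGEX =
        "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))";
    static final String MESSAGE_HEADER_REGEX =
        "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?"
        + "(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?"
        + "(?<syslogHost>(?<=\\s).*?(?=\\s))";
    // The "fields" list keyed by recordType; only the kernel entry is included here.
    static final Map<String, String> FIELD_REGEXES =
        Map.of("kernel", ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))");

    // Matches "(?<name>" openers. Lookbehinds "(?<=" and "(?<!" are skipped because
    // a group name must start with a letter. Escaped parentheses are not handled.
    private static final Pattern GROUP_NAME =
        Pattern.compile("\\(\\?<([A-Za-z][A-Za-z0-9]*)>");

    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Copies every named group that took part in the first match into 'out'.
    static boolean extract(String regex, String message, Map<String, String> out) {
        Matcher m = Pattern.compile(regex).matcher(message);
        if (!m.find()) {
            return false;
        }
        for (String name : groupNames(regex)) {
            String value = m.group(name);
            if (value != null) {
                out.put(name, value);
            }
        }
        return true;
    }

    static Map<String, String> parse(String message) {
        Map<String, String> out = new LinkedHashMap<>();

        // Step 1: identify the record type (assumption: the full match is the type).
        Matcher typeMatcher = Pattern.compile(RECORD_TYPE_REGEX).matcher(message);
        if (!typeMatcher.find()) {
            return out; // unknown record type: nothing is extracted in this sketch
        }
        String recordType = typeMatcher.group();
        extract(RECORD_TYPE_REGEX, message, out);

        // Step 2: apply the regex configured for that record type.
        String fieldRegex = FIELD_REGEXES.get(recordType);
        if (fieldRegex != null) {
            extract(fieldRegex, message, out);
        }

        // Step 3: apply the header regex shared by all record types.
        extract(MESSAGE_HEADER_REGEX, message, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("<6>Aug 27 11:28:01 host1 kernel: eth0 link up"));
    }
}
```

The sample message in main is invented. Group names are scraped from the pattern text because older Java versions have no public API for listing named groups; a real implementation may well do this differently. Note that with the kernel regex as written, eventInfo keeps the leading space that follows "kernel:".
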
>
> The parser config uses the following structure:
>
>    "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
>    "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
>    "fields": [
>      {
>        "recordType": "kernel",
>        "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>      },
>      {
>        "recordType": "syslog",
>        "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>      }
>    ]
>
> Where:
>
>    - recordTypeRegex is used to distinctly identify a record type. It
>    takes a valid regular expression and may also have named groups, which
>    would be extracted into fields.
>    - messageHeaderRegex is used to specify a regular expression to extract
>    fields from a message part which is common across all the messages (e.g.,
>    syslog fields, standard headers)
>    - fields: JSON list of objects containing recordType and regex. The
>    expression that is evaluated is based on the output of the
>    recordTypeRegex
>    - Note: recordTypeRegex and messageHeaderRegex could also be specified as
>    lists (as a JSON array), where the list will be evaluated in order
>    until a matching regular expression is found.
>
> If there are no objections to having this type of Parser within Metron, we
> will open a JIRA/PR for code review.
>
> *Jagdeep Singh*
> --

--
simon elliston ball
@sireb
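
The note in the original message about giving recordTypeRegex or messageHeaderRegex as a list, evaluated in order until one matches, might look something like the sketch below. It is an assumption about the behaviour described, not code from the contribution: OrderedRegexList and firstMatch are hypothetical names, and the sshd entry is invented purely to show the fall-through.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not taken from the contributed parser.
final class OrderedRegexList {

    // Tries each regex in list order and returns the matcher of the
    // first one that finds a match, or empty if none of them match.
    static Optional<Matcher> firstMatch(List<String> regexes, String message) {
        for (String regex : regexes) {
            Matcher m = Pattern.compile(regex).matcher(message);
            if (m.find()) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> recordTypeRegexes = List.of(
            // Made-up first entry, only here to show the fall-through.
            "(?<process>(?<=\\s)\\bsshd\\b(?=\\[|:))",
            // Entry from the example config.
            "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))");
        firstMatch(recordTypeRegexes, "<6>Aug 27 11:28:01 host1 kernel: eth0 link up")
            .ifPresent(m -> System.out.println(m.group("process")));
    }
}
```

Compiling each pattern on every call keeps the sketch short; a real parser would presumably compile the configured patterns once at initialisation, which also bears on the performance question raised in the reply.
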