[ https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545684#comment-16545684 ]
ASF GitHub Bot commented on METRON-1657:
----------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1099#discussion_r202805609

    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -82,6 +82,12 @@ topology in kafka.
     Errors are collected with the context of the error (e.g. stacktrace) and original message causing the error and sent to an `error` queue.
     Invalid messages as determined by global validation functions are also treated as errors and sent to an `error` queue.
    +
    +Multiple sensors can be aggregated into a single Storm topology. When this is done, there will be
    +multiple Kafka spouts, but only a single parser bolt which will handle delegating to the correct
    --- End diff --

    This PR gives us the ability to group the parsers into a single topology if we so desire. You would still write through to Kafka. So, the topology in the example would have 3 Kafka spouts:
    * One for monitoring `pix_syslog_router` (the syslog parser, aka the routing parser)
    * One for monitoring `cisco-5-304`
    * One for monitoring `cisco-6-302`

    There would be one parser bolt, though, which would handle parsing all 3 sensor types. That is the contribution of this PR: the ability to determine the parser, filter, and field transformations from the input Kafka topic and use the appropriate ones to parse the messages.

    There is not, however, any code here that would bypass the intermediate Kafka write (e.g. from the router topology to the individual `cisco-5-304` or `cisco-6-302` topics). That's a current gap.


> Parser aggregation in storm
> ---------------------------
>
>                 Key: METRON-1657
>                 URL: https://issues.apache.org/jira/browse/METRON-1657
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Major
>
> Currently our parsing solution requires one storm topology per sensor. It has
> been complained that this may be wasteful of resources and that, rather than
> one storm topology per sensor, it would be advantageous to have multiple
> sensors in the same topology. The benefit to this is that it would require
> fewer storm slots.
> The issue with this is that whenever we've aggregated functionality like this
> before, we've run into issues appropriately being able to scale storm (e.g.
> batch vs random access indexing in the same topology). The main point in
> addressing this is to recommend that parsers with similar velocities and
> complexity are grouped together.
> Particularly for a first cut, leave the configuration mostly as-is, while
> allowing for comma separated lists of sensors in start_parser_topology.sh
> (e.g. bro,yaf creates an aggregated parser consisting of those two).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
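
The topic-based dispatch described in the comment can be sketched roughly as follows. This is an illustrative Python model only, not Metron code: `SensorConfig`, `AggregatedParserBolt`, `tag_parser`, and the toy filter and transformation are hypothetical names invented for the example, and neither Kafka nor Storm is actually involved. It assumes each incoming message arrives together with the name of the Kafka topic it was read from, and shows a single bolt-like object picking the parser, filter, and field transformations for that topic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


@dataclass
class SensorConfig:
    """Hypothetical per-sensor bundle: parser, filter, field transformations."""
    parse: Callable[[str], Message]
    keep: Callable[[Message], bool] = lambda msg: True
    transforms: List[Callable[[Message], Message]] = field(default_factory=list)


class AggregatedParserBolt:
    """Stand-in for the single parser bolt: one instance serves every sensor."""

    def __init__(self, sensors: Dict[str, SensorConfig]):
        # Keyed by input topic name (assumed here to equal the sensor name).
        self.sensors = sensors

    def handle(self, topic: str, raw: str) -> Optional[Message]:
        config = self.sensors.get(topic)
        if config is None:
            raise KeyError(f"no parser configured for topic {topic!r}")
        msg = config.parse(raw)
        if not config.keep(msg):
            return None  # dropped by this sensor's filter
        for transform in config.transforms:
            msg = transform(msg)
        return msg


def tag_parser(sensor: str) -> Callable[[str], Message]:
    """Toy parser that only records which sensor's parser handled the message."""
    def parse(raw: str) -> Message:
        return {"source.type": sensor, "original_string": raw}
    return parse


def add_length(msg: Message) -> Message:
    """Toy field transformation: add the length of the raw message."""
    return {**msg, "length": str(len(msg["original_string"]))}


bolt = AggregatedParserBolt({
    "pix_syslog_router": SensorConfig(parse=tag_parser("pix_syslog_router")),
    "cisco-5-304": SensorConfig(parse=tag_parser("cisco-5-304"),
                                transforms=[add_length]),
    "cisco-6-302": SensorConfig(parse=tag_parser("cisco-6-302"),
                                keep=lambda m: "Built" in m["original_string"]),
})

parsed = bolt.handle("cisco-5-304", "Accessed URL")
dropped = bolt.handle("cisco-6-302", "Teardown TCP connection")
```

The sketch deliberately leaves out the gap noted in the comment: in the real topology the routing parser still writes to Kafka, and the per-sensor messages are read back from their own topics before reaching the shared bolt.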