[ https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546856#comment-16546856 ]
ASF GitHub Bot commented on METRON-1657:
----------------------------------------

Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/1099#discussion_r203092801

--- Diff: use-cases/parser_chaining/README.md ---
@@ -233,3 +233,10 @@ cat ~/data.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --b
 ```
 
 You should see indices created for the `cisco-5-304` and `cisco-6-302` data with appropriate fields created for each type.
+
+# Aggregated Parsers with Parser Chaining
+Chained parsers can be run as aggregated parsers. These parsers continue to use the sensor specific Kafka topics, and do not do internal routing to the appropriate sensor.
+
--- End diff --

Right now, as noted in the description, there's no UI attached to this. Even the REST API's update is pretty minimal (it just accepts comma-separated lists). I didn't want to build that out, because the management UI requires a decent amount of thought, and that will ripple through REST as needed (e.g. needing or wanting to pass spout num tasks, parallelism, etc.). Right now I see this as a low-level way to get some of the benefits of this type of aggregation, with making it more user-friendly left as follow-on work, since that will require nontrivial effort and design. I can go ahead and create follow-on tickets for that work, if that works for you.

For the default Ambari processors, I'm not particularly inclined to worry about it, although I could be persuaded that we need to. That feels like something that can be addressed as this is made more user-friendly (i.e. I expect anyone familiar enough with the system to decide to aggregate parsers right now to also be familiar enough to stop the topologies). I could add a warning to the docs not to run an aggregated parser containing sensor X alongside a dedicated topology for sensor X, but I'm not sure that's necessary.
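To make the comma-separated sensor list concrete, here is a minimal, hypothetical Java sketch; it is not Metron's actual code. The class `AggregatedSensors` and its methods are illustrative names, and it assumes each sensor reads from a Kafka topic named after the sensor. It splits an argument such as "bro,yaf" into distinct sensor names, keeps one input topic per sensor (no shared topic and no internal routing, as the README diff describes), and flags sensors that would also be read by a dedicated topology, which is the situation the possible docs warning would cover.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper illustrating aggregated-parser sensor lists; not Metron's implementation. */
public final class AggregatedSensors {

  private AggregatedSensors() {}

  /** Splits a comma-separated list such as "bro,yaf" into distinct, trimmed sensor names, keeping order. */
  public static List<String> parseSensorList(String arg) {
    if (arg == null) {
      throw new IllegalArgumentException("sensor list must not be null");
    }
    Set<String> sensors = new LinkedHashSet<>();
    for (String part : arg.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        sensors.add(name); // duplicates are dropped by the set
      }
    }
    if (sensors.isEmpty()) {
      throw new IllegalArgumentException("at least one sensor is required");
    }
    return Collections.unmodifiableList(new ArrayList<>(sensors));
  }

  /** Each sensor keeps its own input topic. ASSUMPTION: topic name == sensor name. */
  public static Map<String, String> topicsBySensor(List<String> sensors) {
    Map<String, String> topics = new LinkedHashMap<>();
    for (String sensor : sensors) {
      topics.put(sensor, sensor);
    }
    return topics;
  }

  /** Returns sensors that are both in the aggregate and running in a dedicated topology. */
  public static List<String> overlapWithDedicated(List<String> aggregated, Set<String> dedicated) {
    List<String> overlap = new ArrayList<>();
    for (String sensor : aggregated) {
      if (dedicated.contains(sensor)) {
        overlap.add(sensor);
      }
    }
    return overlap;
  }

  public static void main(String[] args) {
    List<String> sensors = parseSensorList("bro, yaf,bro");
    System.out.println(sensors);                 // [bro, yaf]
    System.out.println(topicsBySensor(sensors)); // {bro=bro, yaf=yaf}
  }
}
```

Using an ordered set keeps the sensor order the user typed while silently dropping repeats. Whether a real launcher should reject duplicates instead is a design decision this sketch does not settle.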
I also went ahead and added the actual command to the parser chaining README, so the practical example is complete.

> Parser aggregation in storm
> ---------------------------
>
>                 Key: METRON-1657
>                 URL: https://issues.apache.org/jira/browse/METRON-1657
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Major
>
> Currently our parsing solution requires one Storm topology per sensor. Concerns have been
> raised that this may waste resources and that, rather than one Storm topology per sensor,
> it would be advantageous to have multiple sensors in the same topology. The benefit is
> that this would require fewer Storm slots.
> The problem is that whenever we've aggregated functionality like this before, we've had
> trouble scaling Storm appropriately (e.g. batch vs random access indexing in the same
> topology). The main way to address this is to recommend that parsers with similar
> velocities and complexity be grouped together.
> Particularly for a first cut, leave the configuration mostly as-is, while allowing
> comma-separated lists of sensors in start_parser_topology.sh (e.g. bro,yaf creates an
> aggregated parser consisting of those two).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)