[ https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538690#comment-16538690 ]
ASF GitHub Bot commented on METRON-1657: ---------------------------------------- GitHub user justinleet reopened a pull request: https://github.com/apache/metron/pull/1099 METRON-1657: Parser aggregation in storm ## Contributor Comments This PR allows for users to specify multiple parsers to be run in one aggregated Storm topology. Essentially, the ParserBolt (and the supporting infrastructure) has been generalized to take multiple sensors. This gives us a structure where there's a Storm spout per sensor. These all lead to a parser bolt that delegates based on the topic metadata and output appropriately. ### Current assumptions/restrictions - Topic metadata must be enabled in order to be able to delegate properly in the aggregated case (still unneeded for standard case). - Configs that are shared across parsers generally apply in a "Last-one wins" manner. Theres's a couple minor exceptions (e.g. if a parser says Kafka security is not PLAINTEXT, that'll win over anyone who says PLAINTEXT). - All error topics for the aggregated parsers are the same. This restriction could be lifted if we generalize that infrastructure a bit, but I think it's reasonable to leave as a follow-on if there's enough demand. - Order matters in how the sensors are specified. "bro,yaf" is not the same as "yaf,bro". There are two places this matters, configs and the name of the Storm topology. This could pretty easily be lifted by sorting for the Storm topology (and I might just go ahead and do it anyway). ### Testing To ensure that single sensor parsers work, just spin up full dev and ensure everything is passing data as expected. On fulldev, the REST API has been altered to accept a comma separated list in the parser start and parser stop endpoints. Just kill bro, and start up "bro,yaf". An aggregated parser should be be launched in Storm with the names of both sensors used, and data should flow through both. Because no UI is attached to aggregated parsers, I chose not to expand this REST API out for now until we know what's actually needed to properly manage it. The upshot of this is that you can't specify advanced configs directly here. On the CLI, the start_parser_topology.sh can be used to submit e.g. ``` ${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf ``` If you want to override certain parameters via command line this is still possible: ``` ${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf -snt 2,3 -pp 2 -pnt 5 ``` This will spin up a topology with a spout number of tasks of 2 for bro, 3 for yaf, along with a parser parallelism of 2 and a parser number of tasks of 5. This can be seen in the Storm UI by looking at the number of executors and tasks for each Storm component after launch. As a heads up, because this is full dev and it's only running with one Storm supervisor/worker/etc, it's likely to be unhappy with the config, but it'll still submit the job. ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? ### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - [x] Have you written or updated unit tests and or integration tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via `site-book/target/site/index.html`: ``` cd site-book mvn site ``` #### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/justinleet/metron parserAgg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1099.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1099 ---- commit edb1bc1fc6217230ad73442045bc5904fd852309 Author: justinjleet <justinjleet@...> Date: 2018-06-28T14:28:35Z wip commit a7725d16209aabd2a3c8935c694aa4373a34468b Author: justinjleet <justinjleet@...> Date: 2018-07-02T12:36:17Z More wip on getting everything carried through commit d08993a74350314a2197fb86645268e4590b0edc Author: justinjleet <justinjleet@...> Date: 2018-07-03T20:09:56Z more complete wip, although still work to do commit 463c38911383a19064e5251dc7bf7327391704c8 Author: justinjleet <justinjleet@...> Date: 2018-07-04T21:50:09Z allowing for nullable output topic. should end up being enrichment by default commit a04b40e8c72d3e437f4e8363505ff77afabb1394 Author: justinjleet <justinjleet@...> Date: 2018-07-05T13:19:02Z Actually committing all the fix from before commit c6949a074c61a86d57749d526418d1dcbf09a90a Author: justinjleet <justinjleet@...> Date: 2018-07-05T14:59:26Z Stop and Start work in REST now commit ee1ef16ec9e69ba5859dbd9fe8a63bdf64817142 Author: justinjleet <justinjleet@...> Date: 2018-07-05T19:30:26Z Fixing unit test commit 61592b8c5767238a501dce355933f4df22d4a03b Author: justinjleet <justinjleet@...> Date: 2018-07-05T21:04:35Z Some cleanup commit 3b7d2423aa6105ac20e943c7a49590b7723a1f74 Author: justinjleet <justinjleet@...> Date: 2018-07-06T01:48:46Z More cleanup commit 8e4263ac5452eaf9b8939f7b7acb8d2ef6404676 Author: justinjleet <justinjleet@...> Date: 2018-07-08T19:31:48Z Some cleanup, test additions, couple fixes commit b106c56c549325935813933eb7f5a0d2f1a024a6 Author: justinjleet <justinjleet@...> Date: 2018-07-10T12:58:36Z docs and more cleanup commit 717ff36fa6ff72a17d8891222ff27778d8405c35 Author: justinjleet <justinjleet@...> Date: 2018-07-10T13:03:37Z removing old TODO commit fd0957ed8269467afa7c4c538cf61f64828faaf8 Author: justinjleet <justinjleet@...> Date: 2018-07-10T13:05:45Z Adding note to README commit bceec862fc6275fb54348b7e18cc14d1903a6769 Author: justinjleet <justinjleet@...> Date: 2018-07-10T13:44:40Z Removing Splitter and just using a Java stream ---- > Parser aggregation in storm > --------------------------- > > Key: METRON-1657 > URL: https://issues.apache.org/jira/browse/METRON-1657 > Project: Metron > Issue Type: Bug > Reporter: Justin Leet > Assignee: Justin Leet > Priority: Major > > Currently our parsing solution requires one storm topology per sensor. It has > been complained that this may be wasteful of resources and that, rather than > one storm topology per sensor, it would be advantageous to have multiple > sensors in the same topology. The benefit to this is that it would require > fewer storm slots. > The issue with this is that whenever we've aggregated functionality like this > before, we've run into issues appropriately being able to scale storm (e.g. > batch vs random access indexing in the same topology). The main point in > addressing this is to recommend that parsers with similar velocities and > complexity are grouped together. > Particularly for a first cut, leave the configuration mostly as-is, while > allowing for comma separated lists of sensors in start_parser_topology.sh > (e.g. bro,yaf creates a aggregated parser consisting of those two). -- This message was sent by Atlassian JIRA (v7.6.3#76005)