[GitHub] incubator-metron issue #359: METRON-569: Change acking to prevent duplicate ...
Github user DomenicPuzio commented on the issue: https://github.com/apache/incubator-metron/pull/359 Sorry for the delayed response, @cestella - I was taking some time off for the holiday. While I don't believe custom enrichment is pushing us over the timeout (we've clocked it at 15ms at the worst), that's certainly a possibility and something that we should also guard against. With this in mind, we now have two tasks to look into: 1. Ensuring all messages get acked after a successful join by building out the `streamMessageMap` 2. Making sure that our cache refresh is set to some fraction of the Storm timeout interval Are these both items that we would like to complete? Should we split these into separate JIRAs and PRs? Should I work on the acking changes within this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron issue #359: METRON-569: Change acking to prevent duplicate ...
Github user DomenicPuzio commented on the issue: https://github.com/apache/incubator-metron/pull/359 @cestella, I made the change to the PR name; thanks for the tip there. I completely agree that we don't want to miss out on catching failures due to an enrichment source (like MySQL) or in the enrichment infrastructure. However, after we have put a tuple into the JoinBolt's cache, isn't it already past the point where these pitfalls could occur? If there is a failure in an enrichment, then Storm will time out while waiting for an ack that never takes place, so this message will be re-sent; but if the message is already in the cache, isn't its journey complete? My thought with placing the ack there was (1) at this stage of the topology, that tuple has been correctly processed, and (2) for simplicity's sake so that we wouldn't have to modify `streamMessageMap`. I do agree that acking everything after the join has taken place also makes a lot of sense, and I can work on that if you would like. We saw the duplicate data while running this in our development environment in EC2. Perhaps this is due to different ack timeout settings in the Storm topologies? We've repeated the duplication of data many times on our end. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron issue #359: Change acking to prevent duplicate tuples in en...
Github user DomenicPuzio commented on the issue: https://github.com/apache/incubator-metron/pull/359 Here's what I was seeing: We were running only one Enrichment. So our EnrichmentSplitter sends _originalTuple_ to the Join and _originalTuple_ to the Enrichment. Enrichment then sends _enrichedTuple_ to Join. Join knows that there are two streams, so it waits for two tuples to join together. When _originalTuple_ comes in first, we put it in the cache and do nothing. When _enrichedTuple_ comes in, we see that it has its mate in the cache, we join _originalTuple_ and _enrichedTuple_, and then we ack **the current tuple**, in this case, _enrichedTuple_. This means that we have never acked _originalTuple_, which caused Storm to re-send that tuple unnecessarily from the EnrichmentSplitter. Since none of the cached messages are ever acked, Storm does not know that they have been correctly handled. Adding in the ack statement completely resolved this for me. Does this clarify? And @dlyle65535, I'm not sure on the best way to do that. Now that I've explained the problem, can you provide some guidance on what you'd like to see? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron pull request #359: Change acking to prevent duplicate tuple...
GitHub user DomenicPuzio opened a pull request: https://github.com/apache/incubator-metron/pull/359 Change acking to prevent duplicate tuples in enrichment topology Adding this ack statement prevents duplicate messages from being sent through the _enrichment_ topology. Previously, in the _JoinBolt_, when a message does not complete a join, it does not get acked - this cause the message to get re-sent through the topology, resulting in duplicate messages being indexed. This is a show-stopping bug, impacting performance and resulting in duplicate data. I have tested this in the Quick-Dev enrichment as well as in our production system at high velocity. You can merge this pull request into a Git repository by running: $ git pull https://github.com/DomenicPuzio/incubator-metron METRON-569 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-metron/pull/359.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #359 commit 7920eb575f1848786aeccd3d7d47cf458119f367 Author: DomenicPuzio Date: 2016-11-15T20:43:12Z Adding critical ack statement to prevent duplicate messages --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...
Github user DomenicPuzio commented on the pull request: https://github.com/apache/incubator-metron/pull/115#issuecomment-219095558 @merrimanr and @cestella, thank you SO MUCH for the help! I really appreciate it! I will certainly follow the steps to test on vagrant for future parsers. I'm very excited to be a contributor, and I'm already working on my next parser! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...
Github user DomenicPuzio commented on the pull request: https://github.com/apache/incubator-metron/pull/115#issuecomment-218661286 @merrimanr, thank you so much for the feedback and assistance. I have incorporated your changes! I have also added negative test cases (trying to parse empty or malformed lines) to help ensure that the topology will not crash when attempting to parse incorrectly formatted messages. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...
Github user DomenicPuzio commented on the pull request: https://github.com/apache/incubator-metron/pull/115#issuecomment-218462405 @cestella, thank you so much for the feedback! I will get to work on fixing the 'websphere.json' file that was causing some failed tests. I agree that a negative case in the test class is a really good call. @merrimanr, I agree that there is a good amount of overlap between this parser and the GrokParser class. There was some custom behavior needed to handle different log types, so I used GrokAsaParser (which also extends BasicParser) as a model. Off the top of my head, there are several sources that have different log types within a single source - Infoblox DNS, Checkpoint Firewall, Big IP VPN - so we should determine a strategy to address those. I like your idea of making the GrokParser more flexible so that we can easily extend it and add custom behavior for sources like WebSphere. Is this something that I should work to address in this PR or in a future refactoring? Either way, I am more than happy to work on that! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...
GitHub user DomenicPuzio opened a pull request: https://github.com/apache/incubator-metron/pull/115 [METRON-150] Adding WebSphere parser, unit tests, and integration tests This PR is for [METRON-150](https://issues.apache.org/jira/browse/METRON-150). It satisfies the specified requirements of the WebSphere parser, passes unit tests, and passes integration tests. The parser uses Grok to pull out the fields present in all message types and uses Java to handle message-specific parsing. You can merge this pull request into a Git repository by running: $ git pull https://github.com/DomenicPuzio/incubator-metron master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-metron/pull/115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #115 commit eafc6c9b2e05c2f51922b5454efc367915cf5f6d Author: xxl072 Date: 2016-05-10T18:46:24Z Adding WebSphere parser, unit tests, and integration tests to close METRON-150 commit a4e3f8cd98d45fe38d5cf35269e60ba59a117796 Author: xxl072 Date: 2016-05-10T19:01:13Z Cleaning up comments and whitespace commit 85bb7584f50b06159341bdc6e7091c7cf1b14405 Author: xxl072 Date: 2016-05-10T19:04:07Z Adding Apache license information to classes commit 92ab5134570387d3519bf6fad3e500d709850d13 Author: xxl072 Date: 2016-05-10T19:06:17Z Adding Apache license to one more file for METRON-150 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---