[GitHub] incubator-metron issue #359: METRON-569: Change acking to prevent duplicate ...

2016-11-29 Thread DomenicPuzio
Github user DomenicPuzio commented on the issue:

https://github.com/apache/incubator-metron/pull/359
  
Sorry for the delayed response, @cestella - I was taking some time off for 
the holiday.

While I don't believe custom enrichment is pushing us over the timeout 
(we've clocked it at 15ms at the worst), that's certainly a possibility and 
something that we should also guard against. With this in mind, we now have two 
tasks to look into:
1. Ensuring all messages get acked after a successful join by building out 
the `streamMessageMap`
2. Making sure that our cache refresh is set to some fraction of the Storm 
timeout interval

Are these both items that we would like to complete? Should we split these 
into separate JIRAs and PRs? Should I work on the acking changes within this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron issue #359: METRON-569: Change acking to prevent duplicate ...

2016-11-16 Thread DomenicPuzio
Github user DomenicPuzio commented on the issue:

https://github.com/apache/incubator-metron/pull/359
  
@cestella, I made the change to the PR name; thanks for the tip there.

I completely agree that we don't want to miss out on catching failures due 
to an enrichment source (like MySQL) or in the enrichment infrastructure. 
However, after we have put a tuple into the JoinBolt's cache, isn't it already 
past the point where these pitfalls could occur? If there is a failure in an 
enrichment, then Storm will time out while waiting for an ack that never takes 
place, so this message will be re-sent; but if the message is already in the 
cache, isn't its journey complete?

My thought with placing the ack there was (1) at this stage of the 
topology, that tuple has been correctly processed, and (2) for simplicity's 
sake so that we wouldn't have to modify `streamMessageMap`. I do agree that 
acking everything after the join has taken place also makes a lot of sense, and 
I can work on that if you would like.

We saw the duplicate data while running this in our development environment 
in EC2. Perhaps this is due to different ack timeout settings in the Storm 
topologies? We've repeated the duplication of data many times on our end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron issue #359: Change acking to prevent duplicate tuples in en...

2016-11-15 Thread DomenicPuzio
Github user DomenicPuzio commented on the issue:

https://github.com/apache/incubator-metron/pull/359
  
Here's what I was seeing:

We were running only one Enrichment. So our EnrichmentSplitter sends 
_originalTuple_ to the Join and _originalTuple_ to the Enrichment. Enrichment 
then sends _enrichedTuple_ to Join. Join knows that there are two streams, so 
it waits for two tuples to join together.

When _originalTuple_ comes in first, we put it in the cache and do nothing. 
When _enrichedTuple_ comes in, we see that it has its mate in the cache, we 
join _originalTuple_ and _enrichedTuple_, and then we ack **the current 
tuple**, in this case, _enrichedTuple_. This means that we have never acked 
_originalTuple_, which caused Storm to re-send that tuple unnecessarily from 
the EnrichmentSplitter.

Since none of the cached messages are ever acked, Storm does not know that 
they have been correctly handled. Adding in the ack statement completely 
resolved this for me.

Does this clarify?

And @dlyle65535, I'm not sure on the best way to do that. Now that I've 
explained the problem, can you provide some guidance on what you'd like to see?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron pull request #359: Change acking to prevent duplicate tuple...

2016-11-15 Thread DomenicPuzio
GitHub user DomenicPuzio opened a pull request:

https://github.com/apache/incubator-metron/pull/359

Change acking to prevent duplicate tuples in enrichment topology

Adding this ack statement prevents duplicate messages from being sent 
through the _enrichment_ topology. Previously, in the _JoinBolt_, when a 
message does not complete a join, it does not get acked - this cause the 
message to get re-sent through the topology, resulting in duplicate messages 
being indexed.

This is a show-stopping bug, impacting performance and resulting in 
duplicate data.

I have tested this in the Quick-Dev enrichment as well as in our production 
system at high velocity.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DomenicPuzio/incubator-metron METRON-569

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/359.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #359


commit 7920eb575f1848786aeccd3d7d47cf458119f367
Author: DomenicPuzio 
Date:   2016-11-15T20:43:12Z

Adding critical ack statement to prevent duplicate messages




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...

2016-05-13 Thread DomenicPuzio
Github user DomenicPuzio commented on the pull request:

https://github.com/apache/incubator-metron/pull/115#issuecomment-219095558
  
@merrimanr and @cestella, thank you SO MUCH for the help! I really 
appreciate it! I will certainly follow the steps to test on vagrant for future 
parsers.

I'm very excited to be a contributor, and I'm already working on my next 
parser!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...

2016-05-11 Thread DomenicPuzio
Github user DomenicPuzio commented on the pull request:

https://github.com/apache/incubator-metron/pull/115#issuecomment-218661286
  
@merrimanr, thank you so much for the feedback and assistance. I have 
incorporated your changes! I have also added negative test cases (trying to 
parse empty or malformed lines) to help ensure that the topology will not crash 
when attempting to parse incorrectly formatted messages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...

2016-05-11 Thread DomenicPuzio
Github user DomenicPuzio commented on the pull request:

https://github.com/apache/incubator-metron/pull/115#issuecomment-218462405
  
@cestella, thank you so much for the feedback! I will get to work on fixing 
the 'websphere.json' file that was causing some failed tests. I agree that a 
negative case in the test class is a really good call.

@merrimanr, I agree that there is a good amount of overlap between this 
parser and the GrokParser class. There was some custom behavior needed to 
handle different log types, so I used GrokAsaParser (which also extends 
BasicParser) as a model. Off the top of my head, there are several sources that 
have different log types within a single source - Infoblox DNS, Checkpoint 
Firewall, Big IP VPN - so we should determine a strategy to address those. I 
like your idea of making the GrokParser more flexible so that we can easily 
extend it and add custom behavior for sources like WebSphere. Is this something 
that I should work to address in this PR or in a future refactoring? Either 
way, I am more than happy to work on that!




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron pull request: [METRON-150] Adding WebSphere parse...

2016-05-10 Thread DomenicPuzio
GitHub user DomenicPuzio opened a pull request:

https://github.com/apache/incubator-metron/pull/115

[METRON-150] Adding WebSphere parser, unit tests, and integration tests

This PR is for 
[METRON-150](https://issues.apache.org/jira/browse/METRON-150). It satisfies 
the specified requirements of the WebSphere parser, passes unit tests, and 
passes integration tests. The parser uses Grok to pull out the fields present 
in all message types and uses Java to handle message-specific parsing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DomenicPuzio/incubator-metron master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #115


commit eafc6c9b2e05c2f51922b5454efc367915cf5f6d
Author: xxl072 
Date:   2016-05-10T18:46:24Z

Adding WebSphere parser, unit tests, and integration tests to close 
METRON-150

commit a4e3f8cd98d45fe38d5cf35269e60ba59a117796
Author: xxl072 
Date:   2016-05-10T19:01:13Z

Cleaning up comments and whitespace

commit 85bb7584f50b06159341bdc6e7091c7cf1b14405
Author: xxl072 
Date:   2016-05-10T19:04:07Z

Adding Apache license information to classes

commit 92ab5134570387d3519bf6fad3e500d709850d13
Author: xxl072 
Date:   2016-05-10T19:06:17Z

Adding Apache license to one more file for METRON-150




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---