That use case makes sense to me. I don't think it will require that much additional effort either.
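
To check that I am reading the proposal correctly, here is a rough Python sketch of the error record being discussed below: one "error_type" field carrying the old topic name, plus a SHA-3 hash of whatever caused the failure (the field value when it can be singled out, otherwise the complete message). Only "error_type" and the topic names come from this thread. The other field names ("message", "raw_value_hash") and the helper names are made up for illustration and are not existing Metron code.

```python
import hashlib
from collections import Counter


def sha3_hex(value: str) -> str:
    """Return the SHA3-256 hex digest of a string (hypothetical helper)."""
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()


def build_error_record(error_type, error_message, raw_message, failed_value=None):
    """Build one error record destined for a single error topic.

    error_type   -- former topic name, e.g. "parser_error"
    failed_value -- the field value that caused the failure, if known;
                    when it is None the complete raw message is hashed.
    Field names other than "error_type" are assumptions for this sketch.
    """
    hashed_input = failed_value if failed_value is not None else raw_message
    return {
        "error_type": error_type,
        "message": error_message,
        "raw_value_hash": sha3_hex(hashed_input),
    }


def count_repeats(records):
    """Group error records by hash so repeats of the same input stand out."""
    return Counter(record["raw_value_hash"] for record in records)
```

For an enrichment failure on an IP field, build_error_record("enrichments_error", "...", raw, failed_value="8.8.8.8") hashes only the IP, which is the sha3(8.8.8.8) case Jon mentions. For a parser failure, leaving failed_value unset hashes the whole message. A dashboard could then key on the hash the way count_repeats does to spot replays or a recurring bad value.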

On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Regarding error vs validation - Either way I'm not very concerned. I initially assumed they would be combined and agree with that approach, but splitting them out isn't a very big deal to me either.
>
> Re: Ryan. Yes, exactly. In the case of a parser issue (or anywhere else where it's not possible to pick out the exact thing causing the issue) it would be a hash of the complete message.
>
> Regarding the architecture, I mostly agree with James except that I think step 3 needs to also be able to somehow group errors via the original data (identify replays, identify repeat issues with data in a specific field, issues with consistently different data, etc.). This is essentially the first step of troubleshooting, which I assume you are doing if you're looking at the error dashboard.
>
> If the hash gets moved out of the initial implementation, I'm fairly certain you lose this ability. The point here isn't to handle long fields (although that's a benefit of this approach), it's to attach a unique identifier to the error/validation issue message that links it to the original problem. I'd be happy to consider alternative solutions to this problem (for instance, actually sending across the data itself) I just haven't been able to think of another way to do this that I like better.
>
> Jon
>
> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrim...@gmail.com> wrote:
>
> > We also need a JIRA for any install/Ansible/MPack work needed.
> >
> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org> wrote:
> >
> > > Now that I had some time to think about it I would collapse all error and validation topics into one. We can differentiate between different views of the data (split by error source etc) via Kibana dashboards. I would implement this feature incrementally.
> > > First I would modify all the bolts to log to a single topic. Second, I would get the error indexing done by attaching the indexing topology to the error topic. Third I would create the necessary dashboards to view errors and validation failures by source. Lastly, I would file a follow-on JIRA to introduce hashing of errors or fields that are too long. It seems like a separate feature that we need to think through. We may need a stellar function around that.
> > >
> > > Thanks,
> > > James
> > >
> > > 24.01.2017, 10:25, "Ryan Merriman" <merrim...@gmail.com>:
> > > > I understand what Jon is talking about. He's proposing we hash the value that caused the error, not necessarily the error message itself. For an enrichment this is easy. Just pass along the field value that failed enrichment. For other cases the field that caused the error may not be so obvious. Take parser validation for example. The message is validated as a whole and it may not be easy to determine which field is the cause. In that case would a hash of the whole message work?
> > > >
> > > > There is a broader architectural discussion that needs to happen before we can implement this. Currently we have an indexing topology that reads from 1 topic and writes messages to ES but errors are written to several different topics:
> > > >
> > > > - parser_error
> > > > - parser_invalid
> > > > - enrichments_error
> > > > - threatintel_error
> > > > - indexing_error
> > > >
> > > > I can see 4 possible approaches to implementing this:
> > > >
> > > > 1. Create an index topology for each error topic
> > > >    1. Good because we can easily reuse the indexing topology and would require the least development effort
> > > >    2. Bad because it would consume a lot of extra worker slots
> > > > 2. Move the topic name into the error JSON message as a new "error_type" field and write all messages to the indexing topic
> > > >    1. Good because we don't need to create a new topology
> > > >    2. Bad because we would be flowing data and errors through the same topology. A spike in errors could affect message indexing.
> > > > 3. Compromise between 1 and 2. Create another indexing topology that is dedicated to indexing errors. Move the topic name into the error JSON message as a new "error_type" field and write all errors to a single error topic.
> > > > 4. Write a completely new topology with multiple spouts (1 for each error type listed above) that all feed into a single BulkMessageWriterBolt.
> > > >    1. Good because the current topologies would not need to change
> > > >    2. Bad because it would require the most development effort, would not reuse existing topologies and takes up more worker slots than 3
> > > >
> > > > Are there other approaches I haven't thought of? I think 1 and 2 are off the table because they are shortcuts and not good long-term solutions. 3 would be my choice because it introduces less complexity than 4. Thoughts?
> > > >
> > > > Ryan
> > > >
> > > > On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
> > > >
> > > >> In that case the hash would be of the value in the IP field, such as sha3(8.8.8.8).
> > > >>
> > > >> Jon
> > > >>
> > > >> On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org> wrote:
> > > >>
> > > >> > Jon,
> > > >> >
> > > >> > I am still not entirely following why we would want to use hashing. For example if my error is "Your IP field is invalid and failed validation" hashing this error string will always result in the same hash. Why not just use the actual error string?
> > > >> > Can you provide an example where you would use it?
> > > >> >
> > > >> > Thanks,
> > > >> > James
> > > >> >
> > > >> > 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> > > >> > > For 1 - I'm good with that.
> > > >> > >
> > > >> > > I'm talking about hashing the relevant content itself not the error. Some benefits are (1) minimize load on search index (there's minimal benefit in spending the CPU and disk to keep it at full fidelity (tokenize and store)) (2) provide something to key on for dashboards (assuming a good hash algorithm that avoids collisions and is second preimage resistant) and (3) specific to errors, if the issue is that it failed to index, a hash gives us some protection that the issue will not occur twice.
> > > >> > >
> > > >> > > Jon
> > > >> > >
> > > >> > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsir...@apache.org> wrote:
> > > >> > >
> > > >> > > Jon,
> > > >> > >
> > > >> > > With regards to 1, collapsing to a single dashboard for each would be fine. So we would have one error index and one "failed to validate" index. The distinction is that errors would be things that went wrong during stream processing (failed to parse, etc...), while validation failures are messages that explicitly failed stellar validation/schema enforcement. There should be relatively few of the second type.
> > > >> > >
> > > >> > > With respect to 3, why do you want the error hashed? Why not just search for the error text?
> > > >> > >
> > > >> > > Thanks,
> > > >> > > James
> > > >> > >
> > > >> > > 20.01.2017, 14:01, "zeo...@gmail.com" <zeo...@gmail.com>:
> > > >> > >> As someone who currently fills the platform engineer role, I can give this idea a huge +1. My thoughts:
> > > >> > >>
> > > >> > >> 1. I think it depends on exactly what data is pushed into the index (#3). However, assuming the errors you proposed recording, I can't see huge benefits to having more than one dashboard. I would be happy to be persuaded otherwise.
> > > >> > >>
> > > >> > >> 2. I would say yes, storing the errors in HDFS in addition to indexing is a good thing. Using METRON-510 <https://issues.apache.org/jira/browse/METRON-510> as a case study, there is the potential in this environment for attacker-controlled data to result in processing errors which could be a method of evading security monitoring. Once an attack is identified, the long term HDFS storage would allow better historical analysis for low-and-slow/persistent attacks (I'm thinking of a method of data exfil that also won't successfully get stored in Lucene, but is hard to identify over a short period of time).
> > > >> > >>    - Along this line, I think that there are various parts of Metron (this included) which could benefit from having a method of configuring data aging by bucket in HDFS (Following Nick's comments here <https://issues.apache.org/jira/browse/METRON-477>).
> > > >> > >>
> > > >> > >> 3. I would potentially add a hash of the content that failed validation to help identify repeats over time with less of a concern that you'd have back to back failures (i.e. instead of storing the value itself). Additionally, I think it's helpful to be able to search all times there was an indexing error (instead of it hitting the catch-all).
> > > >> > >>
> > > >> > >> Jon
> > > >> > >>
> > > >> > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsir...@apache.org> wrote:
> > > >> > >>
> > > >> > >> We already have a capability to capture bolt errors and validation errors and pipe them into a Kafka topic. I want to propose that we attach a writer topology to the error and validation failed kafka topics so that we can (a) create a new ES index for these errors and (b) create a new Kibana dashboard to visualize them. The benefit would be that errors and validation failures would be easier to see and analyze.
> > > >> > >>
> > > >> > >> I am seeking feedback on the following:
> > > >> > >>
> > > >> > >> - How granular would we want this feature to be? Think we would want one index/dashboard per source? Or would it be better to collapse everything into the same index?
> > > >> > >> - Do we care about storing these errors in HDFS as well? Or is indexing them enough?
> > > >> > >> - What types of errors should we record?
> > > >> > >>   I am proposing:
> > > >> > >>
> > > >> > >> For error reporting:
> > > >> > >> --Message failed to parse
> > > >> > >> --Enrichment failed to enrich
> > > >> > >> --Threat intel feed failures
> > > >> > >> --Generic catch-all for all other errors
> > > >> > >>
> > > >> > >> For validation reporting:
> > > >> > >> --What part of message failed validation
> > > >> > >> --What stellar validator caused the failure
> > > >> > >>
> > > >> > >> -------------------
> > > >> > >> Thank you,
> > > >> > >>
> > > >> > >> James Sirota
> > > >> > >> PPMC- Apache Metron (Incubating)
> > > >> > >> jsirota AT apache DOT org
> > > >> > >>
> > > >> > >> --
> > > >> > >>
> > > >> > >> Jon
> > > >> > >>
> > > >> > >> Sent from my mobile device