In that case the hash would be of the value in the IP field, such as sha3(8.8.8.8).
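
A rough sketch of what that could look like, in Python. The thread only says "sha3", so SHA3-256 is an assumption here, and the record layout and the names `hash_field_value`, `build_error_record`, `failed_field` and `raw_value_hash` are hypothetical, not anything that exists in Metron today:

```python
import hashlib


def hash_field_value(value: str) -> str:
    """Return the SHA3-256 hex digest of a field value.

    The SHA3 variant (256-bit) is an assumption; the thread only says "sha3".
    """
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()


def build_error_record(field: str, value: str, error: str) -> dict:
    """Build a hypothetical error document that stores a hash of the
    offending value instead of the value itself."""
    return {
        "failed_field": field,                     # hypothetical key name
        "raw_value_hash": hash_field_value(value), # hypothetical key name
        "error": error,                            # error text kept as-is
    }


record = build_error_record(
    "ip", "8.8.8.8", "Your IP field is invalid and failed validation"
)
print(record["raw_value_hash"])
```

The same input value always produces the same digest, so repeated failures on the same content could be grouped on `raw_value_hash` without indexing the content itself.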

Jon

On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org> wrote:

> Jon,
>
> I am still not entirely following why we would want to use hashing. For
> example if my error is "Your IP field is invalid and failed validation"
> hashing this error string will always result in the same hash. Why not
> just use the actual error string? Can you provide an example where you
> would use it?
>
> Thanks,
> James
>
> 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> > For 1 - I'm good with that.
> >
> > I'm talking about hashing the relevant content itself not the error.
> > Some benefits are (1) minimize load on search index (there's minimal
> > benefit in spending the CPU and disk to keep it at full fidelity
> > (tokenize and store)) (2) provide something to key on for dashboards
> > (assuming a good hash algorithm that avoids collisions and is second
> > preimage resistant) and (3) specific to errors, if the issue is that
> > it failed to index, a hash gives us some protection that the issue
> > will not occur twice.
> >
> > Jon
> >
> > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsir...@apache.org> wrote:
> >
> > Jon,
> >
> > With regards to 1, collapsing to a single dashboard for each would be
> > fine. So we would have one error index and one "failed to validate"
> > index. The distinction is that errors would be things that went wrong
> > during stream processing (failed to parse, etc...), while validation
> > failures are messages that explicitly failed stellar validation/schema
> > enforcement. There should be relatively few of the second type.
> >
> > With respect to 3, why do you want the error hashed? Why not just
> > search for the error text?
> >
> > Thanks,
> > James
> >
> > 20.01.2017, 14:01, "zeo...@gmail.com" <zeo...@gmail.com>:
> >> As someone who currently fills the platform engineer role, I can give
> >> this idea a huge +1. My thoughts:
> >>
> >> 1. I think it depends on exactly what data is pushed into the index
> >> (#3). However, assuming the errors you proposed recording, I can't
> >> see huge benefits to having more than one dashboard. I would be happy
> >> to be persuaded otherwise.
> >>
> >> 2. I would say yes, storing the errors in HDFS in addition to
> >> indexing is a good thing. Using METRON-510
> >> <https://issues.apache.org/jira/browse/METRON-510> as a case study,
> >> there is the potential in this environment for attacker-controlled
> >> data to result in processing errors which could be a method of
> >> evading security monitoring. Once an attack is identified, the long
> >> term HDFS storage would allow better historical analysis for
> >> low-and-slow/persistent attacks (I'm thinking of a method of data
> >> exfil that also won't successfully get stored in Lucene, but is hard
> >> to identify over a short period of time).
> >> - Along this line, I think that there are various parts of Metron
> >> (this included) which could benefit from having method of configuring
> >> data aging by bucket in HDFS (Following Nick's comments here
> >> <https://issues.apache.org/jira/browse/METRON-477>).
> >>
> >> 3. I would potentially add a hash of the content that failed
> >> validation to help identify repeats over time with less of a concern
> >> that you'd have back to back failures (i.e. instead of storing the
> >> value itself). Additionally, I think it's helpful to be able to
> >> search all times there was an indexing error (instead of it hitting
> >> the catch-all).
> >>
> >> Jon
> >>
> >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsir...@apache.org>
> >> wrote:
> >>
> >> We already have a capability to capture bolt errors and validation
> >> errors and pipe them into a Kafka topic. I want to propose that we
> >> attach a writer topology to the error and validation failed kafka
> >> topics so that we can (a) create a new ES index for these errors and
> >> (b) create a new Kibana dashboard to visualize them. The benefit
> >> would be that errors and validation failures would be easier to see
> >> and analyze.
> >>
> >> I am seeking feedback on the following:
> >>
> >> - How granular would we want this feature to be? Think we would want
> >> one index/dashboard per source? Or would it be better to collapse
> >> everything into the same index?
> >> - Do we care about storing these errors in HDFS as well? Or is
> >> indexing them enough?
> >> - What types of errors should we record? I am proposing:
> >>
> >> For error reporting:
> >> --Message failed to parse
> >> --Enrichment failed to enrich
> >> --Threat intel feed failures
> >> --Generic catch-all for all other errors
> >>
> >> For validation reporting:
> >> --What part of message failed validation
> >> --What stellar validator caused the failure
> >>
> >> -------------------
> >> Thank you,
> >>
> >> James Sirota
> >> PPMC- Apache Metron (Incubating)
> >> jsirota AT apache DOT org
> >>
> >> --
> >>
> >> Jon
> >>
> >> Sent from my mobile device
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> >
> > Sent from my mobile device
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
--

Jon

Sent from my mobile device