Re: [DISCUSS] Error Indexing

zeo...@gmail.com Mon, 23 Jan 2017 15:45:06 -0800

In that case the hash would be of the value in the IP field, such as
sha3(8.8.8.8).
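
A minimal sketch of what I mean, in Python. The field name and record layout are hypothetical (not Metron's actual error schema), and SHA3-256 is just one reading of "sha3":

```python
import hashlib


def hash_field_value(value: str) -> str:
    """Return the SHA3-256 hex digest of the raw field value."""
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()


def build_error_record(field: str, raw_value: str, error: str) -> dict:
    """Build an error document that stores a hash in place of the raw value.

    The keys used here are illustrative assumptions only.
    """
    return {
        "error": error,                               # human-readable error text, kept as-is
        "field": field,                               # which field failed
        "value_hash": hash_field_value(raw_value),    # hash of the offending content
    }


record = build_error_record(
    field="ip",  # hypothetical field name
    raw_value="8.8.8.8",
    error="Your IP field is invalid and failed validation",
)
```

The same offending value always maps to the same `value_hash`, so a dashboard can group repeats on it, while the raw value itself never has to be tokenized or stored in the index.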


Jon

On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org> wrote:

> Jon,
>
> I am still not entirely following why we would want to use hashing.  For
> example if my error is "Your IP field is invalid and failed validation"
> hashing this error string will always result in the same hash.  Why not
> just use the actual error string? Can you provide an example where you
> would use it?
>
> Thanks,
> James
>
> 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> > For 1 - I'm good with that.
> >
> > I'm talking about hashing the relevant content itself not the error.
> > Some benefits are (1) minimize load on search index (there's minimal
> > benefit in spending the CPU and disk to keep it at full fidelity
> > (tokenize and store)) (2) provide something to key on for dashboards
> > (assuming a good hash algorithm that avoids collisions and is second
> > preimage resistant) and (3) specific to errors, if the issue is that it
> > failed to index, a hash gives us some protection that the issue will not
> > occur twice.
> >
> > Jon
> >
> > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsir...@apache.org> wrote:
> >
> > Jon,
> >
> > With regards to 1, collapsing to a single dashboard for each would be
> > fine. So we would have one error index and one "failed to validate"
> > index. The distinction is that errors would be things that went wrong
> > during stream processing (failed to parse, etc...), while validation
> > failures are messages that explicitly failed stellar validation/schema
> > enforcement. There should be relatively few of the second type.
> >
> > With respect to 3, why do you want the error hashed? Why not just search
> > for the error text?
> >
> > Thanks,
> > James
> >
> > 20.01.2017, 14:01, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  As someone who currently fills the platform engineer role, I can give
> >>  this idea a huge +1. My thoughts:
> >>
> >>  1. I think it depends on exactly what data is pushed into the index
> >>  (#3). However, assuming the errors you proposed recording, I can't see
> >>  huge benefits to having more than one dashboard. I would be happy to be
> >>  persuaded otherwise.
> >>
> >>  2. I would say yes, storing the errors in HDFS in addition to indexing
> >>  is a good thing. Using METRON-510
> >>  <https://issues.apache.org/jira/browse/METRON-510> as a case study,
> >>  there is the potential in this environment for attacker-controlled data
> >>  to result in processing errors which could be a method of evading
> >>  security monitoring. Once an attack is identified, the long term HDFS
> >>  storage would allow better historical analysis for
> >>  low-and-slow/persistent attacks (I'm thinking of a method of data exfil
> >>  that also won't successfully get stored in Lucene, but is hard to
> >>  identify over a short period of time).
> >>   - Along this line, I think that there are various parts of Metron
> >>  (this included) which could benefit from having a method of configuring
> >>  data aging by bucket in HDFS (Following Nick's comments here
> >>  <https://issues.apache.org/jira/browse/METRON-477>).
> >>
> >>  3. I would potentially add a hash of the content that failed
> >>  validation to help identify repeats over time with less of a concern
> >>  that you'd have back to back failures (i.e. instead of storing the
> >>  value itself). Additionally, I think it's helpful to be able to search
> >>  all times there was an indexing error (instead of it hitting the
> >>  catch-all).
> >>
> >>  Jon
> >>
> >>  On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsir...@apache.org> wrote:
> >>
> >>  We already have a capability to capture bolt errors and validation
> >>  errors and pipe them into a Kafka topic. I want to propose that we
> >>  attach a writer topology to the error and validation failed kafka
> >>  topics so that we can (a) create a new ES index for these errors and
> >>  (b) create a new Kibana dashboard to visualize them. The benefit would
> >>  be that errors and validation failures would be easier to see and
> >>  analyze.
> >>
> >>  I am seeking feedback on the following:
> >>
> >>  - How granular would we want this feature to be? Think we would want
> >>  one index/dashboard per source? Or would it be better to collapse
> >>  everything into the same index?
> >>  - Do we care about storing these errors in HDFS as well? Or is indexing
> >>  them enough?
> >>  - What types of errors should we record? I am proposing:
> >>
> >>  For error reporting:
> >>  --Message failed to parse
> >>  --Enrichment failed to enrich
> >>  --Threat intel feed failures
> >>  --Generic catch-all for all other errors
> >>
> >>  For validation reporting:
> >>  --What part of message failed validation
> >>  --What stellar validator caused the failure
> >>
> >>  -------------------
> >>  Thank you,
> >>
> >>  James Sirota
> >>  PPMC- Apache Metron (Incubating)
> >>  jsirota AT apache DOT org
> >>
> >>  --
> >>
> >>  Jon
> >>
> >>  Sent from my mobile device
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> >
> > Sent from my mobile device
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
-- 

Jon

Sent from my mobile device
