We also need a JIRA for any install/Ansible/MPack work needed.

On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org> wrote:

> Now that I had some time to think about it I would collapse all error and
> validation topics into one.  We can differentiate between different views
> of the data (split by error source etc.) via Kibana dashboards.  I would
> implement this feature incrementally.  First, I would modify all the bolts
> to log to a single topic.  Second, I would get the error indexing done by
> attaching the indexing topology to the error topic.  Third, I would create
> the necessary dashboards to view errors and validation failures by source.
> Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> fields that are too long.  It seems like a separate feature that we need to
> think through.  We may need a Stellar function around that.
>
> Thanks,
> James
>
> 24.01.2017, 10:25, "Ryan Merriman" <merrim...@gmail.com>:
> > I understand what Jon is talking about. He's proposing we hash the value
> > that caused the error, not necessarily the error message itself. For an
> > enrichment this is easy. Just pass along the field value that failed
> > enrichment. For other cases the field that caused the error may not be so
> > obvious. Take parser validation for example. The message is validated as
> > a whole and it may not be easy to determine which field is the cause. In
> > that case would a hash of the whole message work?
> >
> > There is a broader architectural discussion that needs to happen before
> > we can implement this.  Currently we have an indexing topology that reads
> > from 1 topic and writes messages to ES, but errors are written to several
> > different topics:
> >
> >    - parser_error
> >    - parser_invalid
> >    - enrichments_error
> >    - threatintel_error
> >    - indexing_error
> >
> > I can see 4 possible approaches to implementing this:
> >
> >    1. Create an index topology for each error topic
> >       1. Good because we can easily reuse the indexing topology and would
> >       require the least development effort
> >       2. Bad because it would consume a lot of extra worker slots
> >    2. Move the topic name into the error JSON message as a new
> >    "error_type" field and write all messages to the indexing topic
> >       1. Good because we don't need to create a new topology
> >       2. Bad because we would be flowing data and errors through the same
> >       topology.  A spike in errors could affect message indexing.
> >    3. Compromise between 1 and 2.  Create another indexing topology that
> >    is dedicated to indexing errors.  Move the topic name into the error
> >    JSON message as a new "error_type" field and write all errors to a
> >    single error topic.
> >    4. Write a completely new topology with multiple spouts (1 for each
> >    error type listed above) that all feed into a single
> >    BulkMessageWriterBolt.
> >       1. Good because the current topologies would not need to change
> >       2. Bad because it would require the most development effort, would
> >       not reuse existing topologies, and takes up more worker slots than 3
> >
> > Are there other approaches I haven't thought of?  I think 1 and 2 are off
> > the table because they are shortcuts and not good long-term solutions.  3
> > would be my choice because it introduces less complexity than 4.
> > Thoughts?
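The "error_type" envelope in options 2 and 3 can be sketched as follows. This is a minimal illustration, not Metron's actual error schema; the topic name, function, and field names are hypothetical.

```python
import json

# Hypothetical sketch of the single-error-topic approach: each bolt tags
# the error with its originating topic in an "error_type" field and writes
# the result to one shared error topic.
ERROR_TOPIC = "indexing_error_unified"  # illustrative name, not Metron's

def to_unified_error(original_topic, error_message, raw_message):
    """Wrap an error in a single JSON envelope, recording the source
    topic (e.g. "parser_error") as "error_type"."""
    return json.dumps({
        "error_type": original_topic,
        "message": error_message,
        "raw_message": raw_message,
    })

record = to_unified_error("enrichments_error",
                          "geo enrichment failed",
                          '{"ip_src_addr": "8.8.8.8"}')
```

With every error carrying its own "error_type", a single dedicated indexing topology can consume the one topic, and Kibana dashboards can split the view by that field.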
> >
> > Ryan
> >
> > On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
> >
> >>  In that case the hash would be of the value in the IP field, such as
> >>  sha3(8.8.8.8).
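The hashing Jon describes can be sketched like this: hash the offending field value, not the error text. The function name is illustrative; `hashlib.sha3_256` is one concrete choice of a collision-resistant, second-preimage-resistant hash.

```python
import hashlib

# Hash the field value that failed (e.g. sha3 of "8.8.8.8"), so identical
# failing values map to the same key for grouping and deduplication.
def hash_field_value(value: str) -> str:
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()

h1 = hash_field_value("8.8.8.8")
h2 = hash_field_value("8.8.8.8")
# deterministic: h1 == h2, so repeated failures on the same value are
# trivially identifiable without storing the raw value at full fidelity
```

Because the hash is fixed-length, it also sidesteps the failure-to-index problem for oversized or hostile field values.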
> >>
> >>  Jon
> >>
> >>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org> wrote:
> >>
> >>  > Jon,
> >>  >
> >>  > I am still not entirely following why we would want to use hashing.
> >>  > For example, if my error is "Your IP field is invalid and failed
> >>  > validation", hashing this error string will always result in the
> >>  > same hash.  Why not just use the actual error string?  Can you
> >>  > provide an example where you would use it?
> >>  >
> >>  > Thanks,
> >>  > James
> >>  >
> >>  > 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  > > For 1 - I'm good with that.
> >>  > >
> >>  > > I'm talking about hashing the relevant content itself, not the
> >>  > > error.  Some benefits are (1) minimize load on the search index
> >>  > > (there's minimal benefit in spending the CPU and disk to keep it
> >>  > > at full fidelity (tokenize and store)), (2) provide something to
> >>  > > key on for dashboards (assuming a good hash algorithm that avoids
> >>  > > collisions and is second preimage resistant), and (3) specific to
> >>  > > errors, if the issue is that it failed to index, a hash gives us
> >>  > > some protection that the issue will not occur twice.
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsir...@apache.org>
> wrote:
> >>  > >
> >>  > > Jon,
> >>  > >
> >>  > > With regards to 1, collapsing to a single dashboard for each
> >>  > > would be fine.  So we would have one error index and one "failed
> >>  > > to validate" index.  The distinction is that errors would be
> >>  > > things that went wrong during stream processing (failed to parse,
> >>  > > etc.), while validation failures are messages that explicitly
> >>  > > failed Stellar validation/schema enforcement.  There should be
> >>  > > relatively few of the second type.
> >>  > >
> >>  > > With respect to 3, why do you want the error hashed?  Why not
> >>  > > just search for the error text?
> >>  > >
> >>  > > Thanks,
> >>  > > James
> >>  > >
> >>  > > 20.01.2017, 14:01, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  > >> As someone who currently fills the platform engineer role, I can
> >>  > >> give this idea a huge +1.  My thoughts:
> >>  > >>
> >>  > >> 1. I think it depends on exactly what data is pushed into the
> >>  > >> index (#3).  However, assuming the errors you proposed recording,
> >>  > >> I can't see huge benefits to having more than one dashboard.  I
> >>  > >> would be happy to be persuaded otherwise.
> >>  > >>
> >>  > >> 2. I would say yes, storing the errors in HDFS in addition to
> >>  > >> indexing is a good thing.  Using METRON-510
> >>  > >> <https://issues.apache.org/jira/browse/METRON-510> as a case
> >>  > >> study, there is the potential in this environment for
> >>  > >> attacker-controlled data to result in processing errors, which
> >>  > >> could be a method of evading security monitoring.  Once an attack
> >>  > >> is identified, the long term HDFS storage would allow better
> >>  > >> historical analysis for low-and-slow/persistent attacks (I'm
> >>  > >> thinking of a method of data exfil that also won't successfully
> >>  > >> get stored in Lucene, but is hard to identify over a short period
> >>  > >> of time).
> >>  > >> - Along this line, I think that there are various parts of Metron
> >>  > >> (this included) which could benefit from having a method of
> >>  > >> configuring data aging by bucket in HDFS (following Nick's
> >>  > >> comments here <https://issues.apache.org/jira/browse/METRON-477>).
> >>  > >>
> >>  > >> 3. I would potentially add a hash of the content that failed
> >>  > >> validation to help identify repeats over time with less of a
> >>  > >> concern that you'd have back to back failures (i.e. instead of
> >>  > >> storing the value itself).  Additionally, I think it's helpful to
> >>  > >> be able to search all times there was an indexing error (instead
> >>  > >> of it hitting the catch-all).
> >>  > >>
> >>  > >> Jon
> >>  > >>
> >>  > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsir...@apache.org>
> >>  > wrote:
> >>  > >>
> >>  > >> We already have a capability to capture bolt errors and
> >>  > >> validation errors and pipe them into a Kafka topic.  I want to
> >>  > >> propose that we attach a writer topology to the error and
> >>  > >> validation-failed Kafka topics so that we can (a) create a new ES
> >>  > >> index for these errors and (b) create a new Kibana dashboard to
> >>  > >> visualize them.  The benefit would be that errors and validation
> >>  > >> failures would be easier to see and analyze.
> >>  > >>
> >>  > >> I am seeking feedback on the following:
> >>  > >>
> >>  > >> - How granular would we want this feature to be?  Do we want one
> >>  > >> index/dashboard per source?  Or would it be better to collapse
> >>  > >> everything into the same index?
> >>  > >> - Do we care about storing these errors in HDFS as well?  Or is
> >>  > >> indexing them enough?
> >>  > >> - What types of errors should we record? I am proposing:
> >>  > >>
> >>  > >> For error reporting:
> >>  > >> --Message failed to parse
> >>  > >> --Enrichment failed to enrich
> >>  > >> --Threat intel feed failures
> >>  > >> --Generic catch-all for all other errors
> >>  > >>
> >>  > >> For validation reporting:
> >>  > >> --What part of message failed validation
> >>  > >> --What stellar validator caused the failure
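The error and validation categories above could share one record shape. The sketch below is purely illustrative: the function and field names are assumptions, not Metron's real error schema, and the hash-instead-of-raw-value part reflects the follow-on idea discussed elsewhere in this thread.

```python
import hashlib
import json

# Hypothetical record covering both categories: "error" for stream
# processing failures (parse, enrich, threat intel, catch-all) and
# "validation" for explicit Stellar validation/schema failures.
def make_error_record(error_kind, source, message, failed_value=None,
                      failed_validator=None):
    record = {
        "error_kind": error_kind,   # "error" or "validation"
        "source": source,           # originating sensor, e.g. "bro"
        "message": message,         # human-readable description
    }
    if failed_value is not None:
        # store a hash rather than the raw offending value
        record["failed_value_hash"] = hashlib.sha3_256(
            failed_value.encode("utf-8")).hexdigest()
    if failed_validator is not None:
        record["failed_validator"] = failed_validator  # Stellar validator
    return json.dumps(record)

rec = make_error_record("validation", "bro",
                        "ip_src_addr failed validation",
                        failed_value="not-an-ip",
                        failed_validator="IS_IP")
```

A single shape like this would let one index (or one per kind) serve all the reporting cases while dashboards filter on "error_kind" and "source".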
> >>  > >>
> >>  > >> -------------------
> >>  > >> Thank you,
> >>  > >>
> >>  > >> James Sirota
> >>  > >> PPMC- Apache Metron (Incubating)
> >>  > >> jsirota AT apache DOT org
> >>  > >>
> >>  > >> --
> >>  > >>
> >>  > >> Jon
> >>  > >>
> >>  > >> Sent from my mobile device
> >>  > >
> >>  > > -------------------
> >>  > > Thank you,
> >>  > >
> >>  > > James Sirota
> >>  > > PPMC- Apache Metron (Incubating)
> >>  > > jsirota AT apache DOT org
> >>  > >
> >>  > > --
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > > Sent from my mobile device
> >>  >
> >>  > -------------------
> >>  > Thank you,
> >>  >
> >>  > James Sirota
> >>  > PPMC- Apache Metron (Incubating)
> >>  > jsirota AT apache DOT org
> >>  >
> >>  --
> >>
> >>  Jon
> >>
> >>  Sent from my mobile device
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
