After thinking on this for a few days, I recant my previous suggestion of
TupleHash256.  It's still a bit early for SHA-3: no good reference
implementations or libraries exist (I did some searching and emailing), it is
optimized for hardware but no hardware implementation is widely accessible,
FIPS 140-3 is still not close to being finalized, etc.

I think we could simulate the benefits of TupleHash by sorting the tuples,
then computing SHA-256(len(tuple1) | tuple1 | ... | len(tupleN) | tupleN).
I'm happy to entertain opposing thoughts (BLAKE2, etc.), but given the likely
users of Metron, I think sticking with FIPS 140-2-approved algorithms is a
solid choice.
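
For illustration, a minimal sketch of that scheme (the 4-byte length prefix
and byte-wise sort order are just assumptions for the example):

  import java.nio.ByteBuffer;
  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;
  import java.util.ArrayList;
  import java.util.List;

  public class SortedLengthPrefixedHash {

    // Simulates TupleHash by sorting the encoded tuples, then hashing
    // len(tuple1) | tuple1 | ... | len(tupleN) | tupleN with SHA-256.
    public static byte[] hashTuples(List<byte[]> tuples) throws Exception {
      List<byte[]> sorted = new ArrayList<>(tuples);
      sorted.sort((a, b) -> {
        // Lexicographic compare on unsigned byte values for a stable order.
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
          int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
          if (cmp != 0) {
            return cmp;
          }
        }
        return Integer.compare(a.length, b.length);
      });
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      for (byte[] tuple : sorted) {
        // 4-byte big-endian length prefix keeps field boundaries unambiguous.
        digest.update(ByteBuffer.allocate(4).putInt(tuple.length).array());
        digest.update(tuple);
      }
      return digest.digest();
    }

    public static void main(String[] args) throws Exception {
      List<byte[]> tuples = new ArrayList<>();
      tuples.add("8.8.8.8".getBytes(StandardCharsets.UTF_8));
      tuples.add("some_user".getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hashTuples(tuples)) {
        hex.append(String.format("%02x", b));
      }
      System.out.println(hex);
    }
  }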

Jon

On Thu, Jan 26, 2017, 11:23 AM zeo...@gmail.com <zeo...@gmail.com> wrote:

So, one more thing regarding why I think we should throw an exception on a
failed enrichment.  If we make something like username a constant field and
it is used to calculate rawMessage_hash, then a failed enrichment would
produce a different hash than a successful one.  Of course, I think the
initial intent of adding username as a constant field would be to handle it
in the parsers, where that information is provided in the messages
themselves, but how would Threat Intel know the difference?  In my
environment I am looking forward to a streaming enrichment that adds the
username, where applicable, anywhere I have an IP.

My hesitant suggestion for a hashing algorithm would be TupleHash256, as it
is a NIST-specified SHA-3-derived function (built on cSHAKE) designed for
exactly this use case.  Details here
<http://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-185.pdf>.
However, I haven't been able to find a reference implementation of it in any
language, so that's a bit of a downside.  A more general SHA3-256
implementation where we handle the ordering ourselves could work as well, but
would be significantly less optimal.

Jon

On Thu, Jan 26, 2017 at 10:20 AM Ryan Merriman <merrim...@gmail.com> wrote:

Jon, I misread the code in the GenericEnrichmentBolt.  The error is
forwarded on, so no issues there.

Defaulting to the common fields makes sense.  I will dig into the
GenericEnrichmentBolt more; maybe there is a way to get the error fields
without having to significantly change things.  Any opinion on a hashing
algorithm?

On Wed, Jan 25, 2017 at 9:37 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> Although hashing the whole message is better than nothing, it misses a lot
> of the benefits we could get.
>
> While I'd love to have consistency for this field across all of the
> different error.types, it appears that may not be reasonably possible
> because of the parsers.  So, how about something like hash all of the
> constant fields
> <https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/Constants.java>
> excluding timestamp and original_string unless it is a parser, in which
> case hash the entire message?  This gives us some measure of event
> uniqueness and it can
> grow as we define additional constant fields (I recall discussing with
> someone else on the list regarding expanding those standard fields to
> include things like usernames but I can't find the specific email
> exchange).
>
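> As a rough sketch of that selection logic (illustrative only; the exclusion
> check below stands in for the real list of constant fields in
> Constants.java):
>
>   import org.json.simple.JSONObject;
>   import java.nio.charset.StandardCharsets;
>   import java.util.ArrayList;
>   import java.util.Collections;
>   import java.util.List;
>
>   public class HashInputSelector {
>     // Chooses what feeds the hash: every field except timestamp and
>     // original_string, or the whole message when the error came from a parser.
>     public static List<byte[]> selectHashInputs(JSONObject message, boolean fromParser) {
>       List<byte[]> inputs = new ArrayList<>();
>       if (fromParser) {
>         inputs.add(message.toJSONString().getBytes(StandardCharsets.UTF_8));
>         return inputs;
>       }
>       List<String> names = new ArrayList<>();
>       for (Object key : message.keySet()) {
>         names.add(key.toString());
>       }
>       Collections.sort(names);
>       for (String name : names) {
>         if (name.equals("timestamp") || name.equals("original_string")) {
>           continue;
>         }
>         inputs.add((name + "=" + message.get(name)).getBytes(StandardCharsets.UTF_8));
>       }
>       return inputs;
>     }
>   }
>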
> Because some enrichments can be heavily relied on, I think it makes sense
> to put a message onto the error queue when it throws an exception.  Not
> only does this help troubleshoot edge cases, but it makes issues more
> obvious when assembling a new enrichment in dev/test.  I can't think of a
> scenario currently where an enrichment would only be "best effort" and
> that I wouldn't want that error indexed and retrievable.  However, this
> gets interesting when talking about the various options to solve the
> "Enrich enrichment" discussion from earlier in the month.  We can keep
> that part of this separate though, as I don't think that's being actively
> pursued right now.
>
> Jon
>
> On Wed, Jan 25, 2017 at 10:49 AM David Lyle <dlyle65...@gmail.com> wrote:
>
> RE: separate JIRA for MPack/Ansible. No objection to tracking them
> separately, but for this item to be complete, you'll need both the feature
> and the ability to install it.
>
> -D...
>
>
> On Tue, Jan 24, 2017 at 5:33 PM, Ryan Merriman <merrim...@gmail.com>
> wrote:
>
> > Assuming we're going to write all errors to a single error topic, I think
> > it makes sense to agree on an error message schema and handle errors
> > across the 3 different topologies in the same way with a single
> > implementation.  The implementation in ParserBolt (ErrorUtils.handleError)
> > produces the most verbose error object so I think it's a good candidate
> > for the single implementation.  Here is the message structure it currently
> > produces:
> >
> > {
> >   "exception": "java.lang.Exception: there was an error",
> >   "hostname": "host",
> >   "stack": "java.lang.Exception: ...",
> >   "time": 1485295416563,
> >   "message": "there was an error",
> >   "rawMessage": "raw message",
> >   "rawMessage_bytes": [],
> >   "source.type": "bro_error"
> > }
> >
> > From our discussion so far we need to add a couple of fields: an error
> > type and a hash id.  Adding these to the message looks like:
> >
> > {
> >   "exception": "java.lang.Exception: there was an error",
> >   "hostname": "host",
> >   "stack": "java.lang.Exception: ...",
> >   "time": 1485295416563,
> >   "message": "there was an error",
> >   "rawMessage": "raw message",
> >   "rawMessage_bytes": [],
> >   "source.type": "bro_error",
> >   "error.type": "parser_error",
> >   "rawMessage_hash": "dde41b9920954f94066daf6291fb58a9"
> > }
> >
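> > As a rough illustration of filling in those two fields (SHA-256 via
> > commons-codec is used here purely as a placeholder, since the hash
> > algorithm is still being discussed):
> >
> >   import org.apache.commons.codec.digest.DigestUtils;
> >   import org.json.simple.JSONObject;
> >
> >   public class ErrorFieldsExample {
> >     // Adds the proposed error.type and rawMessage_hash fields to an
> >     // error message produced by ErrorUtils.handleError.
> >     @SuppressWarnings("unchecked")
> >     public static JSONObject addErrorFields(JSONObject error, String errorType) {
> >       error.put("error.type", errorType);
> >       String raw = String.valueOf(error.get("rawMessage"));
> >       error.put("rawMessage_hash", DigestUtils.sha256Hex(raw));
> >       return error;
> >     }
> >   }
> >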
> > We should also consider expanding the error types I listed earlier.
> > Instead of just having "indexing_error" we could have
> > "elasticsearch_indexing_error", "hdfs_indexing_error" and so on.
> >
> > Jon, if an exception happens in an enrichment or threat intel bolt the
> > message is passed along with no error thrown (only logged).  Everywhere
> > else I'm having trouble identifying specific fields that should be hashed.
> > Would hashing the message in every case be acceptable?  Do you know of a
> > place where we could hash a field instead?  On the topic of exceptions in
> > enrichments, are we ok with an error only being logged and not added to
> > the message or emitted to the error queue?
> >
> >
> >
> > On Tue, Jan 24, 2017 at 3:10 PM, Ryan Merriman <merrim...@gmail.com>
> > wrote:
> >
> > > That use case makes sense to me.  I don't think it will require that
> > > much additional effort either.
> > >
> > > On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com <zeo...@gmail.com>
> > > wrote:
> > >
> > >> Regarding error vs validation - Either way I'm not very concerned.  I
> > >> initially assumed they would be combined and agree with that approach,
> > >> but splitting them out isn't a very big deal to me either.
> > >>
> > >> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere
> > >> else where it's not possible to pick out the exact thing causing the
> > >> issue) it would be a hash of the complete message.
> > >>
> > >> Regarding the architecture, I mostly agree with James except that I
> > >> think step 3 needs to also be able to somehow group errors via the
> > >> original data (identify replays, identify repeat issues with data in a
> > >> specific field, issues with consistently different data, etc.).  This
> > >> is essentially the first step of troubleshooting, which I assume you
> > >> are doing if you're looking at the error dashboard.
> > >>
> > >> If the hash gets moved out of the initial implementation, I'm fairly
> > >> certain you lose this ability.  The point here isn't to handle long
> > >> fields (although that's a benefit of this approach), it's to attach a
> > >> unique identifier to the error/validation issue message that links it
> > >> to the original problem.  I'd be happy to consider alternative solutions
> > >> to this problem (for instance, actually sending across the data itself);
> > >> I just haven't been able to think of another way to do this that I like
> > >> better.
> > >>
> > >> Jon
> > >>
> > >> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrim...@gmail.com>
> > >> wrote:
> > >>
> > >> > We also need a JIRA for any install/Ansible/MPack work needed.
> > >> >
> > >> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsir...@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Now that I had some time to think about it I would collapse all
> > >> > > error and validation topics into one.  We can differentiate between
> > >> > > different views of the data (split by error source, etc.) via Kibana
> > >> > > dashboards.  I would implement this feature incrementally.  First I
> > >> > > would modify all the bolts to log to a single topic.  Second, I
> > >> > > would get the error indexing done by attaching the indexing topology
> > >> > > to the error topic.  Third, I would create the necessary dashboards
> > >> > > to view errors and validation failures by source.  Lastly, I would
> > >> > > file a follow-on JIRA to introduce hashing of errors or fields that
> > >> > > are too long.  It seems like a separate feature that we need to
> > >> > > think through.  We may need a Stellar function around that.
> > >> > >
> > >> > > Thanks,
> > >> > > James
> > >> > >
> > >> > > 24.01.2017, 10:25, "Ryan Merriman" <merrim...@gmail.com>:
> > >> > > > I understand what Jon is talking about. He's proposing we hash the
> > >> > > > value that caused the error, not necessarily the error message
> > >> > > > itself. For an enrichment this is easy. Just pass along the field
> > >> > > > value that failed enrichment. For other cases the field that caused
> > >> > > > the error may not be so obvious. Take parser validation for example.
> > >> > > > The message is validated as a whole and it may not be easy to
> > >> > > > determine which field is the cause. In that case would a hash of the
> > >> > > > whole message work?
> > >> > > >
> > >> > > > There is a broader architectural discussion that needs to happen
> > >> > > > before we can implement this. Currently we have an indexing topology
> > >> > > > that reads from 1 topic and writes messages to ES but errors are
> > >> > > > written to several different topics:
> > >> > > >
> > >> > > >    - parser_error
> > >> > > >    - parser_invalid
> > >> > > >    - enrichments_error
> > >> > > >    - threatintel_error
> > >> > > >    - indexing_error
> > >> > > >
> > >> > > > I can see 4 possible approaches to implementing this:
> > >> > > >
> > >> > > >    1. Create an index topology for each error topic
> > >> > > >       1. Good because we can easily reuse the indexing topology and
> > >> > > >       would require the least development effort
> > >> > > >       2. Bad because it would consume a lot of extra worker slots
> > >> > > >    2. Move the topic name into the error JSON message as a new
> > >> > > >    "error_type" field and write all messages to the indexing topic
> > >> > > >       1. Good because we don't need to create a new topology
> > >> > > >       2. Bad because we would be flowing data and errors through the
> > >> > > >       same topology. A spike in errors could affect message indexing.
> > >> > > >    3. Compromise between 1 and 2. Create another indexing topology
> > >> > > >    that is dedicated to indexing errors. Move the topic name into the
> > >> > > >    error JSON message as a new "error_type" field and write all
> > >> > > >    errors to a single error topic.
> > >> > > >    4. Write a completely new topology with multiple spouts (1 for
> > >> > > >    each error type listed above) that all feed into a single
> > >> > > >    BulkMessageWriterBolt.
> > >> > > >       1. Good because the current topologies would not need to change
> > >> > > >       2. Bad because it would require the most development effort,
> > >> > > >       would not reuse existing topologies and takes up more worker
> > >> > > >       slots than 3
> > >> > > >
> > >> > > > Are there other approaches I haven't thought of? I think 1 and 2 are
> > >> > > > off the table because they are shortcuts and not good long-term
> > >> > > > solutions. 3 would be my choice because it introduces less complexity
> > >> > > > than 4. Thoughts?
> > >> > > >
> > >> > > > Ryan
> > >> > > >
> > >> > > > On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com <zeo...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > >>  In that case the hash would be of the value in the IP field, such
> > >> > > >>  as sha3(8.8.8.8).
> > >> > > >>
> > >> > > >>  Jon
> > >> > > >>
> > >> > > >>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsir...@apache.org>
> > >> > > >>  wrote:
> > >> > > >>
> > >> > > >>  > Jon,
> > >> > > >>  >
> > >> > > >>  > I am still not entirely following why we would want to use
> > >> > > >>  > hashing.  For example, if my error is "Your IP field is invalid
> > >> > > >>  > and failed validation", hashing this error string will always
> > >> > > >>  > result in the same hash.  Why not just use the actual error
> > >> > > >>  > string?  Can you provide an example where you would use it?
> > >> > > >>  >
> > >> > > >>  > Thanks,
> > >> > > >>  > James
> > >> > > >>  >
> > >> > > >>  > 23.01.2017, 16:29, "zeo...@gmail.com" <zeo...@gmail.com>:
> > >> > > >>  > > For 1 - I'm good with that.
> > >> > > >>  > >
> > >> > > >>  > > I'm talking about hashing the relevant content itself, not
> > >> > > >>  > > the error.  Some benefits are (1) minimize load on the search
> > >> > > >>  > > index (there's minimal benefit in spending the CPU and disk to
> > >> > > >>  > > keep it at full fidelity (tokenize and store)), (2) provide
> > >> > > >>  > > something to key on for dashboards (assuming a good hash
> > >> > > >>  > > algorithm that avoids collisions and is second-preimage
> > >> > > >>  > > resistant), and (3) specific to errors, if the issue is that it
> > >> > > >>  > > failed to index, a hash gives us some protection that the issue
> > >> > > >>  > > will not occur twice.
> > >> > > >>  > >
> > >> > > >>  > > Jon
> > >> > > >>  > >
> > >> > > >>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsir...@apache.org>
> > >> > > >>  > > wrote:
> > >> > > >>  > >
> > >> > > >>  > > Jon,
> > >> > > >>  > >
> > >> > > >>  > > With regards to 1, collapsing to a single dashboard for each
> > >> > > >>  > > would be fine.  So we would have one error index and one
> > >> > > >>  > > "failed to validate" index.  The distinction is that errors
> > >> > > >>  > > would be things that went wrong during stream processing
> > >> > > >>  > > (failed to parse, etc.), while validation failures are messages
> > >> > > >>  > > that explicitly failed Stellar validation/schema enforcement.
> > >> > > >>  > > There should be relatively few of the second type.
> > >> > > >>  > >
> > >> > > >>  > > With respect to 3, why do you want the error hashed?  Why not
> > >> > > >>  > > just search for the error text?
> > >> > > >>  > >
> > >> > > >>  > > Thanks,
> > >> > > >>  > > James
> > >> > > >>  > >
> > >> > > >>  > > 20.01.2017, 14:01, "zeo...@gmail.com" <zeo...@gmail.com>:
> > >> > > >>  > >> As someone who currently fills the platform engineer role, I
> > >> > > >>  > >> can give this idea a huge +1.  My thoughts:
> > >> > > >>  > >>
> > >> > > >>  > >> 1. I think it depends on exactly what data is pushed into the
> > >> > > >>  > >> index (#3).  However, assuming the errors you proposed
> > >> > > >>  > >> recording, I can't see huge benefits to having more than one
> > >> > > >>  > >> dashboard.  I would be happy to be persuaded otherwise.
> > >> > > >>  > >>
> > >> > > >>  > >> 2. I would say yes, storing the errors in HDFS in addition to
> > >> > > >>  > >> indexing is a good thing.  Using METRON-510
> > >> > > >>  > >> <https://issues.apache.org/jira/browse/METRON-510> as a case
> > >> > > >>  > >> study, there is the potential in this environment for
> > >> > > >>  > >> attacker-controlled data to result in processing errors, which
> > >> > > >>  > >> could be a method of evading security monitoring.  Once an
> > >> > > >>  > >> attack is identified, the long-term HDFS storage would allow
> > >> > > >>  > >> better historical analysis for low-and-slow/persistent attacks
> > >> > > >>  > >> (I'm thinking of a method of data exfil that also won't
> > >> > > >>  > >> successfully get stored in Lucene, but is hard to identify
> > >> > > >>  > >> over a short period of time).
> > >> > > >>  > >> - Along this line, I think that there are various parts of
> > >> > > >>  > >> Metron (this included) which could benefit from having a
> > >> > > >>  > >> method of configuring data aging by bucket in HDFS (following
> > >> > > >>  > >> Nick's comments here
> > >> > > >>  > >> <https://issues.apache.org/jira/browse/METRON-477>).
> > >> > > >>  > >>
> > >> > > >>  > >> 3. I would potentially add a hash of the content that failed
> > >> > > >>  > >> validation to help identify repeats over time with less of a
> > >> > > >>  > >> concern that you'd have back-to-back failures (i.e. instead of
> > >> > > >>  > >> storing the value itself).  Additionally, I think it's helpful
> > >> > > >>  > >> to be able to search all times there was an indexing error
> > >> > > >>  > >> (instead of it hitting the catch-all).
> > >> > > >>  > >>
> > >> > > >>  > >> Jon
> > >> > > >>  > >>
> > >> > > >>  > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsir...@apache.org>
> > >> > > >>  > >> wrote:
> > >> > > >>  > >>
> > >> > > >>  > >> We already have a capability to capture bolt errors and
> > >> > > >>  > >> validation errors and pipe them into a Kafka topic.  I want to
> > >> > > >>  > >> propose that we attach a writer topology to the error and
> > >> > > >>  > >> validation failed Kafka topics so that we can (a) create a new
> > >> > > >>  > >> ES index for these errors and (b) create a new Kibana
> > >> > > >>  > >> dashboard to visualize them.  The benefit would be that errors
> > >> > > >>  > >> and validation failures would be easier to see and analyze.
> > >> > > >>  > >>
> > >> > > >>  > >> I am seeking feedback on the following:
> > >> > > >>  > >>
> > >> > > >>  > >> - How granular would we want this feature to be?  Think we
> > >> > > >>  > >> would want one index/dashboard per source?  Or would it be
> > >> > > >>  > >> better to collapse everything into the same index?
> > >> > > >>  > >> - Do we care about storing these errors in HDFS as well?  Or
> > >> > > >>  > >> is indexing them enough?
> > >> > > >>  > >> - What types of errors should we record? I am proposing:
> > >> > > >>  > >>
> > >> > > >>  > >> For error reporting:
> > >> > > >>  > >> --Message failed to parse
> > >> > > >>  > >> --Enrichment failed to enrich
> > >> > > >>  > >> --Threat intel feed failures
> > >> > > >>  > >> --Generic catch-all for all other errors
> > >> > > >>  > >>
> > >> > > >>  > >> For validation reporting:
> > >> > > >>  > >> --What part of message failed validation
> > >> > > >>  > >> --What stellar validator caused the failure
> > >> > > >>  > >>
> > >> > > >>  > >> -------------------
> > >> > > >>  > >> Thank you,
> > >> > > >>  > >>
> > >> > > >>  > >> James Sirota
> > >> > > >>  > >> PPMC- Apache Metron (Incubating)
> > >> > > >>  > >> jsirota AT apache DOT org
> > >> > > >>  > >>
> > >> > > >>  > >> --
> > >> > > >>  > >>
> > >> > > >>  > >> Jon
> > >> > > >>  > >>
> > >> > > >>  > >> Sent from my mobile device
> > >> > > >>  > >
> > >> > > >>  > > -------------------
> > >> > > >>  > > Thank you,
> > >> > > >>  > >
> > >> > > >>  > > James Sirota
> > >> > > >>  > > PPMC- Apache Metron (Incubating)
> > >> > > >>  > > jsirota AT apache DOT org
> > >> > > >>  > >
> > >> > > >>  > > --
> > >> > > >>  > >
> > >> > > >>  > > Jon
> > >> > > >>  > >
> > >> > > >>  > > Sent from my mobile device
> > >> > > >>  >
> > >> > > >>  > -------------------
> > >> > > >>  > Thank you,
> > >> > > >>  >
> > >> > > >>  > James Sirota
> > >> > > >>  > PPMC- Apache Metron (Incubating)
> > >> > > >>  > jsirota AT apache DOT org
> > >> > > >>  >
> > >> > > >>  --
> > >> > > >>
> > >> > > >>  Jon
> > >> > > >>
> > >> > > >>  Sent from my mobile device
> > >> > >
> > >> > > -------------------
> > >> > > Thank you,
> > >> > >
> > >> > > James Sirota
> > >> > > PPMC- Apache Metron (Incubating)
> > >> > > jsirota AT apache DOT org
> > >> > >
> > >> >
> > >> --
> > >>
> > >> Jon
> > >>
> > >> Sent from my mobile device
> > >>
> > >
> > >
> >
>
> --
>
> Jon
>
> Sent from my mobile device
>

-- 

Jon

Sent from my mobile device

-- 

Jon

Sent from my mobile device
