Re: [Discuss] Improve Alerting

Nick Allen Thu, 02 Feb 2017 06:47:06 -0800

Oh, I see.  Yes, very useful.


On Thu, Feb 2, 2017 at 9:39 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> That’s a part of it, certainly (and fixes another of my bugbears, so
> thank you!)
>
> In addition to the aggregation being Stellar, I want score to be a Stellar
> statement; I’ve put in a separate ticket for that:
> https://issues.apache.org/jira/browse/METRON-685
>
> Simon
>
> > On 2 Feb 2017, at 14:31, Nick Allen <n...@nickallen.org> wrote:
> >
> >> I would much rather be able to say something like score = some stellar
> >> statement that returns a float...
> >
> >
> > Completely agree.  FYI - We added METRON-683 yesterday that I believe
> > supports what you are saying.  Feel free to add commentary.
> >
> > https://issues.apache.org/jira/browse/METRON-683
> >
> > On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
> > si...@simonellistonball.com> wrote:
> >
> >> I completely agree with Nick’s transparency comments, and like the design
> >> of the configuration, especially provision for messaging around the nature
> >> of the rule fired.
> >>
> >> I would just like to add a small point on the capabilities here. If the
> >> message could have embedded values through some sort of template for a
> >> stellar statement, it would make for a better, more dynamic alert reason.
> >>
> >> I would also like to see the score field capable of outputting the value
> >> of a stellar statement. At the moment the idea of a static score being
> >> passed on means that if I have a probabilistic result I want to combine
> >> with other triage sources, I have to do a lot of bucketing into fixed
> >> values. I would much rather be able to say something like score = some
> >> stellar statement that returns a float, 'alertness' = threshold of this.
> >> That way I can combine multiple triage rules to trigger an overall alert,
> >> making the aggregators more meaningful.
> >>
> >> Simon
> >>
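A minimal Python sketch of the idea above: each rule's score is computed from the message instead of being a fixed number, the scores of the rules that fire are combined by the aggregator, and "alertness" is a threshold on the combined value. The rule callables, the `p_malicious` field, the aggregator table, and the threshold of 20.0 are hypothetical stand-ins for Stellar statements, not Metron's API.

```python
from statistics import mean

# Hypothetical aggregators, keyed like the "aggregator" config field.
AGGREGATORS = {"MAX": max, "MIN": min, "SUM": sum, "MEAN": mean}

def triage(message, rules, aggregator="MAX"):
    """Return the aggregate score of all rules that fire for a message."""
    scores = [score_fn(message) for applies, score_fn in rules if applies(message)]
    return float(AGGREGATORS[aggregator](scores)) if scores else 0.0

rules = [
    # (rule predicate, score expression); both stand in for Stellar statements.
    (lambda m: m["value"] > m["value_threshold"], lambda m: 10.0),
    (lambda m: "p_malicious" in m, lambda m: 100.0 * m["p_malicious"]),
]

message = {"value": 101, "value_threshold": 42, "p_malicious": 0.25}
score = triage(message, rules, aggregator="MAX")
is_alert = score >= 20.0  # 'alertness' as a threshold on the combined score
```

Because the second rule returns a float derived from the message, no bucketing into fixed values is needed before the aggregator runs.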
> >>
> >>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> >>>
> >>> For profiler alerts it will be helpful during analysis to see the alerts
> >>> that caused the anomaly.  The meta alert is useful for incidents involving
> >>> correlation of multiple events.
> >>>
> >>> Also you will need to filter out known hosts that trigger anomalies,
> >>> for example vulnerability scanning software.
> >>>
> >>> One final thing to consider is that anomalies happen every day without a
> >>> security incident.  Depending on the network, the profiler alerts could get
> >>> very noisy, so it might be better to correlate profiler alerts with other
> >>> alerts.
> >>>
> >>> Thanks
> >>> Carolyn
> >>>
> >>> -------- Original message --------
> >>> From: Casey Stella <ceste...@gmail.com>
> >>> Date: 2/1/17 2:28 PM (GMT-05:00)
> >>> To: dev@metron.incubator.apache.org
> >>> Subject: Re: [Discuss] Improve Alerting
> >>>
> >>> I like the direction.  One thing that we may want is for comment to just be
> >>> a stellar expression and construct a function to essentially do
> >>> String.format().  So, that'd become:
> >>> "triageConfig" : {
> >>> "riskLevelRules" : [
> >>>   {
> >>>     "name" : "Abnormal Value",
> >>>     "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d', hostname, value, value_threshold)",
> >>>     "rule" : "value > value_threshold",
> >>>     "score" : 10
> >>>   }
> >>> ],
> >>> "aggregator" : "MAX"
> >>> }
> >>>
> >>> The reason:
> >>>
> >>>  - It's integrated and stellar is our default scripting layer
> >>>  - It supports doing some computation in the message
> >>>
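As a rough illustration of the FORMAT() idea, here is a Python sketch that uses printf-style formatting as a stand-in for String.format(). The helper name `format_comment` and the sample message are hypothetical; the real function would be a Stellar function evaluated against the message fields.

```python
def format_comment(template, *args):
    """Printf-style formatting, analogous to Java's String.format()."""
    return template % args

# Sample message fields (illustrative values only).
message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}

comment = format_comment(
    "For %s; the value %s exceeds threshold of %d",
    message["hostname"], message["value"], message["value_threshold"],
)

# Because the arguments are expressions, the comment can carry computed values.
excess = format_comment(
    "The value exceeds the threshold by %d",
    message["value"] - message["value_threshold"],
)
print(comment)
print(excess)
```

The second call shows the "computation in the message" point: the argument is an expression over message fields rather than a plain field reference.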
> >>>
> >>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <n...@nickallen.org> wrote:
> >>>
> >>>> Like I said, here is a proposed solution to one of the gaps I identified in
> >>>> the previous email.
> >>>>
> >>>> *Problem*
> >>>>
> >>>> There is little transparency into the Threat Triage process itself.  When
> >>>> Threat Triage runs, all I get is a score.  I don't know how that score was
> >>>> arrived at, which rules were triggered, and the specific values that caused
> >>>> a rule to trigger.
> >>>>
> >>>> More specifically, there is no way to generate a message that looks like
> >>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> >>>> threshold of '202'".  This makes it difficult for an analyst to action the
> >>>> alert.
> >>>>
> >>>> *Proposed Solution*
> >>>>
> >>>> To improve the transparency of the Threat Triage process, I am proposing
> >>>> these enhancements.
> >>>>
> >>>> 1. Threat Triage should attach to each message all of the rules that fired
> >>>> in addition to the total calculated threat triage score.
> >>>>
> >>>> 2. Threat Triage should allow a custom message to be generated for each
> >>>> rule.  The custom message would allow for some form of string interpolation
> >>>> so that I can add specific values from each message to the generated
> >>>> alert.  We could allow this in one or both of the new fields that Casey
> >>>> just added, name and comment.
> >>>>
> >>>>
> >>>> *Example*
> >>>>
> >>>> 1. In this example, we have a telemetry message with a field called 'value'
> >>>> that we need to monitor.  In Enrichment, I calculate some sort of value
> >>>> threshold, over which an alert should be generated.
> >>>>
> >>>> 2. In Threat Triage, I use the calculated value threshold to alert on any
> >>>> message that has a value exceeding this threshold.
> >>>>
> >>>> 3. I can embed values from the message, like the hostname, value, and value
> >>>> threshold, into the alert produced by Threat Triage.  Notice that I am
> >>>> using ${this} for string interpolation, but it could be any syntax that we
> >>>> choose.
> >>>>
> >>>>
> >>>> "triageConfig" : {
> >>>> "riskLevelRules" : [
> >>>>   {
> >>>>     "name" : "Abnormal Value",
> >>>>     "comment" : "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
> >>>>     "rule" : "value > value_threshold",
> >>>>     "score" : 10
> >>>>   }
> >>>> ],
> >>>> "aggregator" : "MAX"
> >>>> }
> >>>>
> >>>>
> >>>> 4. The Threat Triage process today would add only the total calculated
> >>>> score.
> >>>>
> >>>> "threat.triage.level": 10.0
> >>>>
> >>>>
> >>>> With this proposal, Threat Triage would add the following to the message.
> >>>>
> >>>> Notice how each of the ${variables} has been replaced with the actual
> >>>> values extracted from the message.  This allows for more contextual
> >>>> information to action the alert.
> >>>>
> >>>> "threat.triage": {
> >>>>   "score": 10.0,
> >>>>   "rules": [
> >>>>     {
> >>>>       "name": "Abnormal Value",
> >>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
> >>>>       "score" : 10
> >>>>     }
> >>>>   ]
> >>>> }
> >>>>
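The proposal above can be sketched in Python, where `string.Template` happens to use the same ${name} syntax. This is only an illustration: `apply_rules` is a hypothetical helper, each rule is modelled as a Python callable where the proposal has a Stellar expression, and MAX is hard-coded as the aggregator.

```python
from string import Template

def apply_rules(message, rules):
    """Return the proposed 'threat.triage' structure for one message."""
    fired = []
    for rule in rules:
        if rule["rule"](message):
            fired.append({
                "name": rule["name"],
                # Replace each ${field} with the value from the message.
                "comment": Template(rule["comment"]).substitute(message),
                "score": rule["score"],
            })
    # MAX aggregator; a message that fires no rules scores 0.
    score = float(max((r["score"] for r in fired), default=0))
    return {"threat.triage": {"score": score, "rules": fired}}

rules = [{
    "name": "Abnormal Value",
    "comment": "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
    "rule": lambda m: m["value"] > m["value_threshold"],  # stands in for Stellar
    "score": 10,
}]

message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}
result = apply_rules(message, rules)
```

Keeping every fired rule in the output, not only the aggregate, is what gives the analyst the name and reason behind the score.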
> >>>>
> >>>>
> >>>> What do you think?  Any alternative ideas?
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:
> >>>>
> >>>>> I'd like to explore the functionality that we have in Metron using a
> >>>>> motivating example.  I think this will help highlight some gaps where we
> >>>>> can enhance Metron.
> >>>>>
> >>>>> The motivating example is that I would like to create an alert if the
> >>>>> number of inbound flows to any host over a 15 minute interval is abnormal.
> >>>>> I would like the alert to contain the specific information below to
> >>>>> streamline the triage process.
> >>>>>
> >>>>> Rule: Abnormal number of inbound flows
> >>>>> Bin: 15 mins
> >>>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
> >>>>> the threshold of '202'
> >>>>>
> >>>>>
> >>>>> *What Works*
> >>>>>
> >>>>> In some ways, this example is similar to the "Outlier Detection" demo that
> >>>>> I performed with the Profiler a few months back.  We have most of what we
> >>>>> need to do this, with a couple of caveats.
> >>>>>
> >>>>> 1. An enrichment would be added to enrich the message with the correct
> >>>>> internal hostname 'powned.svr.bank.com'.
> >>>>>
> >>>>> 2. With the Profiler, I can capture some idea of what "normal" is for the
> >>>>> number of inbound flows across 15 minute intervals.
> >>>>>
> >>>>> 3. With Threat Triage, I can create rules that alert when a value exceeds
> >>>>> what the Profiler defines as normal.
> >>>>>
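Steps 2 and 3 above could look roughly like the following Python sketch. It is not the Profiler's API: the function names are hypothetical, flows are modelled as (host, epoch-seconds) pairs, and "normal" is assumed here to mean the historical mean plus k sample standard deviations, which is only one possible definition.

```python
from collections import Counter
from statistics import mean, stdev

BIN_SECONDS = 15 * 60  # the 15 minute interval from the example

def flows_per_bin(flows):
    """Count inbound flows per (destination host, 15-minute bin)."""
    return Counter((host, ts // BIN_SECONDS) for host, ts in flows)

def abnormal_threshold(history, k=3.0):
    """Assumed definition of 'abnormal': mean + k sample standard deviations."""
    return mean(history) + k * stdev(history)

# Three flows to one host: two fall in bin 0, one in bin 1.
host = "powned.svr.bank.com"
counts = flows_per_bin([(host, 10), (host, 899), (host, 900)])

# Illustrative per-bin history for the host, then the check a triage rule would do.
history = [180, 190, 185, 195, 200]
is_abnormal = 230 > abnormal_threshold(history)
```

The threshold computed this way would be the enriched `value_threshold` field that a triage rule compares against.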
> >>>>>
> >>>>> *What's Missing*
> >>>>>
> >>>>> It's nice to know that we are almost all the way there with this example.
> >>>>> Unfortunately, there are two gaps that fall out of this.
> >>>>>
> >>>>> 1. *Threat Triage Transparency*
> >>>>>
> >>>>> There is little transparency into the Threat Triage process itself.  When
> >>>>> Threat Triage runs, all I get is a score.  I don't know how that score was
> >>>>> arrived at, which rules were triggered, and the specific values that caused
> >>>>> a rule to trigger.
> >>>>>
> >>>>> More specifically, there is no way to generate a message that looks like
> >>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> >>>>> threshold of '202'".
> >>>>>
> >>>>>
> >>>>> 2. *Triage Calculated Values from the Profiler*
> >>>>>
> >>>>> Also, the value being interrogated here, the number of inbound flows, is
> >>>>> not a static value contained within any single telemetry message.  This
> >>>>> value is calculated across multiple messages by the Profiler.  The current
> >>>>> Threat Triage process cannot be used to interrogate values calculated by
> >>>>> the Profiler.
> >>>>>
> >>>>>
> >>>>> To try and keep this email concise and digestible, I am going to send a
> >>>>> follow-on discussing proposed solutions for each of these separately.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Nick Allen <n...@nickallen.org>
> >>>>
> >>
> >>
>
>
