Re: [Discuss] Improve Alerting

Nick Allen Thu, 02 Feb 2017 06:37:48 -0800

> I would much rather be able to say something like score = some stellar
> statement that returns a float...



Completely agree.  FYI - We added METRON-683 yesterday that I believe
supports what you are saying.  Feel free to add commentary.

https://issues.apache.org/jira/browse/METRON-683
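
To make the idea concrete, here is a rough sketch in Python. It is not Stellar and not how Metron implements triage; the `Rule` class, the `triage` function, the `AGGREGATORS` table, and the `anomaly_probability` field are all made up for illustration. It only shows a score computed from the message, rather than a fixed constant, being fed into an aggregator:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, object]


@dataclass
class Rule:
    name: str
    # Stands in for the "rule" expression: does this rule fire for the message?
    condition: Callable[[Message], bool]
    # Stands in for a score expression that returns a float.
    score: Callable[[Message], float]


# Hypothetical subset of aggregators, keyed the way the config names them.
AGGREGATORS = {"MAX": max, "SUM": sum}


def triage(message: Message, rules: List[Rule], aggregator: str = "MAX") -> float:
    """Evaluate every rule and aggregate the scores of the rules that fired."""
    scores = [float(r.score(message)) for r in rules if r.condition(message)]
    if not scores:
        return 0.0
    return float(AGGREGATORS[aggregator](scores))


rules = [
    # A static score, as in the current configuration.
    Rule("Abnormal Value",
         lambda m: m["value"] > m["value_threshold"],
         lambda m: 10.0),
    # A dynamic score derived from a probabilistic result (assumed field).
    Rule("Anomaly Probability",
         lambda m: m["anomaly_probability"] > 0.5,
         lambda m: 100.0 * m["anomaly_probability"]),
]

message = {"value": 101, "value_threshold": 42, "anomaly_probability": 0.75}
print(triage(message, rules, "MAX"))  # 75.0
print(triage(message, rules, "SUM"))  # 85.0
```

With a computed score there is no need to bucket the probability into fixed values before combining it with other rules.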

On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> I completely agree with Nick’s transparency comments, and like the design
> of the configuration, especially provision for messaging around the nature
> of the rule fired.
>
> I would just like to add a small point on the capabilities here. If the
> message could have embedded values through some sort of template for a
> stellar statement, it would make for a better, more dynamic alert reason.
>
> I would also like to see the score field capable of outputting the value
> of a stellar statement. At the moment the idea of a static score being
> passed on means that if I have a probabilistic result I want to combine
> with other triage sources, I have to do a lot of bucketing into fixed
> values. I would much rather be able to say something like score = some
> stellar statement that returns a float, 'alertness' = threshold of this.
> That way I can combine multiple triage rules to trigger an overall alert,
> making the aggregators more meaningful.
>
> Simon
>
>
> > On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> >
> > For profiler alerts it will be helpful during analysis to see the alerts
> > that caused the anomaly.  The meta alert is useful for incidents involving
> > correlation of multiple events.
> >
> > Also you will need to filter out known hosts that trigger anomalies,
> > for example vulnerability scanning software.
> >
> > One final thing to consider is that anomalies happen every day without a
> > security incident.  Depending on the network, the profiler alerts could
> > get very noisy, so it might be better to correlate profiler alerts with
> > other alerts.
> >
> > Thanks
> > Carolyn
> >
> >
> >
> > Sent from my Verizon, Samsung Galaxy smartphone
> >
> >
> > -------- Original message --------
> > From: Casey Stella <ceste...@gmail.com>
> > Date: 2/1/17 2:28 PM (GMT-05:00)
> > To: dev@metron.incubator.apache.org
> > Subject: Re: [Discuss] Improve Alerting
> >
> > I like the direction.  One thing that we may want is for comment to just
> > be a stellar expression and construct a function to essentially do
> > String.format().  So, that'd become:
> > "triageConfig" : {
> >  "riskLevelRules" : [
> >    {
> >      "name" : "Abnormal Value",
> >      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d', hostname, value, value_threshold)",
> >      "rule" : "value > value_threshold",
> >      "score" : 10
> >    }
> >  ],
> >  "aggregator" : "MAX"
> > }
> >
> > The reason:
> >
> >   - It's integrated and stellar is our default scripting layer
> >   - It supports doing some computation in the message
> >
> >
> > On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <n...@nickallen.org> wrote:
> >
> >> Like I said, here is a proposed solution to one of the gaps I
> >> identified in the previous email.
> >>
> >> *Problem*
> >>
> >> There is little transparency into the Threat Triage process itself.
> >> When Threat Triage runs, all I get is a score.  I don't know how that
> >> score was arrived at, which rules were triggered, and the specific
> >> values that caused a rule to trigger.
> >>
> >> More specifically, there is no way to generate a message that looks like
> >> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> >> threshold of '202'".  This makes it difficult for an analyst to action
> >> the alert.
> >>
> >> *Proposed Solution*
> >>
> >> To improve the transparency of the Threat Triage process, I am proposing
> >> these enhancements.
> >>
> >> 1. Threat Triage should attach to each message all of the rules that
> >> fired in addition to the total calculated threat triage score.
> >>
> >> 2. Threat Triage should allow a custom message to be generated for each
> >> rule.  The custom message would allow for some form of string
> >> interpolation so that I can add specific values from each message to the
> >> generated alert.  We could allow this in one or both of the new fields
> >> that Casey just added, name and comment.
> >>
> >>
> >> *Example*
> >>
> >> 1. In this example, we have a telemetry message with a field called
> >> 'value' that we need to monitor.  In Enrichment, I calculate some sort of
> >> value threshold, over which an alert should be generated.
> >>
> >>
> >> 2. In Threat Triage, I use the calculated value threshold to alert on
> >> any message that has a value exceeding this threshold.
> >>
> >> 3. I can embed values from the message, like the hostname, value, and
> >> value threshold, into the alert produced by Threat Triage.  Notice that I
> >> am using ${this} for string interpolation, but it could be any syntax
> >> that we choose.
> >>
> >>
> >> "triageConfig" : {
> >>  "riskLevelRules" : [
> >>    {
> >>      "name" : "Abnormal Value",
> >>      "comment" : "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
> >>      "rule" : "value > value_threshold",
> >>      "score" : 10
> >>    }
> >>  ],
> >>  "aggregator" : "MAX"
> >> }
> >>
> >>
> >> 4. The Threat Triage process today would add only the total calculated
> >> score.
> >>
> >> "threat.triage.level": 10.0
> >>
> >>
> >> With this proposal, Threat Triage would add the following to the
> >> message.
> >>
> >> Notice how each of the ${variables} has been replaced with the actual
> >> values extracted from the message.  This allows for more contextual
> >> information to action the alert.
> >>
> >> "threat.triage": {
> >>    "score": 10.0,
> >>    "rules": [
> >>      {
> >>        "name": "Abnormal Value",
> >>        "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
> >>        "score" : 10
> >>      }
> >>    ]
> >> }
> >>
> >>
> >>
> >> What do you think?  Any alternative ideas?
> >>
> >>
> >>
> >> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:
> >>
> >>> I'd like to explore the functionality that we have in Metron using a
> >>> motivating example.  I think this will help highlight some gaps where
> >>> we can enhance Metron.
> >>>
> >>> The motivating example is that I would like to create an alert if the
> >>> number of inbound flows to any host over a 15 minute interval is
> >>> abnormal.  I would like the alert to contain the specific information
> >>> below to streamline the triage process.
> >>>
> >>> Rule: Abnormal number of inbound flows
> >>> Bin: 15 mins
> >>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
> >>> exceeding the threshold of '202'
> >>>
> >>>
> >>> *What Works*
> >>>
> >>> In some ways, this example is similar to the "Outlier Detection" demo
> >>> that I performed with the Profiler a few months back.  We have most of
> >>> what we need to do this with a couple of caveats.
> >>>
> >>> 1. An enrichment would be added to enrich the message with the correct
> >>> internal hostname 'powned.svr.bank.com'.
> >>>
> >>> 2. With the Profiler, I can capture some idea of what "normal" is for
> >>> the number of inbound flows across 15 minute intervals.
> >>>
> >>> 3. With Threat Triage, I can create rules that alert when a value
> >>> exceeds what the Profiler defines as normal.
> >>>
> >>>
> >>> *What's Missing*
> >>>
> >>> It's nice to know that we are almost all the way there with this
> >>> example.  Unfortunately, there are two gaps that fall out of this.
> >>>
> >>> 1. *Threat Triage Transparency*
> >>>
> >>> There is little transparency into the Threat Triage process itself.
> >>> When Threat Triage runs, all I get is a score.  I don't know how that
> >>> score was arrived at, which rules were triggered, and the specific
> >>> values that caused a rule to trigger.
> >>>
> >>> More specifically, there is no way to generate a message that looks
> >>> like "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
> >>> the threshold of '202'".
> >>>
> >>>
> >>> 2. *Triage Calculated Values from the Profiler*
> >>>
> >>> Also, the value being interrogated here, the number of inbound flows,
> >>> is not a static value contained within any single telemetry message.
> >>> This value is calculated across multiple messages by the Profiler.  The
> >>> current Threat Triage process cannot be used to interrogate values
> >>> calculated by the Profiler.
> >>>
> >>>
> >>> To try and keep this email concise and digestible, I am going to send a
> >>> follow-on discussing proposed solutions for each of these separately.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Nick Allen <n...@nickallen.org>
> >>
>
>

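For anyone following the quoted proposal, the ${...} interpolation and the richer "threat.triage" output can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Metron code: `apply_rules` is a hypothetical helper, the rule condition is a Python callable standing in for the Stellar expression, MAX is hard-coded as the aggregator, and Python's standard `string.Template` happens to use the same ${name} syntax as the example config.

```python
from string import Template


def apply_rules(message, rules):
    """Return a threat.triage-style dict listing each rule that fired."""
    fired = []
    for rule in rules:
        if rule["rule"](message):
            fired.append({
                "name": rule["name"],
                # safe_substitute leaves ${field} untouched if the field is missing.
                "comment": Template(rule["comment"]).safe_substitute(message),
                "score": rule["score"],
            })
    # MAX aggregator, as in the example config; 0 when no rule fired.
    total = max((r["score"] for r in fired), default=0)
    return {"score": float(total), "rules": fired}


RULES = [
    {
        "name": "Abnormal Value",
        "comment": "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
        # Stand-in for the Stellar rule "value > value_threshold".
        "rule": lambda m: m["value"] > m["value_threshold"],
        "score": 10,
    }
]

MESSAGE = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}

result = apply_rules(MESSAGE, RULES)
print(result["rules"][0]["comment"])
# For 10.0.0.1; the value 101 exceeds threshold of 42
```

The FORMAT(...) variant suggested in the thread would replace the template substitution with a format-string call over the same message fields; the shape of the output stays the same.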