Re: [Discuss] Improve Alerting

Simon Elliston Ball Thu, 02 Feb 2017 06:40:01 -0800

That’s a part of it, certainly (and fixes another of my bugbears, so thank
you!)


In addition to the aggregation being Stellar, I want score to be a Stellar
statement; I’ve put in a separate ticket for that:
https://issues.apache.org/jira/browse/METRON-685 
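
A minimal sketch of the idea, assuming each rule's score is a function of the
message that returns a float, and that an alert flag is derived from a
threshold on the aggregated score. The rule names, the message fields other
than value and value_threshold, and ALERT_THRESHOLD are hypothetical; plain
Python callables stand in for Stellar statements, and this is not Metron code.

```python
# Each hypothetical rule computes a float score from the message, standing in
# for "score = some Stellar statement that returns a float".
DYNAMIC_RULES = {
    "model_probability": lambda m: 100.0 * m["probability"],
    "flow_excess": lambda m: float(max(0, m["value"] - m["value_threshold"])),
}

ALERT_THRESHOLD = 50.0  # assumed cut-off for the 'alertness' flag


def triage_dynamic(message, aggregator=max):
    """Score every rule, aggregate the scores, and apply the alert threshold."""
    scores = {name: rule(message) for name, rule in DYNAMIC_RULES.items()}
    total = aggregator(scores.values())
    return {"scores": scores, "score": total, "is_alert": total >= ALERT_THRESHOLD}


result = triage_dynamic({"probability": 0.62, "value": 101, "value_threshold": 42})
print(result)
```

Because the scores are continuous, swapping the aggregator (for example
aggregator=sum) changes how several weak signals combine, without any
bucketing into fixed values.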

Simon

> On 2 Feb 2017, at 14:31, Nick Allen <n...@nickallen.org> wrote:
> 
>> I would much rather be able to say something like score = some stellar
>> statement that returns a float...
> 
> 
> Completely agree.  FYI - We added METRON-683 yesterday that I believe
> supports what you are saying.  Feel free to add commentary.
> 
> https://issues.apache.org/jira/browse/METRON-683
> 
> On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
> 
>> I completely agree with Nick’s transparency comments, and like the design
>> of the configuration, especially provision for messaging around the nature
>> of the rule fired.
>> 
>> I would just like to add a small point on the capabilities here. If the
>> message could have embedded values through some sort of template for a
>> Stellar statement, it would make for a better, more dynamic alert reason.
>> 
>> I would also like to see the score field capable of outputting the value
>> of a stellar statement. At the moment the idea of a static score being
>> passed on means that if I have a probabilistic result I want to combine
>> with other triage sources, I have to do a lot of bucketing into fixed
>> values. I would much rather be able to say something like score = some
>> Stellar statement that returns a float, 'alertness' = threshold of this.
>> That way I can combine multiple triage rules to trigger an overall alert,
>> making the aggregators more meaningful.
>> 
>> Simon
>> 
>> 
>>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
>>> 
>>> For profiler alerts it will be helpful during analysis to see the alerts
>>> that caused the anomaly.  The meta alert is useful for incidents involving
>>> correlation of multiple events.
>>> 
>>> Also you will need to filter out known hosts that trigger anomalies.
>>> For example vulnerability scanning software.
>>> 
>>> One final thing to consider is anomalies happen every day without a
>>> security incident.  Depending on the network the profiler alerts could get
>>> very noisy so it might be better to correlate profiler alerts with other
>>> alerts.
>>> 
>>> Thanks
>>> Carolyn
>>> 
>>> 
>>> 
>>> Sent from my Verizon, Samsung Galaxy smartphone
>>> 
>>> 
>>> -------- Original message --------
>>> From: Casey Stella <ceste...@gmail.com>
>>> Date: 2/1/17 2:28 PM (GMT-05:00)
>>> To: dev@metron.incubator.apache.org
>>> Subject: Re: [Discuss] Improve Alerting
>>> 
>>> I like the direction.  One thing that we may want is for comment to just
>>> be a Stellar expression and construct a function to essentially do
>>> String.format().  So, that'd become:
>>> "triageConfig" : {
>>> "riskLevelRules" : [
>>>   {
>>>     "name" : "Abnormal Value",
>>>     "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d', hostname, value, value_threshold)",
>>>     "rule" : "value > value_threshold",
>>>     "score" : 10
>>>   }
>>> ],
>>> "aggregator" : "MAX"
>>> }
>>> 
>>> The reason:
>>> 
>>>  - It's integrated and stellar is our default scripting layer
>>>  - It supports doing some computation in the message
>>> 
>>> 
>>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <n...@nickallen.org> wrote:
>>> 
>>>> Like I said, here is a proposed solution to one of the gaps I identified in
>>>> the previous email.
>>>> 
>>>> *Problem*
>>>> 
>>>> There is little transparency into the Threat Triage process itself.  When
>>>> Threat Triage runs, all I get is a score.  I don't know how that score was
>>>> arrived at, which rules were triggered, and the specific values that caused
>>>> a rule to trigger.
>>>> 
>>>> More specifically, there is no way to generate a message that looks like
>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>>> threshold of '202'".  This makes it difficult for an analyst to action the
>>>> alert.
>>>> 
>>>> *Proposed Solution*
>>>> 
>>>> To improve the transparency of the Threat Triage process, I am proposing
>>>> these enhancements.
>>>> 
>>>> 1. Threat Triage should attach to each message all of the rules that fired
>>>> in addition to the total calculated threat triage score.
>>>> 
>>>> 2. Threat Triage should allow a custom message to be generated for each
>>>> rule.  The custom message would allow for some form of string interpolation
>>>> so that I can add specific values from each message to the generated
>>>> alert.  We could allow this in one or both of the new fields that Casey
>>>> just added, name and comment.
>>>> 
>>>> 
>>>> *Example*
>>>> 
>>>> 1. In this example, we have a telemetry message with a field called 'value'
>>>> that we need to monitor.  In Enrichment, I calculate some sort of value
>>>> threshold, over which an alert should be generated.
>>>> 
>>>> 2. In Threat Triage, I use the calculated value threshold to alert on any
>>>> message that has a value exceeding this threshold.
>>>> 
>>>> 3. I can embed values from the message, like the hostname, value, and value
>>>> threshold, into the alert produced by Threat Triage.  Notice that I am
>>>> using ${this} for string interpolation, but it could be any syntax that we
>>>> choose.
>>>> 
>>>> 
>>>> "triageConfig" : {
>>>> "riskLevelRules" : [
>>>>   {
>>>>     "name" : "Abnormal Value",
>>>>     "comment" : "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
>>>>     "rule" : "value > value_threshold",
>>>>     "score" : 10
>>>>   }
>>>> ],
>>>> "aggregator" : "MAX"
>>>> }
>>>> 
>>>> 
>>>> 4. The Threat Triage process today would add only the total calculated
>>>> score.
>>>> 
>>>> "threat.triage.level": 10.0
>>>> 
>>>> 
>>>> With this proposal, Threat Triage would add the following to the message.
>>>> 
>>>> Notice how each of the ${variables} has been replaced with the actual
>>>> values extracted from the message.  This allows for more contextual
>>>> information to action the alert.
>>>> 
>>>> "threat.triage": {
>>>>   "score": 10.0,
>>>>   "rules": [
>>>>     {
>>>>       "name": "Abnormal Value",
>>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>>>>       "score" : 10
>>>>     }
>>>>   ]
>>>> }
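
The proposed behaviour can be sketched as follows, assuming ${name}
placeholders are filled from message fields and every rule that fires is
attached to the output. This is an illustration only: the rule predicate is a
plain Python callable standing in for the Stellar expression, the 0.0 score
when no rule fires is an assumption, and none of this is Metron's actual
implementation.

```python
from string import Template

# Hypothetical rule table mirroring the proposed triageConfig.
RULES = [
    {
        "name": "Abnormal Value",
        "comment": "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
        # Stand-in for the Stellar rule "value > value_threshold".
        "rule": lambda m: m["value"] > m["value_threshold"],
        "score": 10,
    },
]


def triage(message, rules=RULES, aggregator=max):
    """Build the proposed 'threat.triage' structure for one message."""
    fired = [
        {
            "name": r["name"],
            # Replace each ${field} with the value found in the message.
            "comment": Template(r["comment"]).substitute(message),
            "score": r["score"],
        }
        for r in rules
        if r["rule"](message)
    ]
    # Assumption: a message that triggers no rule scores 0.0.
    score = float(aggregator(f["score"] for f in fired)) if fired else 0.0
    return {"score": score, "rules": fired}


message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}
result = triage(message)
print(result)
```

Keeping the fired rules next to the aggregated score is what gives an analyst
the rule name and the concrete values behind the number.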
>>>> 
>>>> 
>>>> 
>>>> What do you think?  Any alternative ideas?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:
>>>> 
>>>>> I'd like to explore the functionality that we have in Metron using a
>>>>> motivating example.  I think this will help highlight some gaps where we
>>>>> can enhance Metron.
>>>>> 
>>>>> The motivating example is that I would like to create an alert if the
>>>>> number of inbound flows to any host over a 15 minute interval is abnormal.
>>>>> I would like the alert to contain the specific information below to
>>>>> streamline the triage process.
>>>>> 
>>>>> Rule: Abnormal number of inbound flows
>>>>> Bin: 15 mins
>>>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
>>>>> the threshold of '202'
>>>>> 
>>>>> 
>>>>> *What Works*
>>>>> 
>>>>> In some ways, this example is similar to the "Outlier Detection" demo that
>>>>> I performed with the Profiler a few months back.  We have most of what we
>>>>> need to do this, with a couple of caveats.
>>>>> 
>>>>> 1. An enrichment would be added to enrich the message with the correct
>>>>> internal hostname 'powned.svr.bank.com'.
>>>>> 
>>>>> 2. With the Profiler, I can capture some idea of what "normal" is for the
>>>>> number of inbound flows across 15 minute intervals.
>>>>> 
>>>>> 3. With Threat Triage, I can create rules that alert when a value exceeds
>>>>> what the Profiler defines as normal.
>>>>> 
>>>>> 
>>>>> *What's Missing*
>>>>> 
>>>>> It's nice to know that we are almost all the way there with this example.
>>>>> Unfortunately, there are two gaps that fall out of this.
>>>>> 
>>>>> 1. *Threat Triage Transparency*
>>>>> 
>>>>> There is little transparency into the Threat Triage process itself.  When
>>>>> Threat Triage runs, all I get is a score.  I don't know how that score was
>>>>> arrived at, which rules were triggered, and the specific values that caused
>>>>> a rule to trigger.
>>>>> 
>>>>> More specifically, there is no way to generate a message that looks like
>>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>>>> threshold of '202'".
>>>>> 
>>>>> 
>>>>> 2. *Triage Calculated Values from the Profiler*
>>>>> 
>>>>> Also, the value being interrogated here, the number of inbound flows, is
>>>>> not a static value contained within any single telemetry message.  This
>>>>> value is calculated across multiple messages by the Profiler.  The current
>>>>> Threat Triage process cannot be used to interrogate values calculated by
>>>>> the Profiler.
>>>>> 
>>>>> 
>>>>> To try and keep this email concise and digestible, I am going to send a
>>>>> follow-on discussing proposed solutions for each of these separately.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Nick Allen <n...@nickallen.org>
>>>> 
>> 
>>
