Re: [Discuss] Improve Alerting

Simon Elliston Ball Thu, 02 Feb 2017 06:04:12 -0800

I completely agree with Nick’s transparency comments, and I like the design of 
the configuration, especially the provision for a message describing the rule 
that fired.


I would just like to add a small point on the capabilities here. If the message 
could embed values through some sort of template, or be a Stellar statement 
itself, it would make for a better, more dynamic alert reason. 
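
To make the templating idea concrete, here is a minimal sketch in Python. It
assumes the ${field} placeholder syntax from Nick's example further down the
thread; the render_comment helper is hypothetical and is not Metron or Stellar
code.

```python
from string import Template


def render_comment(template: str, message: dict) -> str:
    """Fill ${field} placeholders in a rule comment from the message fields."""
    # safe_substitute leaves unknown placeholders untouched instead of raising,
    # so a missing field does not prevent the alert from being generated.
    return Template(template).safe_substitute(message)


# Hypothetical telemetry message, using the values from Nick's example.
message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}
comment = render_comment(
    "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
    message,
)
print(comment)
```

A Stellar function along the lines of Casey's FORMAT() suggestion would serve
the same purpose while also allowing computation inside the message.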

I would also like to see the score field capable of outputting the value of a 
Stellar statement. At the moment, a static score means that if I have a 
probabilistic result I want to combine with other triage sources, I have to do 
a lot of bucketing into fixed values. I would much rather be able to say 
something like score = some Stellar statement that returns a float, and 
'alertness' = a threshold on that score. That way I can combine multiple triage 
rules to trigger an overall alert, making the aggregators more meaningful.
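
As a rough illustration of what I mean, here is a sketch in Python rather than
Stellar. Each rule's score is a function of the message that returns a float,
the aggregator combines the scores of the rules that fired, and the alert flag
is a threshold on the aggregate. The triage function and the field names
p_anomaly and p_bad_domain are made up for this example and do not correspond
to anything in Metron today.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, float]
# A rule is a (predicate, score) pair; both are evaluated against the message.
Rule = Tuple[Callable[[Message], bool], Callable[[Message], float]]


def triage(
    message: Message,
    rules: List[Rule],
    aggregator: Callable[[List[float]], float],
    threshold: float,
) -> Tuple[float, bool]:
    """Return the aggregated score and whether it reaches the alert threshold."""
    # Only rules whose predicate fires contribute a (computed) score.
    scores = [rule_score(message) for fires, rule_score in rules if fires(message)]
    total = aggregator(scores) if scores else 0.0
    return total, total >= threshold


# Two hypothetical rules whose scores are the probabilities themselves,
# so no bucketing into fixed values is needed.
rules: List[Rule] = [
    (lambda m: m["p_anomaly"] > 0.5, lambda m: m["p_anomaly"]),
    (lambda m: m["p_bad_domain"] > 0.3, lambda m: m["p_bad_domain"]),
]

message = {"p_anomaly": 0.7, "p_bad_domain": 0.4}
score, is_alert = triage(message, rules, max, threshold=0.6)
print(score, is_alert)
```

Swapping max for sum (or any other aggregator) changes how the individual
probabilities are combined without touching the rules themselves.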

Simon


> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> 
> For profiler alerts it will be helpful during analysis to see the alerts that 
> caused the anomaly.  The meta alert is useful for incidents involving 
> correlation of multiple events.
> 
> Also, you will need to filter out known hosts that trigger anomalies, for 
> example hosts running vulnerability scanning software.
> 
> One final thing to consider is that anomalies happen every day without a 
> security incident.  Depending on the network, the profiler alerts could get 
> very noisy, so it might be better to correlate profiler alerts with other 
> alerts.
> 
> Thanks
> Carolyn
> 
> 
> -------- Original message --------
> From: Casey Stella <ceste...@gmail.com>
> Date: 2/1/17 2:28 PM (GMT-05:00)
> To: dev@metron.incubator.apache.org
> Subject: Re: [Discuss] Improve Alerting
> 
> I like the direction.  One thing that we may want is for comment to just be
> a stellar expression and construct a function to essentially do
> String.format().  So, that'd become:
> "triageConfig" : {
>  "riskLevelRules" : [
>    {
>      "name" : "Abnormal Value",
>      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d', hostname, value, value_threshold)",
>      "rule" : "value > value_threshold",
>      "score" : 10
>    }
>  ],
>  "aggregator" : "MAX"
> }
> 
> The reasons:
> 
>   - It's integrated, and Stellar is our default scripting layer
>   - It supports doing some computation in the message
> 
> 
> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <n...@nickallen.org> wrote:
> 
>> Like I said, here is a proposed solution to one of the gaps I identified in
>> the previous email.
>> 
>> *Problem*
>> 
>> There is little transparency into the Threat Triage process itself.  When
>> Threat Triage runs, all I get is a score.  I don't know how that score was
>> arrived at, which rules were triggered, and the specific values that caused
>> a rule to trigger.
>> 
>> More specifically, there is no way to generate a message that looks like
>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>> threshold of '202'".  This makes it difficult for an analyst to action the
>> alert.
>> 
>> *Proposed Solution*
>> 
>> To improve the transparency of the Threat Triage process, I am proposing
>> these enhancements.
>> 
>> 1. Threat Triage should attach to each message all of the rules that fired
>> in addition to the total calculated threat triage score.
>> 
>> 2. Threat Triage should allow a custom message to be generated for each
>> rule.  The custom message would allow for some form of string interpolation
>> so that I can add specific values from each message to the generated
>> alert.  We could allow this in one or both of the new fields that Casey
>> just added, name and comment.
>> 
>> 
>> *Example*
>> 
>> 1. In this example, we have a telemetry message with a field called 'value'
>> that we need to monitor.  In Enrichment, I calculate some sort of value
>> threshold, over which an alert should be generated.
>> 
>> 
>> 2. In Threat Triage, I use the calculated value threshold to alert on any
>> message that has a value exceeding this threshold.
>> 
>> 3. I can embed values from the message, like the hostname, value, and value
>> threshold, into the alert produced by Threat Triage.  Notice that I am
>> using ${this} for string interpolation, but it could be any syntax that we
>> choose.
>> 
>> 
>> "triageConfig" : {
>>  "riskLevelRules" : [
>>    {
>>      "name" : "Abnormal Value",
>>      "comment" : "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
>>      "rule" : "value > value_threshold",
>>      "score" : 10
>>    }
>>  ],
>>  "aggregator" : "MAX"
>> }
>> 
>> 
>> 4. The Threat Triage process today would add only the total calculated
>> score.
>> 
>> "threat.triage.level": 10.0
>> 
>> 
>> With this proposal, Threat Triage would add the following to the message.
>> 
>> Notice how each of the ${variables} has been replaced with the actual
>> values extracted from the message.  This gives the analyst more contextual
>> information to action the alert.
>> 
>> "threat.triage": {
>>    "score": 10.0,
>>    "rules": [
>>      {
>>        "name": "Abnormal Value",
>>        "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>>        "score" : 10
>>      }
>>    ]
>> }
>> 
>> 
>> 
>> What do you think?  Any alternative ideas?
>> 
>> 
>> 
>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:
>> 
>>> I'd like to explore the functionality that we have in Metron using a
>>> motivating example.  I think this will help highlight some gaps where we
>>> can enhance Metron.
>>> 
>>> The motivating example is that I would like to create an alert if the
>>> number of inbound flows to any host over a 15 minute interval is
>>> abnormal.  I would like the alert to contain the specific information
>>> below to streamline the triage process.
>>> 
>>> Rule: Abnormal number of inbound flows
>>> Bin: 15 mins
>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
>>> the threshold of '202'
>>> 
>>> 
>>> *What Works*
>>> 
>>> In some ways, this example is similar to the "Outlier Detection" demo
>>> that I performed with the Profiler a few months back.  We have most of
>>> what we need to do this, with a couple of caveats.
>>> 
>>> 1. An enrichment would be added to enrich the message with the correct
>>> internal hostname 'powned.svr.bank.com'.
>>> 
>>> 2. With the Profiler, I can capture some idea of what "normal" is for the
>>> number of inbound flows across 15 minute intervals.
>>> 
>>> 3. With Threat Triage, I can create rules that alert when a value exceeds
>>> what the Profiler defines as normal.
>>> 
>>> 
>>> *What's Missing*
>>> 
>>> It's nice to know that we are almost all the way there with this example.
>>> Unfortunately, there are two gaps that fall out of this.
>>> 
>>> 1. *Threat Triage Transparency*
>>> 
>>> There is little transparency into the Threat Triage process itself.  When
>>> Threat Triage runs, all I get is a score.  I don't know how that score
>>> was arrived at, which rules were triggered, and the specific values that
>>> caused a rule to trigger.
>>> 
>>> More specifically, there is no way to generate a message that looks like
>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>> threshold of '202'".
>>> 
>>> 
>>> 2. *Triage Calculated Values from the Profiler*
>>> 
>>> Also, the value being interrogated here, the number of inbound flows, is
>>> not a static value contained within any single telemetry message.  This
>>> value is calculated across multiple messages by the Profiler.  The
>>> current Threat Triage process cannot be used to interrogate values
>>> calculated by the Profiler.
>>> 
>>> 
>>> To try and keep this email concise and digestible, I am going to send a
>>> follow-on discussing proposed solutions for each of these separately.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Nick Allen <n...@nickallen.org>
>>
