Re: [Discuss] Improve Alerting

Nick Allen Mon, 06 Feb 2017 08:46:04 -0800

To close out this discussion, I created another JIRA to take care of the
*Triage Calculated Values from the Profiler* problem.  Feel free to let me
know if anything else was missed.


[1] Triage Metrics Produced by the Profiler
https://issues.apache.org/jira/browse/METRON-701



On Thu, Feb 2, 2017 at 10:15 AM, Nick Allen <n...@nickallen.org> wrote:

> I created 3 separate JIRAs to track the "Threat Triage Transparency"
> portion of the work falling out of this discussion thread.  The first would
> create a mechanism to do string interpolation.  The second would enhance
> threat triage to use the string interpolation.  The third would enhance the
> output of threat triage.
>
> [1] Create String Formatting Function for Stellar
> https://issues.apache.org/jira/browse/METRON-687
>
> [2] Allow Threat Triage Comment Field to Contain Stellar Expressions
> https://issues.apache.org/jira/browse/METRON-688
>
> [3] Record of Rule Set that Fired During Threat Triage
> https://issues.apache.org/jira/browse/METRON-686
>
> Please let me know if anyone's concerns were not captured.  I will create
> additional JIRAs for the other portion of the effort (*Triage Calculated
> Values from the Profiler*) once I've given everyone a little more time to
> voice an opinion.
> 
>
> On Thu, Feb 2, 2017 at 9:46 AM, Nick Allen <n...@nickallen.org> wrote:
>
>> Oh, I see.  Yes, very useful.
>>
>>
>> On Thu, Feb 2, 2017 at 9:39 AM, Simon Elliston Ball <
>> si...@simonellistonball.com> wrote:
>>
>>> That’s a part of it, certainly (and fixes another of my bugbears, so
>>> thank you!)
>>>
>>> In addition to the aggregation being stellar, I want the score to be a
>>> stellar statement; I’ve put in a separate ticket for that:
>>> https://issues.apache.org/jira/browse/METRON-685
>>>
>>> Simon
>>>
>>> > On 2 Feb 2017, at 14:31, Nick Allen <n...@nickallen.org> wrote:
>>> >
>>> >> I would much rather be able to say something like score = some stellar
>>> >> statement that returns a float...
>>> >
>>> >
>>> > Completely agree.  FYI - We added METRON-683 yesterday that I believe
>>> > supports what you are saying.  Feel free to add commentary.
>>> >
>>> > https://issues.apache.org/jira/browse/METRON-683
>>> >
>>> > On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
>>> > si...@simonellistonball.com> wrote:
>>> >
>>> >> I completely agree with Nick’s transparency comments, and like the
>>> >> design of the configuration, especially the provision for messaging
>>> >> around the nature of the rule fired.
>>> >>
>>> >> I would just like to add a small point on the capabilities here. If the
>>> >> message could have embedded values through some sort of template for a
>>> >> stellar statement, it would make for a better, more dynamic alert
>>> >> reason.
>>> >>
>>> >> I would also like to see the score field capable of outputting the value
>>> >> of a stellar statement. At the moment the idea of a static score being
>>> >> passed on means that if I have a probabilistic result I want to combine
>>> >> with other triage sources, I have to do a lot of bucketing into fixed
>>> >> values. I would much rather be able to say something like score = some
>>> >> stellar statement that returns a float, 'alertness' = threshold of this.
>>> >> That way I can combine multiple triage rules to trigger an overall
>>> >> alert, making the aggregators more meaningful.
>>> >>
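A minimal Python sketch of the idea above, assuming each rule's score is a
computed float rather than a static number.  Python callables stand in for
stellar statements here; the field names, the aggregator names other than
MAX, and the alert threshold are all hypothetical and not part of Metron.

```python
from statistics import mean

# Hypothetical aggregator table; the thread itself only names MAX.
AGGREGATORS = {"MAX": max, "SUM": sum, "MEAN": mean}


def triage(message, rules, aggregator="MAX", alert_threshold=0.5):
    """Score a message with rules whose score is computed per message."""
    fired = []
    for rule in rules:
        if rule["rule"](message):
            # The score is evaluated against the message instead of being fixed.
            fired.append((rule["name"], float(rule["score"](message))))
    scores = [score for _, score in fired]
    total = float(AGGREGATORS[aggregator](scores)) if scores else 0.0
    # 'alertness' is simply a threshold applied to the aggregated score.
    return {"score": total, "is_alert": total >= alert_threshold, "rules": fired}


# Hypothetical rules: one probabilistic, one with a constant score.
rules = [
    {
        "name": "Model Anomaly",
        "rule": lambda m: "anomaly_probability" in m,
        "score": lambda m: m["anomaly_probability"],
    },
    {
        "name": "Abnormal Value",
        "rule": lambda m: m["value"] > m["value_threshold"],
        "score": lambda m: 0.3,
    },
]

message = {"anomaly_probability": 0.72, "value": 101, "value_threshold": 42}
result = triage(message, rules)
```

With a computed score, the probabilistic result feeds the aggregator
directly, so no bucketing into fixed values is needed.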
>>> >> Simon
>>> >>
>>> >>
>>> >>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
>>> >>>
>>> >>> For profiler alerts it will be helpful during analysis to see the
>>> >>> alerts that caused the anomaly.  The meta alert is useful for
>>> >>> incidents involving correlation of multiple events.
>>> >>>
>>> >>> Also you will need to filter out known hosts that trigger anomalies,
>>> >>> for example vulnerability scanning software.
>>> >>>
>>> >>> One final thing to consider is that anomalies happen every day without
>>> >>> a security incident.  Depending on the network, the profiler alerts
>>> >>> could get very noisy, so it might be better to correlate profiler
>>> >>> alerts with other alerts.
>>> >>>
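As a rough illustration of the two suggestions above, here is a small Python
sketch.  The allowlist contents, the field name 'hostname', and the idea of
passing in a set of hosts that already have other alerts are assumptions made
for the example, not existing Metron behavior.

```python
# Hypothetical allowlist of hosts expected to look anomalous,
# such as vulnerability scanners.
KNOWN_SCANNERS = {"scanner01.example.com"}


def should_escalate(profiler_alert, hosts_with_other_alerts,
                    known_hosts=KNOWN_SCANNERS):
    """Escalate a profiler alert only if the host is not allowlisted
    and at least one other alert exists for the same host."""
    host = profiler_alert["hostname"]
    if host in known_hosts:
        return False
    return host in hosts_with_other_alerts


other_alert_hosts = {"10.0.0.1"}
escalate_scanner = should_escalate({"hostname": "scanner01.example.com"},
                                   other_alert_hosts)
escalate_lonely = should_escalate({"hostname": "10.0.0.7"}, other_alert_hosts)
escalate_correlated = should_escalate({"hostname": "10.0.0.1"},
                                      other_alert_hosts)
```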
>>> >>> Thanks
>>> >>> Carolyn
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> -------- Original message --------
>>> >>> From: Casey Stella <ceste...@gmail.com>
>>> >>> Date: 2/1/17 2:28 PM (GMT-05:00)
>>> >>> To: dev@metron.incubator.apache.org
>>> >>> Subject: Re: [Discuss] Improve Alerting
>>> >>>
>>> >>> I like the direction.  One thing that we may want is for comment to
>>> >>> just be a stellar expression and construct a function to essentially
>>> >>> do String.format().  So, that'd become:
>>> >>> "triageConfig" : {
>>> >>>   "riskLevelRules" : [
>>> >>>     {
>>> >>>       "name" : "Abnormal Value",
>>> >>>       "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d', hostname, value, value_threshold)",
>>> >>>       "rule" : "value > value_threshold",
>>> >>>       "score" : 10
>>> >>>     }
>>> >>>   ],
>>> >>>   "aggregator" : "MAX"
>>> >>> }
>>> >>>
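To make the String.format() idea concrete, here is a small Python sketch
using printf-style formatting.  The helper format_comment and the sample
message are hypothetical; this only illustrates the behavior being proposed
and is not the Stellar function itself.

```python
def format_comment(template, field_names, message):
    """Fill a printf-style template with values looked up in the message."""
    values = tuple(message[name] for name in field_names)
    return template % values


# Hypothetical telemetry message carrying the fields used by the rule.
message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}

comment = format_comment(
    "For %s; the value %s exceeds threshold of %d",
    ["hostname", "value", "value_threshold"],
    message,
)
# comment == "For 10.0.0.1; the value 101 exceeds threshold of 42"
```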
>>> >>> The reason:
>>> >>>
>>> >>>  - It's integrated and stellar is our default scripting layer
>>> >>>  - It supports doing some computation in the message
>>> >>>
>>> >>>
>>> >>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <n...@nickallen.org> wrote:
>>> >>>
>>> >>>> Like I said, here is a proposed solution to one of the gaps I
>>> >>>> identified in the previous email.
>>> >>>>
>>> >>>> *Problem*
>>> >>>>
>>> >>>> There is little transparency into the Threat Triage process itself.
>>> >>>> When Threat Triage runs, all I get is a score.  I don't know how that
>>> >>>> score was arrived at, which rules were triggered, and the specific
>>> >>>> values that caused a rule to trigger.
>>> >>>>
>>> >>>> More specifically, there is no way to generate a message that looks
>>> >>>> like "The host 'powned.svr.bank.com' has '230' inbound flows,
>>> >>>> exceeding the threshold of '202'".  This makes it difficult for an
>>> >>>> analyst to action the alert.
>>> >>>>
>>> >>>> *Proposed Solution*
>>> >>>>
>>> >>>> To improve the transparency of the Threat Triage process, I am
>>> >>>> proposing these enhancements.
>>> >>>>
>>> >>>> 1. Threat Triage should attach to each message all of the rules that
>>> >>>> fired in addition to the total calculated threat triage score.
>>> >>>>
>>> >>>> 2. Threat Triage should allow a custom message to be generated for
>>> >>>> each rule.  The custom message would allow for some form of string
>>> >>>> interpolation so that I can add specific values from each message to
>>> >>>> the generated alert.  We could allow this in one or both of the new
>>> >>>> fields that Casey just added, name and comment.
>>> >>>>
>>> >>>>
>>> >>>> *Example*
>>> >>>>
>>> >>>> 1. In this example, we have a telemetry message with a field called
>>> >>>> 'value' that we need to monitor.  In Enrichment, I calculate some
>>> >>>> sort of value threshold, over which an alert should be generated.
>>> >>>>
>>> >>>> 2. In Threat Triage, I use the calculated value threshold to alert on
>>> >>>> any message that has a value exceeding this threshold.
>>> >>>>
>>> >>>> 3. I can embed values from the message, like the hostname, value, and
>>> >>>> value threshold, into the alert produced by Threat Triage.  Notice
>>> >>>> that I am using ${this} for string interpolation, but it could be any
>>> >>>> syntax that we choose.
>>> >>>>
>>> >>>>
>>> >>>> "triageConfig" : {
>>> >>>>   "riskLevelRules" : [
>>> >>>>     {
>>> >>>>       "name" : "Abnormal Value",
>>> >>>>       "comment" : "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}",
>>> >>>>       "rule" : "value > value_threshold",
>>> >>>>       "score" : 10
>>> >>>>     }
>>> >>>>   ],
>>> >>>>   "aggregator" : "MAX"
>>> >>>> }
>>> >>>>
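The ${this} style of interpolation can be sketched with Python's standard
string.Template, which happens to use the same ${name} syntax.  This is only
an illustration of the proposed behavior; the sample message is hypothetical
and the syntax Metron ends up with may differ.

```python
from string import Template

# The comment field exactly as written in the proposed rule above.
comment_template = Template(
    "For ${hostname}; the value ${value} exceeds threshold of ${value_threshold}"
)

# Hypothetical telemetry message after enrichment.
message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}

# substitute() raises KeyError if a referenced field is missing from the
# message; safe_substitute() would leave the placeholder in place instead.
comment = comment_template.substitute(message)
# comment == "For 10.0.0.1; the value 101 exceeds threshold of 42"
```

One caveat with this particular stand-in: string.Template's default
placeholder pattern does not accept dots, so a field named like
threat.triage.level would need a custom pattern.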
>>> >>>>
>>> >>>> 4. The Threat Triage process today would add only the total
>>> >>>> calculated score.
>>> >>>>
>>> >>>> "threat.triage.level": 10.0
>>> >>>>
>>> >>>>
>>> >>>> With this proposal, Threat Triage would add the following to the
>>> >>>> message.
>>> >>>>
>>> >>>> Notice how each of the ${variables} has been replaced with the actual
>>> >>>> value extracted from the message.  This allows for more contextual
>>> >>>> information to action the alert.
>>> >>>>
>>> >>>> "threat.triage": {
>>> >>>>   "score": 10.0,
>>> >>>>   "rules": [
>>> >>>>     {
>>> >>>>       "name": "Abnormal Value",
>>> >>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>>> >>>>       "score" : 10
>>> >>>>     }
>>> >>>>   ]
>>> >>>> }
>>> >>>>
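Putting the steps of the example together, here is a rough Python sketch
that produces the proposed "threat.triage" record.  It assumes the rule
condition is a Python callable standing in for the Stellar expression and
supports only the MAX aggregator; run_triage is a hypothetical name, not a
Metron API.

```python
from string import Template


def run_triage(message, config):
    """Attach the fired rules and the aggregated score to a copy of the message."""
    fired = []
    for rule in config["riskLevelRules"]:
        if rule["rule"](message):
            fired.append({
                "name": rule["name"],
                # Replace ${field} placeholders with values from the message.
                "comment": Template(rule["comment"]).substitute(message),
                "score": rule["score"],
            })
    # MAX aggregator only; a message that fires no rules scores 0.0.
    score = float(max((r["score"] for r in fired), default=0))
    enriched = dict(message)
    enriched["threat.triage"] = {"score": score, "rules": fired}
    return enriched


config = {
    "riskLevelRules": [
        {
            "name": "Abnormal Value",
            "comment": "For ${hostname}; the value ${value} exceeds "
                       "threshold of ${value_threshold}",
            # Stand-in for the Stellar rule "value > value_threshold".
            "rule": lambda m: m["value"] > m["value_threshold"],
            "score": 10,
        }
    ],
    "aggregator": "MAX",
}

message = {"hostname": "10.0.0.1", "value": 101, "value_threshold": 42}
enriched = run_triage(message, config)
```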
>>> >>>>
>>> >>>>
>>> >>>> What do you think?  Any alternative ideas?
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <n...@nickallen.org> wrote:
>>> >>>>
>>> >>>>> I'd like to explore the functionality that we have in Metron using
>>> >>>>> a motivating example.  I think this will help highlight some gaps
>>> >>>>> where we can enhance Metron.
>>> >>>>>
>>> >>>>> The motivating example is that I would like to create an alert if
>>> >>>>> the number of inbound flows to any host over a 15 minute interval
>>> >>>>> is abnormal.  I would like the alert to contain the specific
>>> >>>>> information below to streamline the triage process.
>>> >>>>>
>>> >>>>> Rule: Abnormal number of inbound flows
>>> >>>>> Bin: 15 mins
>>> >>>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
>>> >>>>> exceeding the threshold of '202'
>>> >>>>>
>>> >>>>>
>>> >>>>> *What Works*
>>> >>>>>
>>> >>>>> In some ways, this example is similar to the "Outlier Detection"
>>> >>>>> demo that I performed with the Profiler a few months back.  We have
>>> >>>>> most of what we need to do this, with a couple of caveats.
>>> >>>>>
>>> >>>>> 1. An enrichment would be added to enrich the message with the
>>> >>>>> correct internal hostname 'powned.svr.bank.com'.
>>> >>>>>
>>> >>>>> 2. With the Profiler, I can capture some idea of what "normal" is
>>> >>>>> for the number of inbound flows across 15 minute intervals.
>>> >>>>>
>>> >>>>> 3. With Threat Triage, I can create rules that alert when a value
>>> >>>>> exceeds what the Profiler defines as normal.
>>> >>>>>
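One way to picture steps 2 and 3 is the Python sketch below.  It assumes
"normal" is summarized as the mean plus a few standard deviations of past
15 minute flow counts; that choice, the multiplier k, and the sample history
are all hypothetical and say nothing about how the Profiler actually
computes its values.

```python
from statistics import mean, stdev


def flow_threshold(history, k=3.0):
    """Upper bound of 'normal': mean plus k sample standard deviations."""
    return mean(history) + k * stdev(history)


def is_abnormal(current, history, k=3.0):
    """True when the current bin's flow count exceeds the learned threshold."""
    return current > flow_threshold(history, k)


# Hypothetical inbound flow counts for one host over past 15 minute bins.
history = [150, 160, 170, 180, 190]

alert_230 = is_abnormal(230, history)
alert_200 = is_abnormal(200, history)
```

The threshold produced this way would play the role of 'value_threshold' in
a triage rule such as "value > value_threshold".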
>>> >>>>>
>>> >>>>> *What's Missing*
>>> >>>>>
>>> >>>>> It's nice to know that we are almost all the way there with this
>>> >>>>> example.  Unfortunately, there are two gaps that fall out of this.
>>> >>>>>
>>> >>>>> 1. *Threat Triage Transparency*
>>> >>>>>
>>> >>>>> There is little transparency into the Threat Triage process itself.
>>> >>>>> When Threat Triage runs, all I get is a score.  I don't know how
>>> >>>>> that score was arrived at, which rules were triggered, and the
>>> >>>>> specific values that caused a rule to trigger.
>>> >>>>>
>>> >>>>> More specifically, there is no way to generate a message that looks
>>> >>>>> like "The host 'powned.svr.bank.com' has '230' inbound flows,
>>> >>>>> exceeding the threshold of '202'".
>>> >>>>>
>>> >>>>>
>>> >>>>> 2. *Triage Calculated Values from the Profiler*
>>> >>>>>
>>> >>>>> Also, the value being interrogated here, the number of inbound
>>> >>>>> flows, is not a static value contained within any single telemetry
>>> >>>>> message.  This value is calculated across multiple messages by the
>>> >>>>> Profiler.  The current Threat Triage process cannot be used to
>>> >>>>> interrogate values calculated by the Profiler.
>>> >>>>>
>>> >>>>>
>>> >>>>> To try and keep this email concise and digestible, I am going to
>>> >>>>> send a follow-on discussing proposed solutions for each of these
>>> >>>>> separately.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Nick Allen <n...@nickallen.org>
>>> >>>>
>>> >>
>>> >>
>>>
>>>
>>
>
