[prometheus-users] Re: Alertmanager configuration: routes

Brian Candler Fri, 10 Sep 2021 07:37:43 -0700

Sorry, I don't know what you mean by parameter -service=<alertname> or
parameter -service="team: oncall".


I still don't know where you're getting "service" from here.  There are 
only labels, which are name/value pairs.  Each alert can carry one or more 
labels. e.g.

alertname="foo"    # label "alertname" with value "foo"
team="oncall"     # label "team" with value "oncall"
service="web"    # label "service" with value "web"

Something like "service=team=oncall" doesn't make any sense.
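
As a rough sketch of where such labels come from: an alerting rule along
these lines would produce alerts carrying all three labels (plus whatever
labels the expression itself returns, such as "job" and "instance").  The
group name, the expression and the label values are only illustrative
assumptions, not taken from your config.

```yaml
groups:
  - name: example            # hypothetical group name
    rules:
      - alert: foo           # becomes the label alertname="foo"
        expr: up == 0        # placeholder expression
        labels:
          team: oncall       # label "team" with value "oncall"
          service: web       # label "service" with value "web"
```

Each label is still a separate name/value pair, which is why a matcher
refers to one label name at a time.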

As for the dash, I think you may be confused by YAML syntax.  A dash starts 
a member of a list.  For example:

colours:
  - red
  - green
  - blue

which can also be written in YAML as

colours: [red, green, blue]

and is equivalent to the following JSON:

{"colours": ["red", "green", "blue"]}

If a list contains objects, then the start of each object is marked by a 
dash. e.g.

shirts:
  - colour: red
    size: small
  - colour: red
    size: medium
  - colour: green
    size: medium

(Note how important getting the alignment right is!)

This equates to JSON:

{"shirts": [
  {"colour":"red", "size":"small"},
  {"colour":"red", "size":"medium"},
  {"colour":"green", "size":"medium"}
]}

(which doesn't care about alignment because there are explicit opening and 
closing braces and brackets)
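
Lists can also nest: an object inside a list can itself contain a list,
marked by its own, more deeply indented dashes.  A small made-up variant of
the shirts example (the "sizes" key is just an assumption for illustration):

```yaml
shirts:
  - colour: red
    sizes:             # a list nested inside the first list element
      - small
      - medium
  - colour: green
    sizes: [medium]    # the same kind of list, written inline
```

The outer dashes mark the shirts, and the inner dashes mark the sizes that
belong to one particular shirt.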

Your Alertmanager routing rules are similar to this: each rule starts with a 
dash, and then there are one or more settings below it.  Formatting your 
original example properly:

- receiver: 'database-pager'
  group_wait: 10s
  matchers:
    - service=~"mysql|cassandra"

This is a single rule within a list of rules (the first "-" marks this as a 
list element)
This rule has three settings: receiver, group_wait, and matchers.
matchers is itself a list.
There is one element in this list (also marked by a dash)
The content of that element is a string:
    service=~"mysql|cassandra"

Each element under "matchers" is a label matcher, written in the same style 
as a PromQL label matcher.  In this case, it matches the label "service" 
against that regular expression, which matches either the value "mysql" or 
"cassandra".
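
For illustration, here is roughly how the other matcher operators might
look.  The receiver name 'oncall-pager' and the label values are made up.
Regex matchers are anchored, so the pattern has to match the whole label
value, and when a route lists several matchers an alert has to satisfy all
of them.  There is also !~ for a negated regex match.

```yaml
routes:
  # =  : label "service" has exactly the value "mysql"
  - receiver: 'database-pager'
    matchers:
      - service="mysql"
  # =~ : the whole value matches the regular expression
  - receiver: 'database-pager'
    matchers:
      - service=~"mysql|cassandra"
  # two matchers in one route: both must match
  - receiver: 'oncall-pager'        # hypothetical receiver name
    matchers:
      - team="oncall"
      - severity!="info"            # != : any value other than "info"
```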

Therefore: that rule matches alerts which have a label "service" with value 
"mysql" or "cassandra".  If the rule matches, then the alert is delivered 
to "database-pager" with group_wait of 10 seconds.  If the alert doesn't 
match, then it moves on to the next rule.  And if none of the rules matches, 
it falls back to the default receiver selected elsewhere in the config.
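
Putting that together, a minimal sketch of a whole alertmanager.yml might
look like this.  The receiver name "default" and the email addresses are
placeholders, and I have left out the global SMTP settings which email
receivers also need, so treat it as an outline rather than a working config.

```yaml
route:
  receiver: default                # used when no rule under "routes" matches
  routes:
    - receiver: 'database-pager'
      group_wait: 10s
      matchers:
        - service=~"mysql|cassandra"

receivers:
  - name: default
    email_configs:
      - to: team@example.com       # placeholder address
  - name: 'database-pager'
    email_configs:
      - to: dba@example.com        # placeholder address
```

An alert with service="mysql" goes to 'database-pager'; an alert with
service="web", or with no "service" label at all, ends up at "default".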

It looks like I'm not expressing myself clearly, so perhaps someone else 
might be able to explain it more clearly than me.

On Friday, 10 September 2021 at 12:53:47 UTC+1 74cm...@gmail.com wrote:

> Thanks for this information.
>
> If my understanding is correct, the alert name, specified in file 
> *rules.yml* with parameter -alert: <alertname>, must be used in file 
> alertmanager.yml with parameter -service=<alertname>. Optionally one 
> could add labels in rules.yml, e.g. team: oncall, and then use this with 
> -service="team: oncall".
>
> Is this correct?
>
> Brian Candler wrote on Friday, 3 September 2021 at 20:05:37 UTC+2:
>
>> And I forgot to say: given an alerting rule like
>>
>>   - alert: UpDown
>>     expr: up == 0
>>     for: 3m
>>
>> then the label alertname="UpDown" is also added automatically (similar to 
>> how "job" and "instance" labels are added automatically at scrape time).
>>
>> So at the end, you have a mixture of labels from the exporter, plus 
>> system-generated labels like "job" and "instance" and "alertname", plus any 
>> labels you've chosen to add yourself.  The "matchers" in alertmanager can 
>> match any of these.
>>
>> On Friday, 3 September 2021 at 18:47:22 UTC+1 Brian Candler wrote:
>>
>>> No, definitely not. There is no such thing as "service" in Prometheus - 
>>> Alertmanager config.
>>>
>>> But if you wish, you can have a *label* on your timeseries called 
>>> "service", or called "environment", or anything you like.  You can add 
>>> labels at scrape time:
>>>
>>>   - job_name: node
>>>     scrape_interval: 1m
>>>     static_configs:
>>>       - targets:
>>>           - bar:9100
>>>           - baz:9100
>>>         # these labels are added to every timeseries scraped from
>>>         # those targets
>>>         labels:
>>>           environment: prod
>>>
>>> (note that "job" and "instance" labels are also added automatically as 
>>> part of the scrape; the remaining labels come from the exporter).
>>>
>>> Or you can add a label in your alerting rule:
>>>
>>> groups:
>>> - name: UpDown
>>>   rules:
>>>   - alert: UpDown
>>>     expr: up == 0
>>>     for: 3m
>>>     # these labels are added to every alert generated from this rule
>>>     labels:
>>>       environment: prod
>>>
>>> Note: it would be unusual to add label "environment: prod" in an 
>>> alerting rule, but adding a label like "severity: critical" or "team: 
>>> oncall" is more common - something which is specific to that alert, rather 
>>> than the server.
>>>
>>> In either of these cases, the alert which arrives at alertmanager will 
>>> have the given labels on it.  Hence you can match on it in alertmanager, to 
>>> decide how to route the alert.
>>>
>>> On Friday, 3 September 2021 at 09:26:35 UTC+1 74cm...@gmail.com wrote:
>>>
>>>> This means
>>>> alert in Prometheus - Rules config
>>>> is equal to
>>>> service in Prometheus - Alertmanager config
>>>> ?
>>>>
>>>> Brian Candler wrote on Friday, 3 September 2021 at 10:13:24 UTC+2:
>>>>
>>>>> Note that an "alertname" label is added automatically, so you could 
>>>>> match on alertname="TargetDown" if you want.  Doesn't scale very well, 
>>>>> but with a small number of rules that approach will get you started.
>>>>>
>>>>> If you go to your prometheus web interface, at prometheus:9090, and 
>>>>> click on the "Alerts" tab at the top, then you can see firing alerts, 
>>>>> including all the labels on them.
>>>>>
>>>>> [image: img1.png]
>>>>>
>>>>> On Friday, 3 September 2021 at 09:09:56 UTC+1 Brian Candler wrote:
>>>>>
>>>>>> The only labels you can match on from that rule are "severity: 
>>>>>> warning", and the "job" and "instance" labels.
>>>>>>
>>>>>> > What must the alertmanager config be for this rule?
>>>>>>
>>>>>> You don't need *any* matching rules in alertmanager.  At simplest, 
>>>>>> you can just have
>>>>>>
>>>>>> route:
>>>>>>   receiver: default
>>>>>>
>>>>>> receivers:
>>>>>> - name: default
>>>>>>   email_configs:
>>>>>>   - to: us...@example.com
>>>>>>     send_resolved: true
>>>>>>   - to: us...@example.com
>>>>>>     send_resolved: true
>>>>>>
>>>>>> Any more than that, and it depends on your business requirements.  Do 
>>>>>> you want all alerts with severity "warning" to be treated differently?  
>>>>>> Use a routing rule (in the "routes" section under "route").  Do you want 
>>>>>> a certain subset of targets to be handled by a particular team?  Then 
>>>>>> either add a label in the alerting rules themselves, or ensure that 
>>>>>> those targets already have a particular label in their scrape config, 
>>>>>> and match that label in the "routes" section.
>>>>>>
>>>>>> On Friday, 3 September 2021 at 08:20:49 UTC+1 74cm...@gmail.com 
>>>>>> wrote:
>>>>>>
>>>>>>> It's clear that the config
>>>>>>> - service=~"mysql|cassandra"
>>>>>>> does not match the rule.
>>>>>>> This was just an example.
>>>>>>>
>>>>>>> But this question is still open:
>>>>>>> What must the alertmanager config be for this rule?
>>>>>>> groups:
>>>>>>> - name: general.rules
>>>>>>>   rules:
>>>>>>>   - alert: TargetDown
>>>>>>>     annotations:
>>>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
>>>>>>>         }} instances are down.'
>>>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
>>>>>>>       instance)) > 10
>>>>>>>     for: 10m
>>>>>>>     labels:
>>>>>>>       severity: warning
>>>>>>>
>>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 19:18:37 UTC+2:
>>>>>>>
>>>>>>>> Remove the match on service=~"mysql|cassandra" in your routing rule.
>>>>>>>>
>>>>>>>> I'm not saying with 100% certainty that your alert *doesn't* have a 
>>>>>>>> service=xxx label; it's possible that it was added via other means, 
>>>>>>>> such as external_labels or alert_relabel_configs.  If you go into the 
>>>>>>>> prometheus or alertmanager web interface, you can see active alerts 
>>>>>>>> and their labels, so you'll know what you have.
>>>>>>>>
>>>>>>>> There was a nice web-based interface for testing alerting rules 
>>>>>>>> here:
>>>>>>>> https://prometheus.io/webtools/alerting/routing-tree-editor/
>>>>>>>> but it doesn't seem to work properly any more.
>>>>>>>>
>>>>>>>> On Thursday, 2 September 2021 at 15:48:57 UTC+1 74cm...@gmail.com 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> What should be the configuration in alertmanager.yml to match to 
>>>>>>>>> the rule?
>>>>>>>>>
>>>>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 15:22:55 UTC+2:
>>>>>>>>>
>>>>>>>>>> Correct, that expression will only give "job" and "instance" 
>>>>>>>>>> labels.
>>>>>>>>>>
>>>>>>>>>> I don't think your alertmanager rule will ever match on this 
>>>>>>>>>> alert.
>>>>>>>>>>
>>>>>>>>>> On Thursday, 2 September 2021 at 14:05:22 UTC+1 74cm...@gmail.com 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have defined several rule files, e.g. this general.rules.yml:
>>>>>>>>>>> groups:
>>>>>>>>>>> - name: general.rules
>>>>>>>>>>>   rules:
>>>>>>>>>>>   - alert: TargetDown
>>>>>>>>>>>     annotations:
>>>>>>>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
>>>>>>>>>>>         }} instances are down.'
>>>>>>>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
>>>>>>>>>>>       instance)) > 10
>>>>>>>>>>>     for: 10m
>>>>>>>>>>>     labels:
>>>>>>>>>>>       severity: warning
>>>>>>>>>>>
>>>>>>>>>>> However, I don't see the correlation to service.
>>>>>>>>>>>
>>>>>>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 13:58:11 UTC+2:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like "service" is a label that you have set in the 
>>>>>>>>>>>> prometheus alerting rule.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, 2 September 2021 at 11:52:20 UTC+1 
>>>>>>>>>>>> 74cm...@gmail.com wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> can you please advise what is represented by a service in 
>>>>>>>>>>>>> alertmanager configuration, e.g.
>>>>>>>>>>>>> routes:
>>>>>>>>>>>>>   # All alerts with service=mysql or service=cassandra
>>>>>>>>>>>>>   # are dispatched to the database pager.
>>>>>>>>>>>>>   - receiver: 'database-pager'
>>>>>>>>>>>>>     group_wait: 10s
>>>>>>>>>>>>>     matchers:
>>>>>>>>>>>>>       - service=~"mysql|cassandra"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Where do I find the service in the rules or in Prometheus -> 
>>>>>>>>>>>>> Alerts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> THX
>>>>>>>>>>>>>
>>>>>>>>>>>>
