There are two problems:

1. Your regexp is missing a capture group, so the '$1' in your
   replacement has nothing to refer to.
2. You are using metric_relabel_configs instead of relabel_configs.

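metric_relabel_configs are applied after the scrape, to the scraped
samples. By that point the __meta_* discovery labels have already been
dropped, so there is nothing left to read. You want "relabel_configs",
which runs before the scrape and can read the discovery metadata.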

Note that the default value for regex is "(.*)" and the default
replacement is "$1". Simply omit these fields from your config and the
defaults will be used:

relabel_configs:
- source_labels: [__meta_netbox_tags]
  target_label: tags

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
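
To see why the missing capture group matters, here is a minimal sketch
using Python's re module as a stand-in for the RE2 engine Prometheus uses
(an assumption: the two agree on these simple patterns). The helper
capture_groups is hypothetical, named only for illustration, and the
sample value is modelled on the discovery output in the quoted message.
re.fullmatch mimics the fact that Prometheus anchors relabel regexes at
both ends.

```python
import re

# Sample value modelled on the service discovery output in the quoted message.
TAGS = "NetBox-synced,prod,exempt-highcpu,exempt-highmem"


def capture_groups(pattern: str, value: str) -> tuple:
    """Return the capture groups of a fully anchored match (hypothetical helper).

    re.fullmatch stands in for Prometheus anchoring relabel regexes at both ends.
    """
    match = re.fullmatch(pattern, value)
    return match.groups() if match else ()


# '.*' matches the whole value but captures nothing, so '$1' has nothing to refer to.
print(capture_groups(".*", TAGS))
# '(.*)', the default, captures the whole value as group 1.
print(capture_groups("(.*)", TAGS))
```

The first call yields no groups at all, while the second yields one group
holding the full tag list, which is what '$1' copies into the target label.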

As for your alert, it won't work because != compares the whole label
value literally. Since your tags label is a comma-separated list, you
will have to use a regexp match:

expr: 100 - (avg by(instance)
(irate(node_cpu_seconds_total{mode="idle",tags!~".*highcpuexempt.*"}[5m]))
* 100) > 85

Unfortunately, the netbox tag list doesn't have leading and trailing
commas, so you can't use tags!~".*,highcpuexempt,.*" to get more precise
matching: it would miss the tag when it is first or last in the list.
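
The effect of the missing outer commas can be checked with a short
sketch. It uses Python's re.fullmatch as a stand-in for PromQL's fully
anchored regex matchers (an assumption; both engines agree on these
patterns), and the tag lists are made up for illustration. The last
pattern, with optional comma-delimited groups on either side, is one
possible workaround and is not something taken from this thread. The tag
name highcpuexempt comes from the alert rule; the discovery output in the
quoted message shows exempt-highcpu, so substitute whichever name your
targets actually carry.

```python
import re

LOOSE = ".*highcpuexempt.*"             # pattern used in the alert above
COMMAS = ".*,highcpuexempt,.*"          # needs a comma on both sides of the tag
OPTIONAL = "(.*,)?highcpuexempt(,.*)?"  # hypothetical workaround, not from the thread


def matches(pattern: str, tags: str) -> bool:
    """Fully anchored match, mimicking how PromQL treats =~ and !~ patterns."""
    return re.fullmatch(pattern, tags) is not None


# Made-up tag lists: tag in the middle, first, last, and a similarly named tag.
samples = [
    "prod,highcpuexempt,linux",
    "highcpuexempt,prod",
    "prod,highcpuexempt",
    "prod,nothighcpuexempt",
]

for tags in samples:
    print(tags, matches(LOOSE, tags), matches(COMMAS, tags), matches(OPTIONAL, tags))
```

The loose pattern also matches the similarly named tag, the comma-wrapped
pattern only matches when the tag sits in the middle of the list, and the
optional-group pattern matches the tag in any position without matching
the similarly named one.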

Lastly, I would not use this alert anyway. It's a "cause" alert rather
than a "symptom" alert. As you have noticed, you have to make
exceptions. This style of "my CPU is too high" alert is toil-prone: it
generates false positives, which lead to alert fatigue.

See: https://sre.google/sre-book/practical-alerting/


On Sun, Jul 3, 2022 at 3:14 AM Michael Kogelman <
michael.kogel...@azuleng.com> wrote:

> Hey all,
>
> First time long time here -- love Prom.
>
> I'm a bit stumped and was hoping maybe someone could tell me where I'm not
> connecting.
>
> I'm currently using a service discovery plugin to pull inventory from our
> source of truth, netbox.
>
> Netbox returns a __meta_netbox_tags in the form of tag,tag,tag
>
> I'm trying to relabel this field and save it into the timeseries so that
> it can be used to exempt an object from certain alerts utilizing the
> absence of the tag !=.
>
> Here's what the output looks like for service discovery:
>
>         "targets": [
>             "server"
>         ],
>         "labels": {
>             "__meta_netbox_status": "active",
>             "__meta_netbox_model": "VirtualMachine",
>             "__meta_netbox_name": "server",
>             "__meta_netbox_primary_ip": "x.x.x.x",
>             "__meta_netbox_primary_ip4": "x.x.x.x",
>             "__meta_netbox_platform": "Linux (64-bit)",
>             "__meta_netbox_platform_slug": "linux-64-bit",
>             "__meta_netbox_tags": "NetBox-synced,prod,exempt-highcpu,
> exempt-highmem",
>             "__meta_netbox_tag_slugs": "exempt-highmem, exempt-highcpu",
>             "__meta_netbox_cluster": "production",
>             "__meta_netbox_cluster_group": "XXX",
>             "__meta_netbox_cluster_type": "VMware",
>             "__meta_netbox_site": "XXX",
>             "__meta_netbox_site_slug": "xxx",
>             "__meta_netbox_role": "Server",
>             "__meta_netbox_role_slug": "server"
>
> Here's the latest iteration of what I'm trying to do in prometheus.yml
> (when I try to use the separators to tell it there's a comma, the YAML
> stops parsing):
>
>     metric_relabel_configs:
>       - source_labels: [__meta_netbox_tags]
>         regex: '.*'
>         replacement: '$1'
>         target_label: tags
>
> And here's what I'm trying to do in our rules.yml:
>
>   - alert: HostHighCpuLoad
>     expr: 100 - (avg by(instance)
> (irate(node_cpu_seconds_total{mode="idle",tags!="highcpuexempt"}[5m])) *
> 100) > 85
>     for: 10m
>     labels:
>       severity: warning
>     annotations:
>       identifier: '{{ $labels.instance }}'
>       summary: "Host high CPU load (instance {{ $labels.instance }})"
>       description: "CPU load is > 85% for 10 minutes\n  VALUE = {{ $value
> }}\n  LABELS: {{ $labels }}"
>
> Any assistance would be greatly appreciated!!
>
> Thanks,
> Mike
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to prometheus-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/591bccb8-b04a-4150-a0ad-fe3edd7dd0f7n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/591bccb8-b04a-4150-a0ad-fe3edd7dd0f7n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

