[prometheus-users] Https issue when using prometheus federation

2022-07-18 Thread Shi Yan
I am trying to configure the prometheus federation, but the target is not 
up and the only error I can see is `read: connection reset by peer`

The scrape_config I've added is as follows:

- job_name: federate
  scrape_interval: 30s
  scrape_timeout: 15s
  scheme: https
  honor_labels: true
  metrics_path: "/federate"
  params:
    match[]:
    - '{job="jobname"}'
  static_configs:
  - targets:
    - another_prom_server

But if I use the `curl` command from this central Prometheus server, it works 
and returns the metrics correctly. 

another_prom_server is the one deployed by the kube-prometheus-stack Helm 
chart. I'm not sure what the issue is here. Could anyone help advise? Thanks! 
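
For reference, a version of the same job with the target port spelled out and 
the TLS options made explicit might look like this (all values below are 
placeholders, and insecure_skip_verify is only meant for testing):

- job_name: federate
  scrape_interval: 30s
  scrape_timeout: 15s
  scheme: https
  honor_labels: true
  metrics_path: "/federate"
  params:
    match[]:
    - '{job="jobname"}'
  tls_config:
    # or point ca_file at the CA that signed the target's certificate
    insecure_skip_verify: true
  static_configs:
  - targets:
    - another_prom_server:443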

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a5a85deb-3f97-46f9-906f-d53045251a61n%40googlegroups.com.


Re: [prometheus-users] Best way to export status

2022-07-18 Thread Ben Kochie
With PromQL, the state label with a boolean value tends to be more
user-friendly.

For example, you can do things like `avg_over_time(foo{state="some
state"}[10m])` to detect problems, but maybe ignore one or two state
changes.

Similarly, you can be more specific about state transitions with `changes()`.
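
As a rough sketch, those expressions plug into alerting rules like this
(metric name, state value, and thresholds are made up for illustration):

groups:
- name: state-examples
  rules:
  - alert: MostlyInBadState
    # average of a 0/1 series over 10m; > 0.8 means roughly 80% of samples
    # were 1, so one or two brief state changes are ignored
    expr: avg_over_time(foo{state="some state"}[10m]) > 0.8
    for: 5m
  - alert: StateFlapping
    # counts how often the series value changed within the window
    expr: changes(foo{state="some state"}[10m]) > 3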

On Tue, Jul 19, 2022 at 12:21 AM Roman Baeriswyl 
wrote:

> True, the amount should not be an issue at all.
> I wonder what is more convenient for the end user: having 10 states per
> sensor but with their state name as label, or just having one with the
> numerical value (which would allow > and < operations for alerts). I cannot
> decide between those two.
>
> Regarding the other projects: I've looked through many projects. The first
> one you mention needs to actually run on the Dell server itself, which I do
> not want. The second contains only a few metrics and uses the Redfish API
> (basically JSON, but I think a bit limited, especially for older systems).
> There are also a lot of others, mostly based on prometheus/snmp_exporter
> but they also lack a lot of metrics. In my first try, I created my own
> snmp_exporter generator (https://github.com/Roemer/idrac-snmp-exporter)
> even with a fully working automatic pipeline. But I find the generator way
> too restrictive.
> I am now working on a node based exporter with express, prom-client and
> net-snmp and it seems to work fairly well. I can export what I want,
> exactly how I want. This is the v2 branch which only exposes one set of
> metrics.
>
>
> Am Mo., 18. Juli 2022 um 23:14 Uhr schrieb Ben Kochie :
>
>> Let's do the math:
>>
>> 100 servers * 10 states * 20 sensors = 20,000 metrics
>>
>> Worst case, say you have 5000 metrics each for 100 servers, that's still
>> only 500,000 series. This will probably take about 4GiB of memory. It
>> should still fit easily in an 8GiB memory instance.
>>
>> A single Prometheus can handle millions of metrics if you capacity plan
>> accordingly.
>>
>> Rather than SNMP, have you looked at
>> https://github.com/galexrt/dellhw_exporter? Or maybe
>> https://github.com/mrlhansen/idrac_exporter?
>>
>> On Mon, Jul 18, 2022 at 10:50 PM Roman Baeriswyl 
>> wrote:
>>
>>> Thanks for the answer. Well, it is not only fans; there are dozens of
>>> other status fields as well (I'm doing an iDRAC SNMP exporter), and that
>>> for dozens of servers. Should I try to stick with the StateSet
>>> or should I switch to just exposing the numerical representation?
>>>
>>> sup...@gmail.com schrieb am Sonntag, 17. Juli 2022 um 10:50:43 UTC+2:
>>>
 For things that have state changes you care about, I usually recommend
 EnumAsStateSet.

 The good news is that Prometheus deals with compressing the boolean
 values very well. And since all fans have the same set of states, those
 values are deduplicated in the index.

 So while it looks like a lot in the metric output, it stores well in
 the TSDB.

 The question is, how many fans on how many servers are we talking about?

 On Sun, Jul 17, 2022 at 6:26 AM Roman Baeriswyl 
 wrote:

> Hey all
> I am working on a Dell iDRAC SNMP Exporter and I struggle with
> "Status" fields.
> I think there are three main possibilities:
>
> 1. EnumAsStateSet
> The downside here is that it can really clutter the output. For
> example, the Dell fans have 10 possible statuses, so each fan has 10 fields
> where only one is set to "1".
>
> 2. EnumAsInfo
> The downside here is that you do not get a nice time history and it is
> probably harder to create alerts.
>
> 3. Use the numeric value
> The downside here is that you need to do the enum lookup in the alert
> / dashboard.
>
> What do you think is, in general, the best way to handle such statuses?
>
> Thanks for your input.
>

Re: [prometheus-users] Best way to export status

2022-07-18 Thread Roman Baeriswyl
True, the amount should not be an issue at all.
I wonder what is more convenient for the end user: having 10 states per
sensor, each with its state name as a label, or having just one series with the
numerical value (which would allow > and < operations for alerts). I cannot
decide between those two.

Regarding the other projects: I've looked through many projects. The first one
you mention needs to actually run on the Dell server itself, which I do not
want. The second contains only a few metrics and uses the Redfish API
(basically JSON, but I think a bit limited, especially for older systems).
There are also a lot of others, mostly based on prometheus/snmp_exporter
but they also lack a lot of metrics. In my first try, I created my own
snmp_exporter generator (https://github.com/Roemer/idrac-snmp-exporter)
even with a fully working automatic pipeline. But I find the generator way
too restrictive.
I am now working on a Node.js-based exporter with express, prom-client and
net-snmp, and it seems to work fairly well. I can export what I want,
exactly how I want. This is the v2 branch, which only exposes one set of
metrics.


Am Mo., 18. Juli 2022 um 23:14 Uhr schrieb Ben Kochie :

> Let's do the math:
>
> 100 servers * 10 states * 20 sensors = 20,000 metrics
>
> Worst case, say you have 5000 metrics each for 100 servers, that's still
> only 500,000 series. This will probably take about 4GiB of memory. It
> should still fit easily in an 8GiB memory instance.
>
> A single Prometheus can handle millions of metrics if you capacity plan
> accordingly.
>
> Rather than SNMP, have you looked at
> https://github.com/galexrt/dellhw_exporter? Or maybe
> https://github.com/mrlhansen/idrac_exporter?
>
> On Mon, Jul 18, 2022 at 10:50 PM Roman Baeriswyl 
> wrote:
>
>> Thanks for the answer. Well, it is not only fans; there are dozens of
>> other status fields as well (I'm doing an iDRAC SNMP exporter), and that
>> for dozens of servers. Should I try to stick with the StateSet
>> or should I switch to just exposing the numerical representation?
>>
>> sup...@gmail.com schrieb am Sonntag, 17. Juli 2022 um 10:50:43 UTC+2:
>>
>>> For things that have state changes you care about, I usually recommend
>>> EnumAsStateSet.
>>>
>>> The good news is that Prometheus deals with compressing the boolean
>>> values very well. And since all fans have the same set of states, those
>>> values are deduplicated in the index.
>>>
>>> So while it looks like a lot in the metric output, it stores well in the
>>> TSDB.
>>>
>>> The question is, how many fans on how many servers are we talking about?
>>>
>>> On Sun, Jul 17, 2022 at 6:26 AM Roman Baeriswyl 
>>> wrote:
>>>
 Hey all
 I am working on a Dell iDRAC SNMP Exporter and I struggle with "Status"
 fields.
 I think there are three main possibilities:

 1. EnumAsStateSet
 The downside here is that it can really clutter the output. For example,
 the Dell fans have 10 possible statuses, so each fan has 10 fields where only
 one is set to "1".

 2. EnumAsInfo
 The downside here is that you do not get a nice time history and it is
 probably harder to create alerts.

 3. Use the numeric value
 The downside here is that you need to do the enum lookup in the alert /
 dashboard.

 What do you think is, in general, the best way to handle such statuses?

 Thanks for your input.


-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CA%2BKmifFt-azV1gesiK_qPxmtfa5toLG%3DC5y-TiQTSAND5rVt2g%40mail.gmail.com.


Re: [prometheus-users] Best way to export status

2022-07-18 Thread Ben Kochie
Let's do the math:

100 servers * 10 states * 20 sensors = 20,000 metrics

Worst case, say you have 5000 metrics each for 100 servers, that's still
only 500,000 series. This will probably take about 4GiB of memory. It
should still fit easily in an 8GiB memory instance.

A single Prometheus can handle millions of metrics if you capacity plan
accordingly.

Rather than SNMP, have you looked at
https://github.com/galexrt/dellhw_exporter? Or maybe
https://github.com/mrlhansen/idrac_exporter?

On Mon, Jul 18, 2022 at 10:50 PM Roman Baeriswyl 
wrote:

> Thanks for the answer. Well, it is not only fans; there are dozens of
> other status fields as well (I'm doing an iDRAC SNMP exporter), and that
> for dozens of servers. Should I try to stick with the StateSet
> or should I switch to just exposing the numerical representation?
>
> sup...@gmail.com schrieb am Sonntag, 17. Juli 2022 um 10:50:43 UTC+2:
>
>> For things that have state changes you care about, I usually recommend
>> EnumAsStateSet.
>>
>> The good news is that Prometheus deals with compressing the boolean
>> values very well. And since all fans have the same set of states, those
>> values are deduplicated in the index.
>>
>> So while it looks like a lot in the metric output, it stores well in the
>> TSDB.
>>
>> The question is, how many fans on how many servers are we talking about?
>>
>> On Sun, Jul 17, 2022 at 6:26 AM Roman Baeriswyl 
>> wrote:
>>
>>> Hey all
>>> I am working on a Dell iDRAC SNMP Exporter and I struggle with "Status"
>>> fields.
>>> I think there are three main possibilities:
>>>
>>> 1. EnumAsStateSet
>>> The downside here is that it can really clutter the output. For example,
>>> the Dell fans have 10 possible statuses, so each fan has 10 fields where only
>>> one is set to "1".
>>>
>>> 2. EnumAsInfo
>>> The downside here is that you do not get a nice time history and it is
>>> probably harder to create alerts.
>>>
>>> 3. Use the numeric value
>>> The downside here is that you need to do the enum lookup in the alert /
>>> dashboard.
>>>
>>> What do you think is, in general, the best way to handle such statuses?
>>>
>>> Thanks for your input.
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmqWadp-qtBSgCAr6C8L2HR3bS2pdbPWMbwu70SXvo4ahg%40mail.gmail.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread Ben Kochie
Yes, Thanos will eliminate the need for instance-4. At the same time it's
more efficient because it doesn't use remote write or federation. It can
query data from all your Prometheus instances.

On Mon, Jul 18, 2022 at 10:53 PM tejaswini vadlamudi 
wrote:

> @Ben: Thanks for the suggestion! I heard that remote-write consumes more
> system resources like CPU utilization when compared to the federation. I
> can test and cross-check it myself but I would like to hear feedback from
> the Prometheus experts.
> @Stuart: Ideally, it is possible to manage the complete stack with
> instance-1 but the current case is about deploying and monitoring multiple
> workloads/software owned by different vendors.
>
> /Teja
> On Monday, July 18, 2022 at 8:52:59 PM UTC+2 Stuart Clark wrote:
>
>> On 18/07/2022 18:00, tejaswini vadlamudi wrote:
>>
>> Hello Stuart,
>>
>> I have the 4 Prometheus instances in the same cluster.
>>
>>- Instance-1, monitoring k8s & cadvisor
>>- Instance-2, monitoring workload-1 in namespace-1
>>- Instance-3, monitoring workload-2 in namespace-2
>>- Instance-4 is the central one collecting metrics from all 3
>>instances (for global querying and alerting). not sure if the federation 
>> is
>>a good fit for this sort of deployment pattern.
>>
>> What's the reason for having all the different instances? Are these all
>> full instances of Prometheus (with local storage) or using agent mode?
>>
>> If you are just going to copy everything to the "central" instance on the
>> same cluster, why not just do without the extra three clusters and have
>> just the one instance that monitors everything?
>>
>> --
>> Stuart Clark
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmrOjfax9YvqQioCJUfr8BB_b-wDsu5mWvuEKPXg_vDayA%40mail.gmail.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread tejaswini vadlamudi
@Ben: Thanks for the suggestion! I heard that remote-write consumes more 
system resources, such as CPU, than federation. I can test and cross-check 
this myself, but I would like to hear feedback from the Prometheus experts.
@Stuart: Ideally, it is possible to manage the complete stack with 
instance-1 but the current case is about deploying and monitoring multiple 
workloads/software owned by different vendors.

/Teja
On Monday, July 18, 2022 at 8:52:59 PM UTC+2 Stuart Clark wrote:

> On 18/07/2022 18:00, tejaswini vadlamudi wrote:
>
> Hello Stuart,  
>
> I have the 4 Prometheus instances in the same cluster.   
>
>- Instance-1, monitoring k8s & cadvisor 
>- Instance-2, monitoring workload-1 in namespace-1 
>- Instance-3, monitoring workload-2 in namespace-2 
>- Instance-4 is the central one collecting metrics from all 3 
>instances (for global querying and alerting). not sure if the federation 
> is 
>a good fit for this sort of deployment pattern. 
>
> What's the reason for having all the different instances? Are these all 
> full instances of Prometheus (with local storage) or using agent mode?
>
> If you are just going to copy everything to the "central" instance on the 
> same cluster, why not just do without the extra three clusters and have 
> just the one instance that monitors everything?
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/cd955a3f-fb4c-43f8-87e2-4eb006bc7df8n%40googlegroups.com.


Re: [prometheus-users] Extracting Group Data String for Alert Grouping

2022-07-18 Thread Ben Kochie
If you have your ifAlias well standardized you can use
metric_relabel_configs to extract data.

metric_relabel_configs:
- source_labels: [ifAlias]
  regex: "(.+) - (.+) - (.+)"
  replacement: "$1"
  target_label: port_description
- source_labels: [ifAlias]
  regex: "(.+) - (.+) - (.+)"
  replacement: "$2"
  target_label: port_location
- source_labels: [ifAlias]
  regex: "(.+) - (.+) - (.+)"
  replacement: "$3"
  target_label: cable_id

This will separate out your ifAlias into the component label parts.
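
If the goal is alert grouping, the new label can then be used directly on the
Alertmanager side, something like this (the receiver name is a placeholder):

route:
  receiver: default
  # group on the shared description instead of the full ifAlias
  group_by: ['alertname', 'port_description']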

On Mon, Jul 18, 2022 at 10:45 PM Brian Bowen  wrote:

> Hi all,
>
> We are attempting to set up alerting with Prometheus and Alertmanager
> using some SNMP data. The basic use case is that we would like to group by
> a substring of label data rather than an entire label. Let's say our
> interfaces have the ifAlias label in the following format:
> ifAlias=" - device 1 port 5 to device 2 port 7 -  ID>" and I want to group alerts only by "device 1 port 5 to device 2 port
> 7" (assuming this description is consistent across  both devices), leaving
> the rest of the description and cableID out.
>
> Is there a way to do this? We have not had success extracting this as a
> separate label through snmp_exporter. I thought potentially we could do
> some regex matching under the group_by rules with Alertmanager, but I
> haven't seen any documentation/examples showing how to do this either.
>
> Let me know if there are any files I should attach.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmokKkLOJgkSrKu%3DbLWNQE6RSSe4s1jgCGx1THCLhC9pSA%40mail.gmail.com.


Re: [prometheus-users] Best way to export status

2022-07-18 Thread Roman Baeriswyl
Thanks for the answer. Well, it is not only fans; there are dozens of other 
status fields as well (I'm doing an iDRAC SNMP exporter), and that for 
dozens of servers. Should I try to stick with the StateSet or 
should I switch to just exposing the numerical representation?
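
For reference, the StateSet form is roughly what an snmp_exporter generator
override along these lines produces (the module and object names below are
placeholders, not the actual iDRAC MIB entries):

modules:
  idrac:
    walk:
    - coolingDeviceStatus
    overrides:
      coolingDeviceStatus:
        # EnumAsStateSet expands the enum into one 0/1 series per state;
        # leaving the override out keeps the raw numeric enum value instead
        type: EnumAsStateSet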

sup...@gmail.com schrieb am Sonntag, 17. Juli 2022 um 10:50:43 UTC+2:

> For things that have state changes you care about, I usually recommend 
> EnumAsStateSet.
>
> The good news is that Prometheus deals with compressing the boolean values 
> very well. And since all fans have the same set of states, those values are 
> deduplicated in the index.
>
> So while it looks like a lot in the metric output, it stores well in the 
> TSDB.
>
> The question is, how many fans on how many servers are we talking about?
>
> On Sun, Jul 17, 2022 at 6:26 AM Roman Baeriswyl  
> wrote:
>
>> Hey all
>> I am working on a Dell iDRAC SNMP Exporter and I struggle with "Status" 
>> fields.
>> I think there are three main possibilities:
>>
>> 1. EnumAsStateSet
>> The downside here is that it can really clutter the output. For example, 
>> the Dell fans have 10 possible statuses, so each fan has 10 fields where only 
>> one is set to "1".
>>
>> 2. EnumAsInfo
>> The downside here is that you do not get a nice time history and it is 
>> probably harder to create alerts.
>>
>> 3. Use the numeric value
>> The downside here is that you need to do the enum lookup in the alert / 
>> dashboard.
>>
>> What do you think is, in general, the best way to handle such statuses?
>>
>> Thanks for your input.
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/6e4f9bd9-239a-4f24-9735-c5540e9b1411n%40googlegroups.com.


[prometheus-users] Extracting Group Data String for Alert Grouping

2022-07-18 Thread Brian Bowen
Hi all,

We are attempting to set up alerting with Prometheus and Alertmanager using 
some SNMP data. The basic use case is that we would like to group by a 
substring of label data rather than an entire label. Let's say our 
interfaces have the ifAlias label in the following format:
ifAlias=" - device 1 port 5 to device 2 port 7 - " and I want to group alerts only by "device 1 port 5 to device 2 port 
7" (assuming this description is consistent across  both devices), leaving 
the rest of the description and cableID out.

Is there a way to do this? We have not had success extracting this as a 
separate label through snmp_exporter. I thought potentially we could do 
some regex matching under the group_by rules with Alertmanager, but I 
haven't seen any documentation/examples showing how to do this either.

Let me know if there are any files I should attach.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/31b5a66a-0aa5-432b-b527-764ac392e1d4n%40googlegroups.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread Stuart Clark

On 18/07/2022 18:00, tejaswini vadlamudi wrote:

Hello Stuart,

I have the 4 Prometheus instances in the same cluster.

  * Instance-1, monitoring k8s & cadvisor
  * Instance-2, monitoring workload-1 in namespace-1
  * Instance-3, monitoring workload-2 in namespace-2
  * Instance-4 is the central one collecting metrics from all 3
instances (for global querying and alerting). not sure if the
federation is a good fit for this sort of deployment pattern.

What's the reason for having all the different instances? Are these all 
full instances of Prometheus (with local storage) or using agent mode?


If you are just going to copy everything to the "central" instance on 
the same cluster, why not just do without the extra three clusters and 
have just the one instance that monitors everything?


--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d047d5f6-ad6b-a334-699d-8d7a4399e26a%40Jahingo.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread Ben Kochie
I would probably skip federation and remote write with that setup and use
Thanos to create a single pane view of all of them.

On Mon, Jul 18, 2022, 7:00 PM tejaswini vadlamudi 
wrote:

> Hello Stuart,
>
> I have the 4 Prometheus instances in the same cluster.
>
>- Instance-1, monitoring k8s & cadvisor
>- Instance-2, monitoring workload-1 in namespace-1
>- Instance-3, monitoring workload-2 in namespace-2
>- Instance-4 is the central one collecting metrics from all 3
>instances (for global querying and alerting). not sure if the federation is
>a good fit for this sort of deployment pattern.
>
>
> Thanks, Teja
>
>
> On Monday, July 18, 2022 at 6:49:45 PM UTC+2 Stuart Clark wrote:
>
>> On 18/07/2022 17:21, tejaswini vadlamudi wrote:
>> > Can someone point me to the advantages of using remote-write over
>> > federation?
>> > I understand that remote-write is more of a standard interface in the
>> > monitoring domain.
>> > Are there any handy performance measurements that were
>> observed/recorded?
>> >
>> They are really quite different.
>>
>> Federation is a way of pulling data from a remote Prometheus into
>> (generally) a local one. The puller gets to choose how often to pull
>> data and what data to fetch. If the puller can't fetch the data for any
>> reason (local/remote outage, network issues, etc.) there will be gaps.
>>
>> Remote write is a way of pushing data from a Prometheus server to
>> "something else", which could be another Prometheus or one of the many
>> things which implement the API (e.g. various databases, Thanos, custom
>> analytics tools, etc.). For these you get all the data (basically as
>> soon as it has been scraped) with the ability to do filtering via
>> relabeling. If there is an outage/disconnect data will be queued for a
>> while (too long and things will get lost) so small issues can be handled
>> transparently.
>>
>> So you have a difference in what data you get - either all (filtered)
>> data or data on a schedule (so in effect a form of built-in
>> downsampling), and who controls that - either the data source Prometheus
>> or the destination.
>>
>> Which is "better" depends on what you are trying to achieve and the
>> constraints you might have (for example difficulties with accepting
>> network connections or data storage/transfer limits). Don't forget the
>> organisation differences too - for remote write adding/changing a
>> destination (or filter rules) needs changes to every data source
>> Prometheus where federation is purely controlled at the other end, which
>> might be a good or bad thing depending on team responsibilities/timings.
>>
>> --
>> Stuart Clark
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmorLg2%3DOGzKP8ydVBy715FC%3D4ZAZ-1DJ4U3EL2EQnGk5g%40mail.gmail.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread tejaswini vadlamudi
Hello Stuart, 

I have the 4 Prometheus instances in the same cluster.  

   - Instance-1, monitoring k8s & cadvisor
   - Instance-2, monitoring workload-1 in namespace-1
   - Instance-3, monitoring workload-2 in namespace-2
   - Instance-4 is the central one collecting metrics from all 3 instances 
   (for global querying and alerting). Not sure if federation is a good 
   fit for this sort of deployment pattern.


Thanks, Teja


On Monday, July 18, 2022 at 6:49:45 PM UTC+2 Stuart Clark wrote:

> On 18/07/2022 17:21, tejaswini vadlamudi wrote:
> > Can someone point me to the advantages of using remote-write over 
> > federation?
> > I understand that remote-write is more of a standard interface in the 
> > monitoring domain.
> > Are there any handy performance measurements that were observed/recorded?
> >
> They are really quite different.
>
> Federation is a way of pulling data from a remote Prometheus into 
> (generally) a local one. The puller gets to choose how often to pull 
> data and what data to fetch. If the puller can't fetch the data for any 
> reason (local/remote outage, network issues, etc.) there will be gaps.
>
> Remote write is a way of pushing data from a Prometheus server to 
> "something else", which could be another Prometheus or one of the many 
> things which implement the API (e.g. various databases, Thanos, custom 
> analytics tools, etc.). For these you get all the data (basically as 
> soon as it has been scraped) with the ability to do filtering via 
> relabeling. If there is an outage/disconnect data will be queued for a 
> while (too long and things will get lost) so small issues can be handled 
> transparently.
>
> So you have a difference in what data you get - either all (filtered) 
> data or data on a schedule (so in effect a form of built-in 
> downsampling), and who controls that - either the data source Prometheus 
> or the destination.
>
> Which is "better" depends on what you are trying to achieve and the 
> constraints you might have (for example difficulties with accepting 
> network connections or data storage/transfer limits). Don't forget the 
> organisation differences too - for remote write adding/changing a 
> destination (or filter rules) needs changes to every data source 
> Prometheus where federation is purely controlled at the other end, which 
> might be a good or bad thing depending on team responsibilities/timings.
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a174584b-8f1d-4ab8-bcda-bfae9401af0en%40googlegroups.com.


Re: [prometheus-users] Use remote-write instead of federation

2022-07-18 Thread Stuart Clark

On 18/07/2022 17:21, tejaswini vadlamudi wrote:
Can someone point me to the advantages of using remote-write over 
federation?
I understand that remote-write is more of a standard interface in the 
monitoring domain.

Are there any handy performance measurements that were observed/recorded?


They are really quite different.

Federation is a way of pulling data from a remote Prometheus into 
(generally) a local one. The puller gets to choose how often to pull 
data and what data to fetch. If the puller can't fetch the data for any 
reason (local/remote outage, network issues, etc.) there will be gaps.


Remote write is a way of pushing data from a Prometheus server to 
"something else", which could be another Prometheus or one of the many 
things which implement the API (e.g. various databases, Thanos, custom 
analytics tools, etc.). For these you get all the data (basically as 
soon as it has been scraped) with the ability to do filtering via 
relabeling. If there is an outage/disconnect, data will be queued for a 
while (too long and things will get lost) so small issues can be handled 
transparently.


So you have a difference in what data you get - either all (filtered) 
data or data on a schedule (so in effect a form of built-in 
downsampling), and who controls that - either the data source Prometheus 
or the destination.


Which is "better" depends on what you are trying to achieve and the 
constraints you might have (for example difficulties with accepting 
network connections or data storage/transfer limits). Don't forget the 
organisation differences too - for remote write, adding/changing a 
destination (or filter rules) needs changes to every data source 
Prometheus, whereas federation is purely controlled at the other end, which 
might be a good or bad thing depending on team responsibilities/timings.
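
As a rough sketch, the push side with relabel-based filtering looks something 
like this (the URL and metric pattern are placeholders):

remote_write:
- url: https://central-prometheus.example.com/api/v1/write
  write_relabel_configs:
  # ship only the series you actually need centrally
  - source_labels: [__name__]
    regex: 'kube_.*|node_.*'
    action: keep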


--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0a925bd8-4cbb-99ff-c372-311488751943%40Jahingo.com.


[prometheus-users] Use remote-write instead of federation

2022-07-18 Thread tejaswini vadlamudi
Can someone point me to the advantages of using remote-write over 
federation?
I understand that remote-write is more of a standard interface in the 
monitoring domain.
Are there any handy performance measurements that were observed/recorded?

Thanks, Teja

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/31d14713-fe98-416f-9984-c185ce969363n%40googlegroups.com.


Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Brian Candler
Of course.  What you are doing is *testing your alerting rules*.  You give 
it a specific set of inputs and an alerting rule, and you tell it whether 
you expect an alert to be generated (or not).

Clearly, if you change the inputs, you may get different alerts generated 
(or not).

Therefore, you should make different test cases for these different 
scenarios, to test your alerts under different conditions. Like:

1. If the two input metrics are the same, I expect no alert to be raised.
2. If the two input metrics are different, I expect an alert to be raised 
with particular labels / annotations.
3. etc

If the alerts don't behave how you expect - e.g. you don't get an alert 
when you think there should be one, or you get a different set of labels or 
annotations than you expect - then this is a good way for you to learn how 
alerting works in prometheus.

> But if the count is the same, I still get an error.

The system is telling you what *actually happens* under the input 
conditions you have given. In other words: the error tells you exactly how 
the actual alert generated (or not generated) differs from the alert(s) you 
told it to expect. 

It's up to you to decide: is your alerting rule working correctly, but your 
declared expectations were wrong?  Or the alerting rule itself needs to be 
changed, so it works in the way you expect?
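
As a minimal sketch, scenario 1 looks like this as a promtool test case 
(series values are illustrative; the rule file is the one from earlier in 
this thread):

tests:
- interval: 1m
  input_series:
  - series: kube_replicaset_spec_replicas{job="prometheus", namespace="auth-proxy"}
    values: '5+0x30'
  - series: kube_deployment_status_replicas_available{job="prometheus", namespace="auth-proxy"}
    values: '5+0x30'
  alert_rule_test:
  # identical series, so no alert is expected to be firing at 20m
  - eval_time: 20m
    alertname: KubernetesDeploymentReplicasMismatch-authproxy
    exp_alerts: []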

On Monday, 18 July 2022 at 11:03:18 UTC+1 shivana...@gmail.com wrote:

> Yes,
> Now in the  above case both Replicaset and Deployment replica count are 
> the same.
> If the count is different its like below
>  input_series:
>- series: kube_replicaset_spec_replicas{job="prometheus", 
> namespace="auth-proxy"}
>  values: '5+0x9 *5*+0x20'
>- series: 
> kube_deployment_status_replicas_available{job="prometheus", 
> namespace="auth-proxy"}
>  values: '5+0x9 *4*+0x20'
> It will generate an alert.
>
> But if the count is the same, I still get an error.
>
> On Mon, Jul 18, 2022 at 3:22 PM Brian Candler  wrote:
>
>> It's saying there's no alert firing at time 20m ("got" is empty).
>>
>> That seems correct to me.  You've now made the two input_series have 
>> identical values:
>>
>>  input_series:
>>- series: kube_replicaset_spec_replicas{job="prometheus", 
>> namespace="auth-proxy"}
>> * values: '5+0x9 5+0x20'*
>>- series: 
>> kube_deployment_status_replicas_available{job="prometheus", 
>> namespace="auth-proxy"}
>> * values: '5+0x9 5+0x20'*
>>
>> Hence you wouldn't expect the alert to fire, would you?
>>
>> On Monday, 18 July 2022 at 10:34:22 UTC+1 shivana...@gmail.com wrote:
>>
>>> Hi,
>>>
>>> I did some changes as you suggested. After that I am getting below error.
>>>
>>> Unit Testing:  replicas_mismatch_test.yaml
>>>
>>>   FAILED:
>>>
>>> alertname: KubernetesDeploymentReplicasMismatch-authproxy, time: 
>>> 20m, 
>>>
>>> exp:[
>>>
>>> 0:
>>>
>>>   
>>> Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy", 
>>> job="prometheus", namespace="auth-proxy", severity="critical"}
>>>
>>>   Annotations:{description="Deployment Replicas mismatch\n 
>>> VALUE = {{ $value }}\n LABELS = {{ $labels }}", summary="Kubernetes 
>>> Deployment replicas mismatch (instance {{$labels.instance }})"}
>>>
>>> ], 
>>>
>>> got:[]
>>>
>>>
>>> Alert:
>>>
>>>
>>> groups:
>>> - name: replicas-mismatch
>>> rules:
>>> - alert: KubernetesDeploymentReplicasMismatch-authproxy
>>> expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
>>> kube_deployment_status_replicas_available{namespace="auth-proxy"}
>>> for: 10m
>>> labels:
>>> severity: critical
>>> annotations:
>>> summary: Kubernetes Deployment replicas mismatch (instance {{ 
>>> $labels.instance }})
>>> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n 
>>> LABELS = {{ $labels }}"
>>>
>>> TestCase:
>>> rule_files:
>>> - /testdata/deployment_replicas_mismatch.yaml
>>> evaluation_interval: 1m
>>> tests:
>>> - interval: 1m
>>> # Series Data
>>> input_series:
>>> - series: kube_replicaset_spec_replicas{job="prometheus", 
>>> namespace="auth-proxy"}
>>> values: '5+0x9 5+0x20'
>>> - series: kube_deployment_status_replicas_available{job="prometheus", 
>>> namespace="auth-proxy"}
>>> values: '5+0x9 5+0x20'
>>> alert_rule_test:
>>> # Unit Test 1
>>> - eval_time: 9m
>>> alertname: KubernetesDeploymentReplicasMismatch-authproxy
>>> exp_alerts:
>>>
>>> - eval_time: 20m
>>> alertname: KubernetesDeploymentReplicasMismatch-authproxy
>>> exp_alerts:
>>> - exp_labels:
>>> namespace: auth-proxy
>>> job: prometheus
>>> severity: critical
>>> exp_annotations:
>>> summary: Kubernetes Deployment replicas mismatch (instance 
>>> {{$labels.instance }})
>>> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n 
>>> LABELS = {{ $labels }}"
>>>
>>>
>>> On Mon, Jul 18, 2022 at 12:46 PM David Leadbeater  wrote:
>>>

Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Shivanand Shete
Yes,
Now in the  above case both Replicaset and Deployment replica count are the
same.
If the count is different its like below
 input_series:
   - series: kube_replicaset_spec_replicas{job="prometheus",
namespace="auth-proxy"}
 values: '5+0x9 *5*+0x20'
   - series:
kube_deployment_status_replicas_available{job="prometheus",
namespace="auth-proxy"}
 values: '5+0x9 *4*+0x20'
It will generate an alert.

But if the count is the same, I still get an error.

On Mon, Jul 18, 2022 at 3:22 PM Brian Candler  wrote:

> It's saying there's no alert firing at time 20m ("got" is empty).
>
> That seems correct to me.  You've now made the two input_series have
> identical values:
>
>  input_series:
>- series: kube_replicaset_spec_replicas{job="prometheus",
> namespace="auth-proxy"}
> * values: '5+0x9 5+0x20'*
>- series:
> kube_deployment_status_replicas_available{job="prometheus",
> namespace="auth-proxy"}
> * values: '5+0x9 5+0x20'*
>
> Hence you wouldn't expect the alert to fire, would you?
>
> On Monday, 18 July 2022 at 10:34:22 UTC+1 shivana...@gmail.com wrote:
>
>> Hi,
>>
>> I did some changes as you suggested. After that I am getting below error.
>>
>> Unit Testing:  replicas_mismatch_test.yaml
>>
>>   FAILED:
>>
>> alertname: KubernetesDeploymentReplicasMismatch-authproxy, time: 20m,
>>
>>
>> exp:[
>>
>> 0:
>>
>>   
>> Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy",
>> job="prometheus", namespace="auth-proxy", severity="critical"}
>>
>>   Annotations:{description="Deployment Replicas mismatch\n
>> VALUE = {{ $value }}\n LABELS = {{ $labels }}", summary="Kubernetes
>> Deployment replicas mismatch (instance {{$labels.instance }})"}
>>
>> ],
>>
>> got:[]
>>
>>
>> Alert:
>>
>>
>> groups:
>> - name: replicas-mismatch
>> rules:
>> - alert: KubernetesDeploymentReplicasMismatch-authproxy
>> expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} !=
>> kube_deployment_status_replicas_available{namespace="auth-proxy"}
>> for: 10m
>> labels:
>> severity: critical
>> annotations:
>> summary: Kubernetes Deployment replicas mismatch (instance {{
>> $labels.instance }})
>> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
>> LABELS = {{ $labels }}"
>>
>> TestCase:
>> rule_files:
>> - /testdata/deployment_replicas_mismatch.yaml
>> evaluation_interval: 1m
>> tests:
>> - interval: 1m
>> # Series Data
>> input_series:
>> - series: kube_replicaset_spec_replicas{job="prometheus",
>> namespace="auth-proxy"}
>> values: '5+0x9 5+0x20'
>> - series: kube_deployment_status_replicas_available{job="prometheus",
>> namespace="auth-proxy"}
>> values: '5+0x9 5+0x20'
>> alert_rule_test:
>> # Unit Test 1
>> - eval_time: 9m
>> alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> exp_alerts:
>>
>> - eval_time: 20m
>> alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> exp_alerts:
>> - exp_labels:
>> namespace: auth-proxy
>> job: prometheus
>> severity: critical
>> exp_annotations:
>> summary: Kubernetes Deployment replicas mismatch (instance
>> {{$labels.instance }})
>> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
>> LABELS = {{ $labels }}"
>>
>>
>> On Mon, Jul 18, 2022 at 12:46 PM David Leadbeater  wrote:
>>
>>> You're alerting in the rules with annotations as follows:
>>>
>>> annotations:
>>>   summary: Kubernetes Deployment replicas mismatch (instance {{
>>> $labels.instance }})
>>>   description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
>>> LABELS = {{ $labels }}"
>>>
>>> Then expecting they match:
>>>
>>> exp_annotations:
>>>   summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
>>>   description: "YaRD_Kubernetes Deployment Replicas Mismatch in
>>> authproxy namespace from 11 min getting alert"
>>>
>>>  You need to update the expected annotations to match that the rules
>>> are generating.
>>>
>>> David
>>>
>>> On Mon, 18 Jul 2022 at 17:09, Shivanand Shete 
>>> wrote:
>>> >
>>> > Hi David,
>>> >
>>> > I corrected the alert name and please find the attached update .yaml
>>> files.
>>> > Alert:
>>> >
>>> > groups:
>>> > - name: replicas-mismatch
>>> > rules:
>>> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
>>> > expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} !=
>>> kube_deployment_status_replicas_available{namespace="auth-proxy"}
>>> > for: 10m
>>> > labels:
>>> > severity: critical
>>> > annotations:
>>> > summary: Kubernetes Deployment replicas mismatch (instance {{
>>> $labels.instance }})
>>> > description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
>>> LABELS = {{ $labels }}"
>>> >
>>> >
>>> > TestCase:
>>> >
>>> > rule_files:
>>> > - /testdata/deployment_replicas_mismatch.yaml
>>> > evaluation_interval: 1m
>>> > tests:
>>> > - interval: 1m
>>> > # Series Data
>>> > input_series:
>>> > - series: 

Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Brian Candler
It's saying there's no alert firing at time 20m ("got" is empty).

That seems correct to me.  You've now made the two input_series have 
identical values:

 input_series:
   - series: kube_replicaset_spec_replicas{job="prometheus", 
namespace="auth-proxy"}
* values: '5+0x9 5+0x20'*
   - series: 
kube_deployment_status_replicas_available{job="prometheus", 
namespace="auth-proxy"}
* values: '5+0x9 5+0x20'*

Hence you wouldn't expect the alert to fire, would you?

On Monday, 18 July 2022 at 10:34:22 UTC+1 shivana...@gmail.com wrote:

> Hi,
>
> I did some changes as you suggested. After that I am getting below error.
>
> Unit Testing:  replicas_mismatch_test.yaml
>
>   FAILED:
>
> alertname: KubernetesDeploymentReplicasMismatch-authproxy, time: 20m, 
>
> exp:[
>
> 0:
>
>   
> Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy", 
> job="prometheus", namespace="auth-proxy", severity="critical"}
>
>   Annotations:{description="Deployment Replicas mismatch\n 
> VALUE = {{ $value }}\n LABELS = {{ $labels }}", summary="Kubernetes 
> Deployment replicas mismatch (instance {{$labels.instance }})"}
>
> ], 
>
> got:[]
>
>
> Alert:
>
>
> groups:
> - name: replicas-mismatch
> rules:
> - alert: KubernetesDeploymentReplicasMismatch-authproxy
> expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
> kube_deployment_status_replicas_available{namespace="auth-proxy"}
> for: 10m
> labels:
> severity: critical
> annotations:
> summary: Kubernetes Deployment replicas mismatch (instance {{ 
> $labels.instance }})
> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n 
> LABELS = {{ $labels }}"
>
> TestCase:
> rule_files:
> - /testdata/deployment_replicas_mismatch.yaml
> evaluation_interval: 1m
> tests:
> - interval: 1m
> # Series Data
> input_series:
> - series: kube_replicaset_spec_replicas{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 5+0x20'
> - series: kube_deployment_status_replicas_available{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 5+0x20'
> alert_rule_test:
> # Unit Test 1
> - eval_time: 9m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
>
> - eval_time: 20m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
> - exp_labels:
> namespace: auth-proxy
> job: prometheus
> severity: critical
> exp_annotations:
> summary: Kubernetes Deployment replicas mismatch (instance 
> {{$labels.instance }})
> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n 
> LABELS = {{ $labels }}"
>
>
> On Mon, Jul 18, 2022 at 12:46 PM David Leadbeater  wrote:
>
>> You're alerting in the rules with annotations as follows:
>>
>> annotations:
>>   summary: Kubernetes Deployment replicas mismatch (instance {{
>> $labels.instance }})
>>   description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
>> LABELS = {{ $labels }}"
>>
>> Then expecting they match:
>>
>> exp_annotations:
>>   summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
>>   description: "YaRD_Kubernetes Deployment Replicas Mismatch in
>> authproxy namespace from 11 min getting alert"
>>
>>  You need to update the expected annotations to match that the rules
>> are generating.
>>
>> David
>>
>> On Mon, 18 Jul 2022 at 17:09, Shivanand Shete  
>> wrote:
>> >
>> > Hi David,
>> >
>> > I corrected the alert name and please find the attached update .yaml 
>> files.
>> > Alert:
>> >
>> > groups:
>> > - name: replicas-mismatch
>> > rules:
>> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
>> > expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
>> kube_deployment_status_replicas_available{namespace="auth-proxy"}
>> > for: 10m
>> > labels:
>> > severity: critical
>> > annotations:
>> > summary: Kubernetes Deployment replicas mismatch (instance {{ 
>> $labels.instance }})
>> > description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n 
>> LABELS = {{ $labels }}"
>> >
>> >
>> > TestCase:
>> >
>> > rule_files:
>> > - /testdata/deployment_replicas_mismatch.yaml
>> > evaluation_interval: 1m
>> > tests:
>> > - interval: 1m
>> > # Series Data
>> > input_series:
>> > - series: kube_replicaset_spec_replicas{job="prometheus", 
>> namespace="auth-proxy"}
>> > values: '5+0x9 5+0x20 5+0x10'
>> > - series: kube_deployment_status_replicas_available{job="prometheus", 
>> namespace="auth-proxy"}
>> > values: '5+0x9 4+0x20 5+0x10'
>> > alert_rule_test:
>> > # Unit Test 1
>> > - eval_time: 9m
>> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> > exp_alerts:
>> >
>> > - eval_time: 20m
>> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> > exp_alerts:
>> > - exp_labels:
>> > namespace: auth-proxy
>> > job: prometheus
>> > severity: critical
>> > exp_annotations:
>> > summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
>> > description: "YaRD_Kubernetes 

Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Shivanand Shete
Hi,

I made the changes you suggested. After that, I am getting the error below.

Unit Testing:  replicas_mismatch_test.yaml

  FAILED:

alertname: KubernetesDeploymentReplicasMismatch-authproxy, time: 20m,

exp:[

0:

  
Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy",
job="prometheus", namespace="auth-proxy", severity="critical"}

  Annotations:{description="Deployment Replicas mismatch\n
VALUE = {{ $value }}\n LABELS = {{ $labels }}", summary="Kubernetes
Deployment replicas mismatch (instance {{$labels.instance }})"}

],

got:[]


Alert:


groups:
- name: replicas-mismatch
  rules:
  - alert: KubernetesDeploymentReplicasMismatch-authproxy
    expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != kube_deployment_status_replicas_available{namespace="auth-proxy"}
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }})
      description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

TestCase:
rule_files:
- /testdata/deployment_replicas_mismatch.yaml
evaluation_interval: 1m
tests:
- interval: 1m
  # Series Data
  input_series:
  - series: kube_replicaset_spec_replicas{job="prometheus", namespace="auth-proxy"}
    values: '5+0x9 5+0x20'
  - series: kube_deployment_status_replicas_available{job="prometheus", namespace="auth-proxy"}
    values: '5+0x9 5+0x20'
  alert_rule_test:
  # Unit Test 1
  - eval_time: 9m
    alertname: KubernetesDeploymentReplicasMismatch-authproxy
    exp_alerts:
  - eval_time: 20m
    alertname: KubernetesDeploymentReplicasMismatch-authproxy
    exp_alerts:
    - exp_labels:
        namespace: auth-proxy
        job: prometheus
        severity: critical
      exp_annotations:
        summary: Kubernetes Deployment replicas mismatch (instance {{$labels.instance }})
        description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"


On Mon, Jul 18, 2022 at 12:46 PM David Leadbeater  wrote:

> You're alerting in the rules with annotations as follows:
>
> annotations:
>   summary: Kubernetes Deployment replicas mismatch (instance {{
> $labels.instance }})
>   description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
> LABELS = {{ $labels }}"
>
> Then expecting they match:
>
> exp_annotations:
>   summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
>   description: "YaRD_Kubernetes Deployment Replicas Mismatch in
> authproxy namespace from 11 min getting alert"
>
>  You need to update the expected annotations to match that the rules
> are generating.
>
> David
>
> On Mon, 18 Jul 2022 at 17:09, Shivanand Shete 
> wrote:
> >
> > Hi David,
> >
> > I corrected the alert name and please find the attached update .yaml
> files.
> > Alert:
> >
> > groups:
> > - name: replicas-mismatch
> > rules:
> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
> > expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} !=
> kube_deployment_status_replicas_available{namespace="auth-proxy"}
> > for: 10m
> > labels:
> > severity: critical
> > annotations:
> > summary: Kubernetes Deployment replicas mismatch (instance {{
> $labels.instance }})
> > description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
> LABELS = {{ $labels }}"
> >
> >
> > TestCase:
> >
> > rule_files:
> > - /testdata/deployment_replicas_mismatch.yaml
> > evaluation_interval: 1m
> > tests:
> > - interval: 1m
> > # Series Data
> > input_series:
> > - series: kube_replicaset_spec_replicas{job="prometheus",
> namespace="auth-proxy"}
> > values: '5+0x9 5+0x20 5+0x10'
> > - series: kube_deployment_status_replicas_available{job="prometheus",
> namespace="auth-proxy"}
> > values: '5+0x9 4+0x20 5+0x10'
> > alert_rule_test:
> > # Unit Test 1
> > - eval_time: 9m
> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
> > exp_alerts:
> >
> > - eval_time: 20m
> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
> > exp_alerts:
> > - exp_labels:
> > namespace: auth-proxy
> > job: prometheus
> > severity: critical
> > exp_annotations:
> > summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
> > description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy
> namespace from 11 min getting alert"
> >
> > Regards,
> > Shivanand Shete.
> >
> > On Mon, Jul 18, 2022 at 12:22 PM David Leadbeater  wrote:
> >>
> >> In your attachment the alertname in the rules and the test doesn't
> >> match -- the unit tests match the alert name first, so fix that first;
> >> then you can iterate on the other fields that need to match (it looks
> >> like the annotations need adjusting).
> >>
> >> David
> >>
> >> On Mon, 18 Jul 2022 at 15:49, Shivanand Shete <
> shivanand.sh...@gmail.com> wrote:
> >> >
> >> > Dear all,
> >> >
> >> > Please find the below alert rules and I want to test that alert using
> Promtool.
> >> >
> >> > groups:
> >> > - name: replicas-mismatch
> >> > rules:
> >> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
> >> > expr: 

Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Brian Candler
If I run those tests, I get an error describing exactly what the 
problem is:

root@prometheus:~# /opt/prometheus/promtool test rules 
replicas_mismatch_test.yaml
Unit Testing:  replicas_mismatch_test.yaml
  FAILED:
alertname: KubernetesDeploymentReplicasMismatch-authproxy, time: 20m,
*exp*:[
0:
  
Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy", 
job="prometheus", namespace="auth-proxy", severity="critical"}
*  Annotations:{description="YaRD_Kubernetes Deployment 
Replicas Mismatch in authproxy namespace from 11 min getting alert", 
summary="Kube_replicaset_spec_replicas_authproxy missmatches"}*
],
*got*:[
0:
  
Labels:{alertname="KubernetesDeploymentReplicasMismatch-authproxy", 
job="prometheus", namespace="auth-proxy", severity="critical"}
*  Annotations:{description="Deployment Replicas mismatch\n 
 VALUE = 5\n  LABELS = map[__name__:kube_replicaset_spec_replicas 
job:prometheus namespace:auth-proxy]", summary="Kubernetes Deployment 
replicas mismatch (instance )"}*
]

That's very clear.  Notice how "exp" (expected) is different to "got" (what 
the alerting rule actually produced).  You can either fix the "exp" to 
match the annotations generated by the alerting rule:

 exp_annotations:
   summary: "Kubernetes Deployment replicas mismatch (instance )"
   description: "Deployment Replicas mismatch\n  VALUE = 5\n  LABELS = map[__name__:kube_replicaset_spec_replicas job:prometheus namespace:auth-proxy]"

(Notice that the instance in the summary is blank, because you didn't set 
an instance label in your test data).

Or you can change the alerting rules themselves to give the annotations 
that you expect in your test.  That's the whole point of testing - to see 
that what you generate is what you expect.

Aside: I see no point in putting LABELS in an annotation.  They are already 
part of the alert.  It only duplicates information, and makes the alert 
harder to understand and harder to test.
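A leaner set of annotations along those lines might look like this (only a sketch, not the poster's actual rule; {{ $labels.namespace }} and {{ $value }} are standard alert templating):

    annotations:
      summary: Kubernetes Deployment replicas mismatch in namespace {{ $labels.namespace }}
      description: "Deployment has had mismatched replicas for more than 10 minutes (current value: {{ $value }})"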

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/71b561b8-6961-4c26-b093-e0452f8ab056n%40googlegroups.com.


Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread Shivanand Shete
Hi David,

I corrected the alert name; please find the attached updated .yaml files.
Alert:

groups:
- name: replicas-mismatch
  rules:
  - alert: KubernetesDeploymentReplicasMismatch-authproxy
    expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != kube_deployment_status_replicas_available{namespace="auth-proxy"}
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }})
      description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"


TestCase:

rule_files:
- /testdata/deployment_replicas_mismatch.yaml
evaluation_interval: 1m
tests:
- interval: 1m
  # Series Data
  input_series:
  - series: kube_replicaset_spec_replicas{job="prometheus", namespace="auth-proxy"}
    values: '5+0x9 5+0x20 5+0x10'
  - series: kube_deployment_status_replicas_available{job="prometheus", namespace="auth-proxy"}
    values: '5+0x9 4+0x20 5+0x10'
  alert_rule_test:
  # Unit Test 1
  - eval_time: 9m
    alertname: KubernetesDeploymentReplicasMismatch-authproxy
    exp_alerts:

  - eval_time: 20m
    alertname: KubernetesDeploymentReplicasMismatch-authproxy
    exp_alerts:
    - exp_labels:
        namespace: auth-proxy
        job: prometheus
        severity: critical
      exp_annotations:
        summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
        description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy namespace from 11 min getting alert"

Regards,
Shivanand Shete.

On Mon, Jul 18, 2022 at 12:22 PM David Leadbeater  wrote:

> In your attachment the alertname in the rules and the test doesn't
> match -- the unit tests match the alert name first, so fix that first;
> then you can iterate on the other fields that need to match (it looks
> like the annotations need adjusting).
>
> David
>
> On Mon, 18 Jul 2022 at 15:49, Shivanand Shete 
> wrote:
> >
> > Dear all,
> >
> > Please find the below alert rules and I want to test that alert using
> Promtool.
> >
> > groups:
> > - name: replicas-mismatch
> > rules:
> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
> > expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} !=
> kube_deployment_status_replicas_available{namespace="auth-proxy"}
> > for: 10m
> > labels:
> > severity: critical
> > annotations:
> > summary: Kubernetes Deployment replicas mismatch (instance {{
> $labels.instance }})
> > description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n
> LABELS = {{ $labels }}"
> >
> > And also I have written the test case but it's not working, please suggest.
> >
> > rule_files:
> > - /testdata/deployment_replicas_mismatch.yaml
> > evaluation_interval: 1m
> > tests:
> > - interval: 1m
> > # Series Data
> > input_series:
> > - series: kube_replicaset_spec_replicas{job="prometheus",
> namespace="auth-proxy"}
> > values: '5+0x9 5+0x20 5+0x10'
> > - series: kube_deployment_status_replicas_available{job="prometheus",
> namespace="auth-proxy"}
> > values: '5+0x9 4+0x20 5+0x10'
> > alert_rule_test:
> > # Unit Test 1
> > - eval_time: 9m
> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
> > exp_alerts:
> >
> > - eval_time: 20m
> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
> > exp_alerts:
> > - exp_labels:
> > namespace: auth-proxy
> > job: prometheus
> > severity: critical
> > exp_annotations:
> > summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
> > description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy
> namespace from 11 min getting alert"
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups "Prometheus Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an email to prometheus-users+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/9af657d1-6240-4d9a-bbad-d44355e6650bn%40googlegroups.com
> .
>


-- 
*Thanks & Regards,*
Shivanand Shete
9422362618

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CADYa5us80YVTVHJmWkAFQrem4tbG-R-xbtqiDnZFDbnoCWyg2Q%40mail.gmail.com.


deployment_replicas_mismatch.yaml
Description: application/yaml


replicas_mismatch_test.yaml
Description: application/yaml


Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread David Leadbeater
Your alerting rules define the annotations as follows:

annotations:
  summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }})
  description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

Then you're expecting them to match:

exp_annotations:
  summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
  description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy namespace from 11 min getting alert"

You need to update the expected annotations to match what the rules are generating.
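Once the expected annotations have been updated, the tests can be re-run with promtool, e.g. (assuming the test file name used earlier in the thread):

  promtool test rules replicas_mismatch_test.yaml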

David

On Mon, 18 Jul 2022 at 17:09, Shivanand Shete  wrote:
>
> Hi David,
>
> I corrected the alert name and please find the attached update .yaml files.
> Alert:
>
> groups:
> - name: replicas-mismatch
> rules:
> - alert: KubernetesDeploymentReplicasMismatch-authproxy
> expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
> kube_deployment_status_replicas_available{namespace="auth-proxy"}
> for: 10m
> labels:
> severity: critical
> annotations:
> summary: Kubernetes Deployment replicas mismatch (instance {{ 
> $labels.instance }})
> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = 
> {{ $labels }}"
>
>
> TestCase:
>
> rule_files:
> - /testdata/deployment_replicas_mismatch.yaml
> evaluation_interval: 1m
> tests:
> - interval: 1m
> # Series Data
> input_series:
> - series: kube_replicaset_spec_replicas{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 5+0x20 5+0x10'
> - series: kube_deployment_status_replicas_available{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 4+0x20 5+0x10'
> alert_rule_test:
> # Unit Test 1
> - eval_time: 9m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
>
> - eval_time: 20m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
> - exp_labels:
> namespace: auth-proxy
> job: prometheus
> severity: critical
> exp_annotations:
> summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
> description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy 
> namespace from 11 min getting alert"
>
> Regards,
> Shivanand Shete.
>
> On Mon, Jul 18, 2022 at 12:22 PM David Leadbeater  wrote:
>>
>> In your attachment the alertname in the rules and the test doesn't
>> match -- the unit tests match the alert name first, so fix that first;
>> then you can iterate on the other fields that need to match (it looks
>> like the annotations need adjusting).
>>
>> David
>>
>> On Mon, 18 Jul 2022 at 15:49, Shivanand Shete  
>> wrote:
>> >
>> > Dear all,
>> >
>> > Please find the below alert rules and I want to test that alert using 
>> > Promtool.
>> >
>> > groups:
>> > - name: replicas-mismatch
>> > rules:
>> > - alert: KubernetesDeploymentReplicasMismatch-authproxy
>> > expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
>> > kube_deployment_status_replicas_available{namespace="auth-proxy"}
>> > for: 10m
>> > labels:
>> > severity: critical
>> > annotations:
>> > summary: Kubernetes Deployment replicas mismatch (instance {{ 
>> > $labels.instance }})
>> > description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS 
>> > = {{ $labels }}"
>> >
>> > And also I have written the test case but it's not working, please suggest.
>> >
>> > rule_files:
>> > - /testdata/deployment_replicas_mismatch.yaml
>> > evaluation_interval: 1m
>> > tests:
>> > - interval: 1m
>> > # Series Data
>> > input_series:
>> > - series: kube_replicaset_spec_replicas{job="prometheus", 
>> > namespace="auth-proxy"}
>> > values: '5+0x9 5+0x20 5+0x10'
>> > - series: kube_deployment_status_replicas_available{job="prometheus", 
>> > namespace="auth-proxy"}
>> > values: '5+0x9 4+0x20 5+0x10'
>> > alert_rule_test:
>> > # Unit Test 1
>> > - eval_time: 9m
>> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> > exp_alerts:
>> >
>> > - eval_time: 20m
>> > alertname: KubernetesDeploymentReplicasMismatch-authproxy
>> > exp_alerts:
>> > - exp_labels:
>> > namespace: auth-proxy
>> > job: prometheus
>> > severity: critical
>> > exp_annotations:
>> > summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
>> > description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy 
>> > namespace from 11 min getting alert"
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups 
>> > "Prometheus Users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to prometheus-users+unsubscr...@googlegroups.com.
>> > To view this discussion on the web visit 
>> > https://groups.google.com/d/msgid/prometheus-users/9af657d1-6240-4d9a-bbad-d44355e6650bn%40googlegroups.com.
>
>
>
> --
> Thanks & Regards,
> Shivanand Shete
> 9422362618


Re: [prometheus-users] Need help for alert test using Promtool

2022-07-18 Thread David Leadbeater
In your attachment the alertname in the rules and the test doesn't
match -- the unit tests match the alert name first, so fix that first;
then you can iterate on the other fields that need to match (it looks
like the annotations need adjusting).

David

On Mon, 18 Jul 2022 at 15:49, Shivanand Shete  wrote:
>
> Dear all,
>
> Please find the below alert rules and I want to test that alert using 
> Promtool.
>
> groups:
> - name: replicas-mismatch
> rules:
> - alert: KubernetesDeploymentReplicasMismatch-authproxy
> expr: kube_replicaset_spec_replicas{namespace="auth-proxy"} != 
> kube_deployment_status_replicas_available{namespace="auth-proxy"}
> for: 10m
> labels:
> severity: critical
> annotations:
> summary: Kubernetes Deployment replicas mismatch (instance {{ 
> $labels.instance }})
> description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = 
> {{ $labels }}"
>
> And also I have written the test case but it's not working, please suggest.
>
> rule_files:
> - /testdata/deployment_replicas_mismatch.yaml
> evaluation_interval: 1m
> tests:
> - interval: 1m
> # Series Data
> input_series:
> - series: kube_replicaset_spec_replicas{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 5+0x20 5+0x10'
> - series: kube_deployment_status_replicas_available{job="prometheus", 
> namespace="auth-proxy"}
> values: '5+0x9 4+0x20 5+0x10'
> alert_rule_test:
> # Unit Test 1
> - eval_time: 9m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
>
> - eval_time: 20m
> alertname: KubernetesDeploymentReplicasMismatch-authproxy
> exp_alerts:
> - exp_labels:
> namespace: auth-proxy
> job: prometheus
> severity: critical
> exp_annotations:
> summary: "Kube_replicaset_spec_replicas_authproxy missmatches"
> description: "YaRD_Kubernetes Deployment Replicas Mismatch in authproxy 
> namespace from 11 min getting alert"
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-users/9af657d1-6240-4d9a-bbad-d44355e6650bn%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAP9KPhAR5SxpDaAGaWvKh%2B_OAHMH9u26-uP4kS%3D_oBzsZUQS%2Bw%40mail.gmail.com.