Re: [prometheus-users] Alerts Description and Summary

2023-03-27 Thread sayf.eddi...@gmail.com
Thanks for the response

By generating automatic documentation I meant automatically creating 
developer documentation of the existing alerts and their descriptions from 
the YAML files rather than from runtime information, which means the label 
placeholders will not be replaced - so the fewer labels, the more readable 
the result.

I am considering adding an extra annotation field for that purpose; I 
think it is better to separate the concerns here.
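
For example, something like this is what I have in mind (just a sketch; 
the "doc" annotation name is hypothetical):

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: rate(http_requests_total{status="500"}[5m]) > 0.1
    labels:
      severity: critical
    annotations:
      summary: 'High error rate on {{ $labels.instance }}'
      doc: >-
        Fires when the 5m rate of HTTP 500 responses exceeds 0.1/s.
        Static text only, so it can be copied verbatim from the YAML
        into the generated documentation.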

On Monday, March 27, 2023 at 3:46:11 PM UTC+2 Stuart Clark wrote:

> On 2023-03-27 14:43, sayf.eddi...@gmail.com wrote:
> > Hello, I have looked online and I can't find any best practices for
> > filling in the description and the summary. From the examples I see,
> > the summary should be the shortest (plus minimal use of labels). But
> > maybe that is observation bias.
> > 
> > I am trying to generate some automatic documentation around alerting,
> > and having a lot of labels makes the result only as user-friendly as
> > reading the YAML file directly.
> > 
>
> It really depends on how you want to use them. If you want to use the 
> summary in an email's subject line, for example, then you probably want 
> it to be fairly short. You can have as many labels/annotations as you 
> like, so you don't even have to have one called "summary" if you don't 
> want to, and there's nothing stopping you from having much more specific 
> labels (e.g. severity, service, environment) which you can then include 
> in email/ticket subjects.
>
> -- 
> Stuart Clark
>



Re: [prometheus-users] Alerts Description and Summary

2023-03-27 Thread Stuart Clark

On 2023-03-27 14:43, sayf.eddi...@gmail.com wrote:

Hello, I have looked online and I can't find any best practices for
filling in the description and the summary. From the examples I see,
the summary should be the shortest (plus minimal use of labels). But
maybe that is observation bias.

I am trying to generate some automatic documentation around alerting,
and having a lot of labels makes the result only as user-friendly as
reading the YAML file directly.



It really depends on how you want to use them. If you want to use the 
summary in an email's subject line, for example, then you probably want it 
to be fairly short. You can have as many labels/annotations as you like, 
so you don't even have to have one called "summary" if you don't want to, 
and there's nothing stopping you from having much more specific labels 
(e.g. severity, service, environment) which you can then include in 
email/ticket subjects.
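
For example, a sketch of pulling such labels into an email subject via 
Alertmanager templating (the receiver name and address are placeholders):

receivers:
- name: ops-email
  email_configs:
  - to: ops@example.com
    headers:
      subject: '[{{ .CommonLabels.severity }}] {{ .CommonLabels.service }}: {{ .CommonAnnotations.summary }}'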


--
Stuart Clark



[prometheus-users] Alerts Description and Summary

2023-03-27 Thread sayf.eddi...@gmail.com
Hello, I have looked online and I can't find any best practices for
filling in the description and the summary. From the examples I see,
the summary should be the shortest (plus minimal use of labels). But
maybe that is observation bias.

I am trying to generate some automatic documentation around alerting,
and having a lot of labels makes the result only as user-friendly as
reading the YAML file directly.

Regards



Re: [prometheus-users] Re: promql - what is promql for calcuate percetile

2023-03-27 Thread Bjoern Rabenstein
On 20.03.23 03:28, Brian Candler wrote:
> > Note - I have no bucket metrics for histogram. 
> 
> What you say doesn't make sense to me.  What you showed *is* a histogram, 
> and the metrics *prometheus_rule_evaluation_duration_seconds* *are* the 
> buckets.

Strictly speaking, it's a summary, and the metrics labeled with
"quantile" are precalculated
quantiles. Cf. https://prometheus.io/docs/practices/histograms/ 
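
In the exposition format that looks roughly like this (values made up):

# TYPE prometheus_rule_evaluation_duration_seconds summary
prometheus_rule_evaluation_duration_seconds{quantile="0.5"} 0.000123
prometheus_rule_evaluation_duration_seconds{quantile="0.9"} 0.000456
prometheus_rule_evaluation_duration_seconds_sum 42.1
prometheus_rule_evaluation_duration_seconds_count 123456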

> Therefore, if those are the metrics you have, then the 50th percentile is 
> simply
> prometheus_rule_evaluation_duration_seconds{quantile="0.5"}
> and the 90th percentile is simply
> prometheus_rule_evaluation_duration_seconds{quantile="0.9"}
> 
> There is no need to "calculate" the p50/p90/p99 latencies because you 
> already have them.

That's correct. Note that there is no way to further aggregate the
pre-calculated quantiles (or to change them, for example, to a different
quantile or a different time interval).

If you need aggregatability or more flexibility for ad-hoc queries,
you have to use an actual histogram in your instrumentation of the
monitored target (either the classic histograms or the new
experimental native histograms).
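
With a classic histogram (bucket metric name assumed for illustration),
the quantile is computed at query time and the buckets can be aggregated
across series first, e.g.:

histogram_quantile(0.9,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))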

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-users] Graph how long a job takes

2023-03-27 Thread Roberto Nunnari
Hello.

Thank you Stuart and Brian for your answers.

Unfortunately English is not my native language, and I cannot see how I
could have made my question more understandable. Anyway, Brian got it right. :-)

In the meantime I have already found a solution:
1) on a Linux box, a combined crontab + bash script + Python script: run
the SQL query and measure how long it takes
2) push the result to the Pushgateway (see the sketch below)
3) scrape it with Prometheus
4) graph it in Grafana
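
A minimal sketch of the push in step 2 (host and metric name are
placeholders):

echo "sql_query_duration_seconds 12.3" | \
  curl --data-binary @- http://pushgateway.example:9091/metrics/job/sql_timing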

Best regards.
Roberto


On Mon, 27 Mar 2023 at 11:14, Brian Candler 
wrote:

> I think the question was, "does sql_exporter include a metric for the
> execution time of a query"?
>
> If it doesn't, and you're only doing a single query in the scrape job,
> then you could use prometheus' own *scrape_duration_seconds* metric as a
> rough proxy.
>
> On Monday, 27 March 2023 at 09:41:33 UTC+1 Stuart Clark wrote:
>
>> On 2023-03-24 14:01, Nunni wrote:
>> > Hello.
>> >
>> > What I need is to observe how long it takes to complete the execution of a
>> > particular mssql query. The query is executed once every hour, and I
>> > want to graph that for say a period of ten days.
>> > The query is up and running using sql_exporter and prometheus gets the
>> > results and correctly graphs it, but that’s not what I need.
>> >
>>
>> Unfortunately it isn't clear what you are asking for. From your
>> description it sounds like you are graphing the data, but you also say
>> it isn't what you need (but don't say how or what you are hoping for).
>> If you could describe what you are doing, what is happening & what you
>> are wanting instead someone might be able to suggest something...
>>
>> --
>> Stuart Clark
>>



[prometheus-users] snmp_exporter missing label

2023-03-27 Thread Kordován Szabolcs
Hi,

Here is a very simple snmp.yaml which collects the sysUpTime and adds
sysName as a label.
test_uptime:
  walk:
  - 1.3.6.1.2.1.1.5
  get:
  - 1.3.6.1.2.1.1.3.0
  metrics:
  - name: sysUpTime
    oid: 1.3.6.1.2.1.1.3
    type: gauge
    help: The time (in hundredths of a second) since the network management portion
      of the system was last re-initialized. - 1.3.6.1.2.1.1.3
    lookups:
    - labels: []
      labelname: sysName
      oid: 1.3.6.1.2.1.1.5
      type: DisplayString
  auth:
    community: public

In the snmp_exporter output there is no label:

# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.016483272
# HELP snmp_scrape_packets_retried Packets retried for get, bulkget, and walk.
# TYPE snmp_scrape_packets_retried gauge
snmp_scrape_packets_retried 0
# HELP snmp_scrape_packets_sent Packets sent for get, bulkget, and walk; including retries.
# TYPE snmp_scrape_packets_sent gauge
snmp_scrape_packets_sent 2
# HELP snmp_scrape_pdus_returned PDUs returned from get, bulkget, and walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 2
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.016419321
# HELP sysUpTime The time (in hundredths of a second) since the network management portion of the system was last re-initialized. - 1.3.6.1.2.1.1.3
# TYPE sysUpTime gauge
sysUpTime 2.67342443e+08

It should be like this: sysUpTime{sysName="foo"} 2.67342443e+08

What's wrong?

Regards,



Re: [prometheus-users] Graph how long a job takes

2023-03-27 Thread Brian Candler
I think the question was, "does sql_exporter include a metric for the 
execution time of a query"?

If it doesn't, and you're only doing a single query in the scrape job, then 
you could use prometheus' own *scrape_duration_seconds* metric as a rough 
proxy.
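
e.g. (job name assumed):

scrape_duration_seconds{job="sql_exporter"}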

On Monday, 27 March 2023 at 09:41:33 UTC+1 Stuart Clark wrote:

> On 2023-03-24 14:01, Nunni wrote:
> > Hello.
> > 
> > What I need is to observe how long it takes to complete the execution of a
> > particular mssql query. The query is executed once every hour, and I
> > want to graph that for say a period of ten days.
> > The query is up and running using sql_exporter and prometheus gets the
> > results and correctly graphs it, but that’s not what I need.
> > 
>
> Unfortunately it isn't clear what you are asking for. From your 
> description it sounds like you are graphing the data, but you also say 
> it isn't what you need (but don't say how or what you are hoping for). 
> If you could describe what you are doing, what is happening & what you 
> are wanting instead someone might be able to suggest something...
>
> -- 
> Stuart Clark
>



Re: [prometheus-users] Graph how long a job takes

2023-03-27 Thread Stuart Clark

On 2023-03-24 14:01, Nunni wrote:

Hello.

What I need is to observe how long it takes to complete the execution of a
particular mssql query. The query is executed once every hour, and I
want to graph that for say a period of ten days.
The query is up and running using sql_exporter and prometheus gets the
results and correctly graphs it, but that’s not what I need.



Unfortunately it isn't clear what you are asking for. From your 
description it sounds like you are graphing the data, but you also say 
it isn't what you need (but don't say how or what you are hoping for). 
If you could describe what you are doing, what is happening & what you 
are wanting instead someone might be able to suggest something...


--
Stuart Clark



Re: [prometheus-users] Separate endpoint for aggregate metrics?

2023-03-27 Thread Stuart Clark

On 2023-03-25 07:30, Kevin Z wrote:

Hi,

We have a server that has a high cardinality of metrics, mainly due to
a label that is tagged on the majority of the metrics. However, most
of our dashboards/queries don't use this label, and just use aggregate
queries. There are specific scenarios where we would need to debug and
sort based on the label, but this doesn't happen that often.

Is it a common design pattern to separate out two metrics endpoints,
one for aggregates, one for labelled metrics, with different scrape
intervals? This way we could limit the impact of the high cardinality
time series, by scraping the labelled metrics less frequently.

Couple of follow-up questions:
- When a query that uses the aggregate metric comes in, does it matter
that the data is potentially duplicated between the two endpoints? How
do we ensure that it doesn't try loading all the different time series
with the label and then aggregating, and instead directly use the
aggregate metric itself?
- How could we make sure this new setup is more efficient than the old
one? What criteria/metrics would be best (query evaluation time?
amount of data ingested?)



You certainly could split things into two endpoints and scrape them at 
different intervals, however it is unlikely to make much, if any, 
difference. From the Prometheus side, extra data points within an existing 
time series are very low impact. So your aggregate endpoint might be 
scraped every 30 seconds and the full data every 2 minutes (the slowest 
available scrape interval), meaning there are 4x fewer data points, which 
has very little memory impact.
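
As a sketch of that layout (paths, intervals, and the target are all 
placeholders):

scrape_configs:
- job_name: app-aggregate
  scrape_interval: 30s
  metrics_path: /metrics/aggregate
  static_configs:
  - targets: ['app.example:8080']
- job_name: app-full
  scrape_interval: 2m
  metrics_path: /metrics/full
  static_configs:
  - targets: ['app.example:8080']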


You mention that there is high cardinality - that is the thing you need to 
fix, as that is what has the real impact. You say there is a problematic 
label applied to most of the metrics. Can it be removed? 
What makes it problematic?
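
If it can be dropped, one option is to remove it at scrape time - a 
sketch, with a placeholder label name, and note this only works if the 
remaining label sets stay unique (otherwise you get duplicate series):

metric_relabel_configs:
- action: labeldrop
  regex: request_id   # hypothetical high-cardinality label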


--
Stuart Clark



Re: [prometheus-users] Scrape only a particular namespace and ignore all other namespaces

2023-03-27 Thread Himanshu Gude
Hello Julien, 
I have used the method you mentioned. I think this will stop Prometheus 
from scraping metrics from all namespaces other than devsecops. If it 
works, it will also fix the issue I am facing. 
But I think suppressing the alerts in Alertmanager would be a better fix 
for me. If you can suggest something for that it would help. 

Thanks again!

On Monday, March 27, 2023 at 12:24:34 AM UTC+5:30 Julien Pivotto wrote:

> You can update the prometheus config:
>
>
> scrape_configs:
> - job_name: 'kubernetes-pods'
>   kubernetes_sd_configs:
>   - role: pod
>     namespaces:
>       names:
>       - devsecops
>
> On 25 Mar 23:00, Himanshu Gude wrote:
> > Hello guys,
> > 
> > There is a requirement in my project which states that the client only 
> > wants alerts from the "devsecops" namespace and alerts from all other 
> > namespaces should be ignored, which means alerts from namespaces other 
> > than "devsecops" should not be captured by Alertmanager and should not 
> > trigger tickets in ServiceNow.
> > 
> > In short, Prometheus should only scrape metrics from one namespace, 
> > i.e. devsecops.
> > 
> > Would really appreciate some sort of help here.
> > Thanks in advance
> > 
>
>
> -- 
> Julien Pivotto
> @roidelapluie
>



[prometheus-users] Re: Scrape only a particular namespace and ignore all other namespaces

2023-03-27 Thread Himanshu Gude
Thank you Brian. This is exactly what I was looking for. 

On Monday, March 27, 2023 at 12:30:31 PM UTC+5:30 Brian Candler wrote:

> There are actually a couple of options.
>
> In each alerting rule (in prometheus) you can match on labels in the 
> actual alerting expression: e.g.
>
> expr: node_boot_time_seconds > (node_boot_time_seconds offset 5m + 5)
> becomes
> expr: node_boot_time_seconds{namespace="devsecops"} > (node_boot_time_seconds offset 5m + 5)
>
> But I'd say it's simpler and more flexible to allow all the alerts to 
> fire, but suppress the notifications in alertmanager routing rules:
>
> route:
>   receiver: noc
>   routes:
>     - receiver: discard
>       matchers:
>         - namespace!="devsecops"
>     # more routing rules if desired go here
>
> receivers:
>   - name: discard
>   - name: noc
>     email_configs:
>       - to: n...@example.com
>         send_resolved: false
>
> Then you can *see* alerts firing in other environments in the prometheus 
> web UI (which helps you test and tune the rules), but notifications aren't 
> sent out for anything apart from devsecops.
>
> This does require that all the alerting expressions pass through a 
> namespace label. Pasting each alerting expression into the PromQL browser 
> in the Prometheus web UI will let you see exactly what timeseries values it 
> generates (including the labels), as long as it's generating at least one 
> alert of course.
>
> On Monday, 27 March 2023 at 06:56:58 UTC+1 Himanshu Gude wrote:
>
>> Yes, I'm new to the Prometheus environment. Could you please give an 
>> example for each case, to help me figure out what changes need to be 
>> made in the configurations. Your first case is what I am looking for. 
>> Thanks in advance! 
>>
>> On Sunday, March 26, 2023 at 9:20:38 PM UTC+5:30 Brian Candler wrote:
>>
>>> That's two different things you're asking.
>>>
>>> * alerting only from "devsecops" namespace: ensure every metric has a 
>>> label with the namespace, and configure alerting rules to match only where 
>>> namespace=devsecops
>>>
>>> * scraping only from one namespace: if you can't configure the scrape 
>>> URL to fetch only one namespace, then you can use metric_relabel_configs to 
>>> discard all metrics except the ones of interest (again, matching on some 
>>> namespace label to identify which ones to keep or drop)
>>>
>>> On Sunday, 26 March 2023 at 07:28:36 UTC+1 Himanshu Gude wrote:
>>>
 Hello guys,

 There is a requirement in my project which states that the client only 
 wants alerts from the "devsecops" namespace and alerts from all other 
 namespaces should be ignored, which means alerts from namespaces other 
 than "devsecops" should not be captured by Alertmanager and should not 
 trigger tickets in ServiceNow.

 In short, Prometheus should only scrape metrics from one namespace, 
 i.e. devsecops.

 Would really appreciate some sort of help here.
 Thanks in advance

>>>



[prometheus-users] Re: Scrape only a particular namespace and ignore all other namespaces

2023-03-27 Thread Brian Candler
There are actually a couple of options.

In each alerting rule (in prometheus) you can match on labels in the actual 
alerting expression: e.g.

expr: node_boot_time_seconds > (node_boot_time_seconds offset 5m + 5)
becomes
expr: node_boot_time_seconds{namespace="devsecops"} > (node_boot_time_seconds offset 5m + 5)

But I'd say it's simpler and more flexible to allow all the alerts to fire, 
but suppress the notifications in alertmanager routing rules:

route:
  receiver: noc
  routes:
    - receiver: discard
      matchers:
        - namespace!="devsecops"
    # more routing rules if desired go here

receivers:
  - name: discard
  - name: noc
    email_configs:
      - to: n...@example.com
        send_resolved: false

Then you can *see* alerts firing in other environments in the prometheus 
web UI (which helps you test and tune the rules), but notifications aren't 
sent out for anything apart from devsecops.

This does require that all the alerting expressions pass through a 
namespace label. Pasting each alerting expression into the PromQL browser 
in the Prometheus web UI will let you see exactly what timeseries values it 
generates (including the labels), as long as it's generating at least one 
alert of course.

On Monday, 27 March 2023 at 06:56:58 UTC+1 Himanshu Gude wrote:

> Yes, I'm new to the Prometheus environment. Could you please give an 
> example for each case, to help me figure out what changes need to be 
> made in the configurations. Your first case is what I am looking for. 
> Thanks in advance! 
>
> On Sunday, March 26, 2023 at 9:20:38 PM UTC+5:30 Brian Candler wrote:
>
>> That's two different things you're asking.
>>
>> * alerting only from "devsecops" namespace: ensure every metric has a 
>> label with the namespace, and configure alerting rules to match only where 
>> namespace=devsecops
>>
>> * scraping only from one namespace: if you can't configure the scrape URL 
>> to fetch only one namespace, then you can use metric_relabel_configs to 
>> discard all metrics except the ones of interest (again, matching on some 
>> namespace label to identify which ones to keep or drop)
>>
>> On Sunday, 26 March 2023 at 07:28:36 UTC+1 Himanshu Gude wrote:
>>
>>> Hello guys,
>>>
>>> There is a requirement in my project which states that the client only 
>>> wants alerts from the "devsecops" namespace and alerts from all other 
>>> namespaces should be ignored, which means alerts from namespaces other 
>>> than "devsecops" should not be captured by Alertmanager and should not 
>>> trigger tickets in ServiceNow.
>>>
>>> In short, Prometheus should only scrape metrics from one namespace, 
>>> i.e. devsecops.
>>>
>>> Would really appreciate some sort of help here.
>>> Thanks in advance
>>>
>>
