[prometheus-users] Re: Sharing selected data between 2 Prometheis

2023-09-03 Thread yagyans...@gmail.com
@Brian
Is there a way at all to do this via remote_read?

On Sunday, September 3, 2023 at 1:58:31 AM UTC+5:30 yagyans...@gmail.com 
wrote:

> Federation might match your use case, since you can filter metrics by 
> regex. But effectively you are rescraping the data, so the timestamps won't 
> be the same. (Although sometimes this is an advantage, e.g. if you actually 
> want lower resolution data in your copy).
> >> Hm, I did not think about timestamps being slightly different.
>
> Otherwise, look at remote write, using write_relabel_configs to filter the 
> data, which will copy the data exactly, and buffer it if the remote system 
> is temporarily down. I don't understand why you discarded that option - 
> what makes you think resource utilization is poor for prom A remote-writing 
> to prom B?
> >> According to the official docs and quite a few blogs, remote_write 
> increases the resource utilization of source Prometheus by ~ 25%. That is 
> the only reason I'm trying to avoid that.
>
> Remote read is not an option for syncing data as far as I know - only for 
> performing queries from a remote data source.  There is "backfill 
> <https://prometheus.io/docs/prometheus/latest/storage/#backfilling-from-openmetrics-format>"
>  
> but it is only for historical data; it doesn't work for the head block.
> >> Maybe, I used the word sync incorrectly. I actually want to query some 
> data available in Prometheus 1 in my 2nd Prometheus and use them for some 
> autoscaling use-case.
>
>
>
> On Sunday, September 3, 2023 at 1:20:54 AM UTC+5:30 Brian Candler wrote:
>
>> Federation might match your use case, since you can filter metrics by 
>> regex. But effectively you are rescraping the data, so the timestamps won't 
>> be the same. (Although sometimes this is an advantage, e.g. if you actually 
>> want lower resolution data in your copy).
>>
>> Otherwise, look at remote write, using write_relabel_configs to filter 
>> the data, which will copy the data exactly, and buffer it if the remote 
>> system is temporarily down. I don't understand why you discarded that 
>> option - what makes you think resource utilization is poor for prom A 
>> remote-writing to prom B?
>>
>> Remote read is not an option for syncing data as far as I know - only for 
>> performing queries from a remote data source.  There is "backfill 
>> <https://prometheus.io/docs/prometheus/latest/storage/#backfilling-from-openmetrics-format>"
>>  
>> but it is only for historical data; it doesn't work for the head block.
>>
>> On Saturday, 2 September 2023 at 20:40:55 UTC+1 yagyans...@gmail.com 
>> wrote:
>>
>>> I have a use-case wherein I need to send some filtered data (metrics 
>>> that match a regex) from one Prometheus to another.
>>> What is the suggested approach for such cases? I was thinking of using 
>>> remote_read (over remote_write because remote_read is better in terms of 
>>> resource utilization), but I couldn't find anything in the docs to suggest 
>>> that I can filter what data to read while using remote_read.
>>> Prometheus Version - v2.34.0
>>
>>



[prometheus-users] Re: Sharing selected data between 2 Prometheis

2023-09-02 Thread yagyans...@gmail.com
Federation might match your use case, since you can filter metrics by 
regex. But effectively you are rescraping the data, so the timestamps won't 
be the same. (Although sometimes this is an advantage, e.g. if you actually 
want lower resolution data in your copy).
>> Hm, I did not think about timestamps being slightly different.
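
For reference, a minimal federation sketch on the receiving Prometheus side;
the job name, the source address prom-a:9090 and the metric-name regex are
placeholder assumptions:

  - job_name: 'federate-prom-a'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"my_metric_.*"}'
    static_configs:
      - targets:
          - 'prom-a:9090'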

Otherwise, look at remote write, using write_relabel_configs to filter the 
data, which will copy the data exactly, and buffer it if the remote system 
is temporarily down. I don't understand why you discarded that option - 
what makes you think resource utilization is poor for prom A remote-writing 
to prom B?
>> According to the official docs and quite a few blogs, remote_write 
increases the resource utilization of the source Prometheus by ~25%. That is 
the only reason I'm trying to avoid it.
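
A sketch of the remote_write alternative on the source Prometheus, assuming
the receiving Prometheus is started with --web.enable-remote-write-receiver
and is reachable at prom-b:9090 (a placeholder); only series whose names
match the regex are forwarded:

  remote_write:
    - url: http://prom-b:9090/api/v1/write
      write_relabel_configs:
        - source_labels: [__name__]
          regex: 'my_metric_.*'
          action: keep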

Remote read is not an option for syncing data as far as I know - only for 
performing queries from a remote data source.  There is "backfill 
<https://prometheus.io/docs/prometheus/latest/storage/#backfilling-from-openmetrics-format>"
 
but it is only for historical data; it doesn't work for the head block.
>> Maybe I used the word "sync" incorrectly. I actually want to query some 
data available in Prometheus 1 from my 2nd Prometheus and use it for an 
autoscaling use case.



On Sunday, September 3, 2023 at 1:20:54 AM UTC+5:30 Brian Candler wrote:

> Federation might match your use case, since you can filter metrics by 
> regex. But effectively you are rescraping the data, so the timestamps won't 
> be the same. (Although sometimes this is an advantage, e.g. if you actually 
> want lower resolution data in your copy).
>
> Otherwise, look at remote write, using write_relabel_configs to filter the 
> data, which will copy the data exactly, and buffer it if the remote system 
> is temporarily down. I don't understand why you discarded that option - 
> what makes you think resource utilization is poor for prom A remote-writing 
> to prom B?
>
> Remote read is not an option for syncing data as far as I know - only for 
> performing queries from a remote data source.  There is "backfill 
> <https://prometheus.io/docs/prometheus/latest/storage/#backfilling-from-openmetrics-format>"
>  
> but it is only for historical data; it doesn't work for the head block.
>
> On Saturday, 2 September 2023 at 20:40:55 UTC+1 yagyans...@gmail.com 
> wrote:
>
>> I have a use-case wherein I need to send some filtered data (metrics that 
>> match a regex) from one Prometheus to another.
>> What is the suggested approach for such cases? I was thinking of using 
>> remote_read (over remote_write because remote_read is better in terms of 
>> resource utilization), but I couldn't find anything in the docs to suggest 
>> that I can filter what data to read while using remote_read.
>> Prometheus Version - v2.34.0
>
>



[prometheus-users] Sharing selected data between 2 Prometheis

2023-09-02 Thread yagyans...@gmail.com
I have a use-case wherein I need to send some filtered data (metrics that 
match a regex) from one Prometheus to another.
What is the suggested approach for such cases? I was thinking of using 
remote_read (over remote_write because remote_read is better in terms of 
resource utilization), but I couldn't find anything in the docs to suggest 
that I can filter what data to read while using remote_read.
Prometheus Version - v2.34.0



[prometheus-users] Quantile in Summary for Python Client.

2021-07-08 Thread yagyans...@gmail.com

Hi. I want to integrate my Python application with Prometheus for 
monitoring. For the summary metric type, I have seen that quantiles can be 
added to the metric, but in the Python client I do not see the option to add 
quantiles to the summary type. For Go and Java, it is available.
Am I missing something here, or does the Python client not have the option 
to add quantiles to the summary metric?

Thanks in advance!



[prometheus-users] Usage of fail_if_body_not_matches_regexp and fail_if_body_matches_regexp.

2021-05-01 Thread yagyans...@gmail.com
Hi. I am using Blackbox Exporter version 0.18.0. I want to know which check 
takes precedence in case fail_if_body_not_matches_regexp and 
fail_if_body_matches_regexp contradict each other. For example:
  fail_if_body_not_matches_regexp: ['OK']
  fail_if_body_matches_regexp: ['NOK']

Now, if a URL gives me the response NOK, will that be evaluated against 
body_matches or body_not_matches?
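
As far as I understand it, the two settings are evaluated independently and
the probe fails (probe_failed_due_to_regex = 1) if either check fails; the
regexps are also unanchored, so 'OK' matches inside "NOK". A sketch of a
module that avoids the ambiguity by anchoring (the module name is
hypothetical):

  http_body_check:
    prober: http
    timeout: 10s
    http:
      fail_if_body_not_matches_regexp: ['^OK$']
      fail_if_body_matches_regexp: ['^NOK$']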



[prometheus-users] Way too high Memory Usage for relatively few time series.

2021-04-27 Thread yagyans...@gmail.com

Hi. I am using Prometheus version 2.26.0. This Prometheus currently has 
only around 8 lakh (0.8 million) time series, but its memory usage is 
touching as high as 76GB.
I have another Prometheus v2.22.0 setup that is scraping different targets 
but is currently handling around 10 million time series, and that one uses 
52-55 GB of memory. I am not seeing anything abnormal in the logs of 2.26.0. 
Memory chunks are also usually at around 5 million in 2.26.0, whereas 2.22.0 
usually has 55 million.

How do I debug why the memory usage is abnormally high? Can someone please 
point me in the right direction?
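
A few expressions that might help narrow it down, queried on each instance
(the topk cardinality query is expensive, so run it sparingly; the limit of
10 is arbitrary):

  prometheus_tsdb_head_series
  prometheus_tsdb_head_chunks
  process_resident_memory_bytes / go_memstats_alloc_bytes
  topk(10, count by (__name__) ({__name__=~".+"}))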

Thanks in advance!



[prometheus-users] Re: Working on Blackbox's "fail_if_body_not_matches_regexp".

2021-03-22 Thread yagyans...@gmail.com
Can someone please help! I am confused here.

On Monday, March 15, 2021 at 12:20:03 PM UTC+5:30 yagyans...@gmail.com 
wrote:

>
> Hi. I am using blackbox_exporter version 0.18.0 and I am using http prober 
> to check if the response by my URL is "OK" or not. Below is the 
> configuration of the module.
>
>   http_healthcheck_ok:
> prober: http
> timeout: 10s
> http:
>   valid_http_versions: ["HTTP/1.1", "HTTP/2.0", "HTTP/1.0"]
>  * fail_if_body_not_matches_regexp: ['OK']*
>   method: GET
>   no_follow_redirects: false
>   fail_if_ssl: false
>   fail_if_not_ssl: false
>   tls_config:
> insecure_skip_verify: true
>   preferred_ip_protocol: "ip4"
>
> So, whenever my URL throws anything other than OK in the response body, 
> probe_failed_due_to_regex should 1 right? But when the URL was throwing *NOK 
> *as the response body, probe_failed_due_to_regex was still 0, whereas it 
> should be 1. Am I missing something here?
>
> Thanks in advance!
>



[prometheus-users] Working on Blackbox's "fail_if_body_not_matches_regexp".

2021-03-15 Thread yagyans...@gmail.com

Hi. I am using blackbox_exporter version 0.18.0 and I am using the http 
prober to check whether the response from my URL is "OK" or not. Below is 
the configuration of the module.

  http_healthcheck_ok:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0", "HTTP/1.0"]
      fail_if_body_not_matches_regexp: ['OK']
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: true
      preferred_ip_protocol: "ip4"

So, whenever my URL returns anything other than OK in the response body, 
probe_failed_due_to_regex should be 1, right? But when the URL was returning 
NOK as the response body, probe_failed_due_to_regex was still 0, whereas it 
should be 1. Am I missing something here?

Thanks in advance!



[prometheus-users] Blackbox ICMP Probe not working properly.

2021-03-06 Thread yagyans...@gmail.com

Hi. I am using blackbox exporter version 0.18.0 and using the ICMP module to 
check for host downs in my environment. I have started noticing a 
discrepancy over the past few days: sometimes, when the host is actually 
down, Blackbox still returns probe_success as true.
Digging into it further, I found that for some of my targets the last ICMP 
probe log line in the Blackbox logs is from 7 days ago, for others from 2 
days ago, etc. So I assume Blackbox has not sent an ICMP request to those 
targets for up to 7 days, and hence kept returning whatever result it got 
the last time it probed.

What could be the reason for this?

For example: today is 7 March, and the last "Probe succeeded" log line for 
this target is from March 2, with no failures or successes after that.

Mar  2 22:22:37 infra-prometheus-n2 blackbox_exporter: 
ts=2021-03-02T16:52:37.982Z caller=main.go:169 module=icmp_prober 
target=x.x.x.x level=debug msg="Probe succeeded" 
duration_seconds=0.001848452


Here is the module configuration:
  icmp_prober:
    prober: icmp
    timeout: 30s
    icmp:
      preferred_ip_protocol: ip4


Please help. Thanks in advance!



[prometheus-users] Change in format of "startsAt" time.

2021-02-10 Thread yagyans...@gmail.com

Hi. I upgraded Prometheus from version 2.12.0 to 2.22.2 some days back. 
One thing I noticed is that in my alerts the startsAt timestamp was in IST 
(UTC+5:30) when I was using 2.12.0, but it has changed to UTC now. I did not 
see anything about this in the changelogs of the versions after 2.12.0. Am I 
missing something here?

Sample time stamps:
2.12.0 - 
"severity":"CRITICAL","source":"prometheus","startsAt":"2020-05-27T16:34:53.612166298+05:30","status":"firing"

2.22.2
"severity":"CRITICAL","source":"prometheus","startsAt":"2021-02-10T15:40:53.612166298Z","status":"firing"
 

Thanks in advance!



Re: [prometheus-users] Sudden & Permanent increase in Memory consumption.

2021-02-08 Thread yagyans...@gmail.com
Also, I had a look at the Go memory statistics. I see that the Go memory 
usage (go_memstats_alloc_bytes) is around 50% of the total memory used by 
Prometheus (process_resident_memory_bytes).



On Tuesday, February 9, 2021 at 2:39:42 AM UTC+5:30 yagyans...@gmail.com 
wrote:

> Thanks, Ben. I'll upgrade to a newer Prometheus version and check if the 
> issue still persists.
>
> But I still have one doubt here, I am running this Prometheus instance for 
> almost an year now, but I have noticed this memory increase recently only. 
> First time on 31st Jan and 2nd time on Feb 8. If it really is because of 
> compaction, why didn't it happen before?
>
> On Tue, 9 Feb, 2021, 1:33 am Ben Kochie,  wrote:
>
>> Prometheus performs compactions at regular intervals. This is likely what 
>> generated some IO.
>>
>> Note, if you're just looking at RSS, this is not going to tell the whole 
>> story. Depending on which version of Go you built with, it may not be fast 
>> at reclaiming RSS memory.
>>
>> Look at the go_memstats_alloc_bytes value to see what Go is really using.
>>
>> I would also recommend upgrading to the latest release. There have been a 
>> number of memory use related improvements in the year and a half since 
>> 2.12.0
>>
>> On Mon, Feb 8, 2021 at 8:04 PM yagyans...@gmail.com  
>> wrote:
>>
>>>
>>> Hi. I am using Prometheus version 2.12.0. I am running Alertmanager 
>>> 0.21.0 in cluster mode. Since, last 9 days, I have observed twice that the 
>>> memory consumption by Prometheus increased by 10-12% and it remained to the 
>>> increased value there after. Interesting thing to note here is that both 
>>> the times the increase came around 11:15/11:30am ish. I noticed that the 
>>> number of Read IOps on my Prometheus instance also increased at that time 
>>> to almost 4000. How to go about in finding the cause?
>>> I do not see any increase in number of time series for my Prometheus.
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>



[prometheus-users] Sudden & Permanent increase in Memory consumption.

2021-02-08 Thread yagyans...@gmail.com

Hi. I am using Prometheus version 2.12.0. I am running Alertmanager 0.21.0 
in cluster mode. Over the last 9 days I have observed twice that the memory 
consumption of Prometheus increased by 10-12% and then stayed at the 
increased value. Interestingly, both times the increase came at around 
11:15-11:30 am. I noticed that the number of read IOPS on my Prometheus 
instance also increased to almost 4000 at that time. How do I go about 
finding the cause?
I do not see any increase in the number of time series for my Prometheus.


Thanks in advance!



[prometheus-users] Setting custom threshold for Disk Storage Space.

2021-02-02 Thread yagyans...@gmail.com

Hi. I am using Prometheus version 2.22.1 and I am using custom thresholds 
for many of my alerts via recording rules. But for disk space utilization I 
am stuck where I need to set a custom threshold only for a single mount 
point of a particular IP. For example: my default disk space utilization 
threshold is 90%; on the IP x.x.x.x there are 2 mounts, vol1 and vol2, and I 
want to set the threshold for vol2 to 95%. Setting the threshold to 95% for 
all the mount points of an IP is easy and can be done the way I have done 
below, but how do I approach setting a custom threshold for just one of the 
mount points?

Sample of custom thresholding that I am doing for one of the other alerts - 
Number of Threads.
  - record: custom_critical
    expr: (up{job=~"node.*",instance="x.x.x.x:9100"}) + 

Alert Expression:
((node_processes_threads) > on(instance) group_left() (custom_critical or 
on(instance) count by (instance)(node_processes_threads) * 0 + 4000))
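
The same pattern can probably be extended to key on the mount point as well;
a sketch using node_exporter's filesystem metrics, with x.x.x.x:9100 and
/vol2 as placeholders and 90 as the default threshold:

  - record: disk_used_percent_critical
    expr: (node_filesystem_size_bytes{instance="x.x.x.x:9100",mountpoint="/vol2"} * 0) + 95

Alert expression:
(100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes))
  > on(instance, mountpoint) group_left()
(disk_used_percent_critical or (node_filesystem_size_bytes * 0) + 90)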

Thanks in advance!



[prometheus-users] Re: Alertmanager's weird behaviour - Firing -> Resolved -> Firing -> Resolved.

2021-01-22 Thread yagyans...@gmail.com
Can someone please help!

On Wednesday, January 20, 2021 at 1:14:58 AM UTC+5:30 yagyans...@gmail.com 
wrote:

>
> Hi. I am using Prometheus version 2.12.0 and a HA Alertmanager setup with 
> Alertmanager version 0.21.0. Problem is whenever I receive a resolved 
> notification, I receive an extra Firing + Resolved notification pair for 
> every alert.
>
> For example, if an Alert A1 is firing, when this alert resolves I'll 
> receive 3 notifications - Resolved, Firing, Resolved i.e for every resolved 
> alert I am receiving an extra pair of (Firing,Resolved) notification. And 
> these 3 notifications(Actual resolved + Extra Pair) arrive at exactly the 
> same time. 
>
> Attaching the snapshot for one such alert.
>
> Please help.
> Thanks in advance!
>
>



[prometheus-users] Alertmanager's weird behaviour - Firing -> Resolved -> Firing -> Resolved.

2021-01-19 Thread yagyans...@gmail.com

Hi. I am using Prometheus version 2.12.0 and an HA Alertmanager setup with 
Alertmanager version 0.21.0. The problem is that whenever I receive a 
resolved notification, I receive an extra Firing + Resolved notification 
pair for every alert.

For example, if an alert A1 is firing, then when this alert resolves I 
receive 3 notifications - Resolved, Firing, Resolved - i.e. for every 
resolved alert I am receiving an extra (Firing, Resolved) pair of 
notifications. And these 3 notifications (the actual resolved one + the 
extra pair) arrive at exactly the same time.

Attaching the snapshot for one such alert.

Please help.
Thanks in advance!



[prometheus-users] Sudden increase in Total Series.

2021-01-07 Thread yagyans...@gmail.com

Hi. I am running Prometheus v2.12.0. I had a total of around 5 million 
series and my memory consumption was constant at around 85%. Suddenly, an 
hour ago, the total number of series jumped to 10 million+, causing 
Prometheus to go down again and again.
No targets were changed. Please help!

Attaching the snapshot for the total number of series.



[prometheus-users] Setting only Alertname and Instance as Subject for Email Notification.

2020-12-22 Thread yagyans...@gmail.com
Hi. I am using Alertmanager v0.21.0. I want to define a custom email 
template where the subject of the email does not contain all the labels but 
instead only the alert name and the instance. Is that possible? I went 
through a bunch of documents but couldn't find a way to narrow down the 
labels in the subject.
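
Something like the following receiver sketch should do it, assuming the
receiver name and address are placeholders; note that .CommonLabels only
contains labels shared by every alert in the group, so instance shows up
only if you group by it:

  receivers:
    - name: 'mail-ops'
      email_configs:
        - to: 'oncall@example.com'
          headers:
            Subject: '{{ .CommonLabels.alertname }} on {{ .CommonLabels.instance }}'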

Thanks in advance!



Re: [prometheus-users] Setting up a fallback module in Blackbox.

2020-12-08 Thread yagyans...@gmail.com
Hi Julien,

One query.

Using the regex combination you mentioned, the module gets set to 
http_healthcheck for all the targets, even when I specify a different module 
for some of them.
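
For reference, the ordering I would expect to work: the fallback rule has to
come after the rule that copies the module label into __param_module, and its
regex has to stay '()' so it only fires while the param is still empty (this
is a sketch, not tested):

  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - source_labels: [module]
      target_label: __param_module
    - source_labels: [__param_module]
      regex: '()'
      replacement: 'http_healthcheck'
      target_label: __param_module
    - target_label: __address__
      replacement: 172.20.10.99:9115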

On Friday, October 23, 2020 at 11:31:12 AM UTC+5:30 yagyans...@gmail.com 
wrote:

> Thanks, Julien.
>
> On Friday, October 23, 2020 at 1:38:18 AM UTC+5:30 Julien Pivotto wrote:
>
>> On 22 Oct 10:52, yagyans...@gmail.com wrote:
>> > Hi. I want to use a single job and a file for probing targets with 
>> > different modules. My job currently looks like this.
>> > 
>> > - job_name: 'blackbox_TestingAllInSameFile'
>> > metrics_path: /probe
>> > file_sd_configs:
>> > - files:
>> > - /root/test.yml
>> > relabel_configs:
>> > - source_labels: [__address__]
>> > target_label: __param_target
>> > - source_labels: [__param_target]
>> > target_label: instance
>> > - source_labels: [module]
>> > target_label: __param_module
>> > - target_label: __address__
>> > replacement: 172.20.10.99:9115
>> > scrape_interval: 10s
>> > 
>> > My target file looks like this:
>> > 
>> > - targets:
>> > - t1
>> > - t2
>> > labels:
>> > module: 'http_healthcheck'
>> > checkname: 'a'
>> > cluster: 'b'
>> > node: 'c'
>> > env: 'PROD'
>> > 
>> > My http_healthcheck module will be used for most of the targets. So, is 
>> > there any way to make it as a fallback module and not define module 
>> label 
>> > everytime for this particular module? I mean I want to define the 
>> module 
>> > label only when it is anything other than http_healthcheck.
>> > Is there any way we could achieve this using relabel_configs?
>>
>> Yes, you could do:
>>
>> - source_labels: [__param_module]
>> regex: '()'
>> replacement: 'http_healthcheck'
>> target_label: __param_module
>>
>> > 
>> > Thanks in advance!
>> > 
>>
>>
>> -- 
>> Julien Pivotto
>> @roidelapluie
>>
>



Re: [prometheus-users] NFS IO Stats.

2020-11-30 Thread yagyans...@gmail.com
Thanks a lot for the detailed explanation,



> Hi. I have enabled the mounstats, nfs and nfsd collectors for the NFS 
> > side metrics but there is a plethora of metrics without any proper 
> > documentation. Can somebody help with which metrics would give me 
> > the reads and writes completed on a particular mount? Like we have 
> > node_disk_reads_completed_total for SSD and HDDs, is there any exact 
> > alternative for NFS to get the IO stats? 
>
> If you care about NFS client statistics, there is no exact analog 
> because NFS mount metrics are fundamentally at a different level than 
> disk IO statistics. Disk IO statistics are obviously at the level of 
> data (blocks) transfered, regardless of why they are transfered and 
> how much of the data will be used. NFS client mount metrics are at the 
> level of filesystem operations, and filesystem operations can have 
> unpredictable disk IO impacts. All NFS operations send data to the NFS 
> server and get data back from it, but some only (potentially) create 
> read disk IO on the server while others cause write IO. 
>
> To put together NFS client IO statistics that are useful to you, you 
> will have to figure out what you care about in all of this. Some of this 
> will be workload dependent; for example, if your workload mostly reads 
> and writes a small number of big files, the filesystem level read and 
> write IO (which you can get numbers on) is highly predictive of disk IO 
> on the server and probably of performance. If your workload spends a lot 
> of time creating and deleting small files, looking only at the bytes 
> read and written from those files is probably missing a lot of server 
> disk IO, especially write IO. 
>
>> Yes, I need exactly the IO that is caused by the NFS operations. Can 
you give me an idea of which Prometheus metrics can give me this?
 

> As for what the metrics mean, that is hard to summarize well here. 
> The 'mountstat' collector is the most detailed and more or less what 
> it is collecting is written up in: 
>
> https://utcc.utoronto.ca/~cks/space/blog/linux/NFSMountstatsIndex 
>
> >> This is quite useful. Thanks!

This is from 2013 and for NFS v3, not NFS v4, but I believe that not 
> much has changed in this area of Linux since then and NFS v4 is pretty 
> similar to NFS v3. 
>
> (Locally I have a program that significantly aggregates this information 
> on a per-mount basis, because we have a *lot* of NFS mounts and the raw 
> mountstats data for them all adds up to too many metrics for us.) 
>
> - cks 
>



[prometheus-users] NFS IO Stats.

2020-11-26 Thread yagyans...@gmail.com

Hi. I have enabled the mountstats, nfs and nfsd collectors for the NFS-side 
metrics, but there is a plethora of metrics without any proper 
documentation. Can somebody help with which metrics would give me the reads 
and writes completed on a particular mount? Like we have 
node_disk_reads_completed_total for SSDs and HDDs, is there any exact 
equivalent for NFS to get the IO stats?


Thanks in advance!



Re: [prometheus-users] Alert goes to Firing --> Resolved --> Firing immediately.

2020-11-25 Thread yagyans...@gmail.com
The alert definition doesn't seem to be the problem here, because it 
happens randomly for different alerts. Below is the alert for an exporter 
being down, for which it has happened three times today.

  - alert: ExporterDown
    expr: up == 0
    for: 10m
    labels:
      severity: "CRITICAL"
    annotations:
      summary: "Exporter down on *{{ $labels.instance }}*"
      description: "Not able to fetch application metrics from *{{ $labels.instance }}*"

- the ALERTS metric shows what is pending or firing over time
>> But the problem is that one of my ExporterDown alerts has been active for 
the past 10 days; there is no genuine reason for that alert to go to a 
resolved state.

- evaluate the alert expression in Prometheus for the given time period. 
Are there gaps or does e.g. a label change between before and after the gap?
>> No gaps in the Prometheus GUI console for the time period. The value of 
up has been constantly zero for the last 6 hours, but the alert still went 
to the resolved state during that time and then back to firing.
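
For reference, the two expressions I graphed over the same range (the
alertname is the one from the rule above):

  ALERTS{alertname="ExporterDown"}
  up == 0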


On Wednesday, November 25, 2020 at 5:03:42 PM UTC+5:30 
matt...@prometheus.io wrote:

> This could be many things, likely it has to do with the formulation of the 
> alert. What does it look like in Prometheus? Specifically
>
> - the ALERTS metric shows what is pending or firing over time
> - evaluate the alert expression in Prometheus for the given time period. 
> Are there gaps or does e.g. a label change between before and after the gap?
>
> /MR
>
> On Wed, Nov 25, 2020 at 11:01 AM yagyans...@gmail.com <
> yagyans...@gmail.com> wrote:
>
>>
>> Hi. I am using Alertmanager 0.21.0. Occasionally, the active alerts go to 
>> resolved state for a second and then come back to firing state immediately. 
>> There is no pattern of this happening, it happens randomly. Haven't been 
>> able to identify why this is happening.
>> Any thoughts here? Where should I start to look for this? Checked 
>> Alertmanager's logs, everything seems normal.
>>
>> Thanks in advance!
>>
>>
>



[prometheus-users] Alert goes to Firing --> Resolved --> Firing immediately.

2020-11-25 Thread yagyans...@gmail.com

Hi. I am using Alertmanager 0.21.0. Occasionally, the active alerts go to 
resolved state for a second and then come back to firing state immediately. 
There is no pattern of this happening, it happens randomly. Haven't been 
able to identify why this is happening.
Any thoughts here? Where should I start to look for this? Checked 
Alertmanager's logs, everything seems normal.

Thanks in advance!



Re: [prometheus-users] Debugging OOM issue.

2020-11-24 Thread yagyans...@gmail.com
Thanks, Christian.

Today I noticed something that is totally new to me. Prometheus went down 
and I got the query because of which it went down, but strangely, when I 
checked, the server did not go OOM at that time: the memory usage dropped 
straight from a constant 77% to zero. Usually, when a query takes a long 
time, the memory usage spikes up and Prometheus crashes because of OOM. This 
time there was no sudden spike in either CPU or memory utilization.

Any thoughts on this?

On Monday, November 9, 2020 at 5:31:18 PM UTC+5:30 Christian Hoffmann wrote:

> Hi,
>
> On 11/9/20 10:56 AM, yagyans...@gmail.com wrote:
> > Hi. I am using Promtheus v 2.20.1 and suddenly my Prometheus crashed
> > because of Memory overshoot. How to pinpoint what caused the Prometheus
> > to go OOM or which query caused the Prometheus go OOM?
>
> Prometheus writes the currently active queries to a file which is read
> upon restart. Prometheus will print all unfinished queries, see here:
>
>
> https://www.robustperception.io/what-queries-were-running-when-prometheus-died
>
> This should help pin-pointing the relevant queries.
>
> Often it's some combination of querying long timestamps and/or high
> cardinality metrics.
>
> Kind regards,
> Christian
>



[prometheus-users] Debugging OOM issue.

2020-11-09 Thread yagyans...@gmail.com

Hi. I am using Prometheus v2.20.1 and suddenly my Prometheus crashed 
because of a memory overshoot. How do I pinpoint what caused Prometheus to 
go OOM, or which query caused it to go OOM?

Thanks in advance!



Re: [prometheus-users] Blackbox Panic Error.

2020-11-07 Thread yagyans...@gmail.com
Okay, thanks a lot, Brian.

Can you please have a look at 
https://groups.google.com/u/1/g/prometheus-users/c/Cb7lUaqWnbc too? 

On Saturday, November 7, 2020 at 3:21:55 PM UTC+5:30 Brian Brazil wrote:

> On Sat, 7 Nov 2020 at 04:46, yagyans...@gmail.com  
> wrote:
>
>>
>> Hi. I am running Blackbox Exporter v 0.18.0 and quite a few time I am 
>> observing error like below.
>>
>> Nov  7 05:03:11 dh4-k1-infra-prometheus-n2 blackbox_exporter: 2020/11/07 
>> 05:03:11 http: panic serving 172.20.10.98:44106: runtime error: invalid 
>> memory address or nil pointer dereference
>>
>> This is happening quite frequently on intervals. As of now, I haven't 
>> seen any kind of major impact due to this. But wanted to understand why is 
>> this happening and is there something that can be done to mitigate this?
>>
>
> If you've any Prometheus binary panicking, please file a bug in the 
> relevant repo including the full stack trace.
>
> Brian
>  
>
>>
>> Thanks in advance!
>>
>>
>
>
> -- 
> Brian Brazil
> www.robustperception.io
>



[prometheus-users] Re: Discrepancy in Alert Rule Evaluation.

2020-11-07 Thread yagyans...@gmail.com
Hi Brian,

My Blackbox exporter is already running in debug log mode and still I don't 
see any "probe failed" logs for that period.
Also, I have run the query for some of the instances that I saw in PENDING 
state, but I do not see any failures there either; probe_success is 
constantly 1 for them, without any dips in between.

On Saturday, November 7, 2020 at 2:12:36 PM UTC+5:30 b.ca...@pobox.com 
wrote:

> Go into the Prometheus query browser (front page in the web interface, 
> normally port 9090), and enter the query:
>
> probe_success{job=~"Ping-All-Servers"}
>
> and switch to graph mode.  Is the line going up and down?  Then probes are 
> failing.
>
> If you want to see logs of these failures, then on the blackbox_exporter 
> you'll need to add --log.level=debug to its command line args.
>
> Alternatively, if you are testing with curl, you can add "&debug=true" to 
> the URL.  e.g.
>
> curl -g 'localhost:9115/probe?module=foo&target=bar&debug=true'
>
> Do this repeatedly until you see a failure, and the failure logs will be 
> included in the HTTP response.
>
> Note that the blackbox exporter by default sets a deadline of 0.5 seconds 
> less than the scrape interval.  So if you have a very short scrape interval 
> (say 1s) then each probe only has 0.5s to complete.
>



[prometheus-users] Re: Blackbox Panic Error.

2020-11-07 Thread yagyans...@gmail.com
Hi Brian,

I am running under linux-amd64 and using pre-built binary.
[myuser@infra-prometheus ~]# /usr/local/bin/blackbox_exporter --version
blackbox_exporter, version 0.18.0 (branch: HEAD, revision: 
60c86e6ce5af7958b06ae7a08222bb6ec839)
  build user:   root@53d72328d93f
  build date:   20201012-09:46:31
  go version:   go1.15.2

So far I have seen this error with http modules only.
Below is the configuration for 2 of my most used modules.

  http_healthcheck:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0", "HTTP/1.0"]
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: true
      preferred_ip_protocol: "ip4"

  http_healthcheck_es:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0", "HTTP/1.0"]
      fail_if_body_not_matches_regexp: ['green']
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: true
      preferred_ip_protocol: "ip4"



On Saturday, November 7, 2020 at 1:56:43 PM UTC+5:30 b.ca...@pobox.com 
wrote:

> I've not seen this.
>
> What server platform are you running under? (e.g. is it linux-amd64?)  Are 
> you using the release binaries, or did you build it yourself from source? 
> (show output of blackbox_exporter --version)
>
> Is it only http modules that give this error? Can you show the blackbox 
> module config? Are they using TLS?
>



[prometheus-users] Discrepancy in Alert Rule Evaluation.

2020-11-07 Thread yagyans...@gmail.com

Hi. I am using Blackbox Exporter v 0.18.0 for generating Host Down Alerts. 
Below is the configured rule.
  - alert: HostDown
    expr: probe_success{job=~"Ping-All-Servers"} == 0
    for: 1m
    labels:
      severity: "CRITICAL"
    annotations:
      summary: "Server is Down - *{{ $labels.instance }}*"
      identifier: "*Cluster:* `{{ $labels.cluster }}`, *node:* `{{ $labels.node }}` "

Now, when I check my Prometheus alerts page http://x.x.x.x:9090/alerts, I 
see 7-8 HostDown alerts in PENDING state every time, yet when I check my 
Blackbox Exporter's debug log at the same time, I don't see any "Probe 
failed" entries for my ICMP module for those instances. 
Am I missing something here?

Thanks in advance!



[prometheus-users] Re: Discrepancy in Alert Rule Evaluation.

2020-11-07 Thread yagyans...@gmail.com
Prometheus Version - 2.20.1

On Saturday, November 7, 2020 at 1:46:31 PM UTC+5:30 yagyans...@gmail.com 
wrote:

>
> Hi. I am using Blackbox Exporter v 0.18.0 for generating Host Down Alerts. 
> Below is the configured rule.
>   - alert: HostDown
> expr: probe_success{job=~"Ping-All-Servers"} == 0
> for: 1m
> labels:
>   severity: "CRITICAL"
> annotations:
>   summary: "Server is Down - *{{ $labels.instance }}*"
>   identifier: "*Cluster:* `{{ $labels.cluster }}`, *node:* `{{ 
> $labels.node }}` "
>
> Now, when I am checking my Prometheus' alert page 
> http://x.x.x.x:9090/alerts, I see 7-8 HostDown in PENDING state 
> everytime, and when at the same time I check my Blackbox Exporter's debug 
> log, I don't see any Probe Failed for my ICMP module for those instances. 
> I am missing something here?
>
> Thanks in advance!
>



[prometheus-users] Blackbox Panic Error.

2020-11-06 Thread yagyans...@gmail.com

Hi. I am running Blackbox Exporter v0.18.0 and quite a few times I am 
observing errors like the one below.

Nov  7 05:03:11 dh4-k1-infra-prometheus-n2 blackbox_exporter: 2020/11/07 
05:03:11 http: panic serving 172.20.10.98:44106: runtime error: invalid 
memory address or nil pointer dereference

This is happening quite frequently, at intervals. So far I haven't seen any 
kind of major impact from it, but I wanted to understand why this is 
happening and whether there is something that can be done to mitigate it.

Thanks in advance!



[prometheus-users] Alert Duplication in HA Prometheus & Alertmanager setup.

2020-11-06 Thread yagyans...@gmail.com
Hi. I have a HA Prometheus setup, with 2 instances(x.x.x.x and y.y.y.y) 
scraping exactly the same targets. On the respective machines, Alertmanager 
is also running and a mesh is created. But I am observing that all the 
alerts are getting duplicated and I am receiving every alert twice. 
Alertmanager Version - 0.21.0.
/usr/local/bin/alertmanager --config.file 
/etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager 
--data.retention=120h --log.level=debug --web.listen-address=x.x.x.x:9093 
--cluster.listen-address=x.x.x.x:9094 --cluster.peer=y.y.y.y:9094
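
One thing worth checking: the mesh can only deduplicate notifications if
every Prometheus sends its alerts to every Alertmanager. A sketch of the
alerting block in prometheus.yml, using the same placeholder addresses:

  alerting:
    alertmanagers:
      - static_configs:
          - targets:
              - x.x.x.x:9093
              - y.y.y.y:9093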

Oh, one thing that just popped into my head: for the temporary testing 
period I am running different versions of Prometheus on the two instances, 
2.12.0 on one and 2.20.1 on the other. Could this also be a cause?

Thanks in advance!



Re: [prometheus-users] Setting up a fallback module in Blackbox.

2020-10-23 Thread yagyans...@gmail.com
Thanks, Julien.

On Friday, October 23, 2020 at 1:38:18 AM UTC+5:30 Julien Pivotto wrote:

> On 22 Oct 10:52, yagyans...@gmail.com wrote:
> > Hi. I want to use a single job and a file for probing targets with 
> > different modules. My job currently looks like this.
> > 
> > - job_name: 'blackbox_TestingAllInSameFile'
> > metrics_path: /probe
> > file_sd_configs:
> > - files:
> > - /root/test.yml
> > relabel_configs:
> > - source_labels: [__address__]
> > target_label: __param_target
> > - source_labels: [__param_target]
> > target_label: instance
> > - source_labels: [module]
> > target_label: __param_module
> > - target_label: __address__
> > replacement: 172.20.10.99:9115
> > scrape_interval: 10s
> > 
> > My target file looks like this:
> > 
> > - targets:
> > - t1
> > - t2
> > labels:
> > module: 'http_healthcheck'
> > checkname: 'a'
> > cluster: 'b'
> > node: 'c'
> > env: 'PROD'
> > 
> > My http_healthcheck module will be used for most of the targets. So, is 
> > there any way to make it as a fallback module and not define module 
> label 
> > everytime for this particular module? I mean I want to define the module 
> > label only when it is anything other than http_healthcheck.
> > Is there any way we could achieve this using relabel_configs?
>
> Yes, you could do:
>
> - source_labels: [__param_module]
> regex: '()'
> replacement: 'http_healthcheck'
> target_label: __param_module
>
> > 
> > Thanks in advance!
> > 
>
>
> -- 
> Julien Pivotto
> @roidelapluie
>



[prometheus-users] Setting up a fallback module in Blackbox.

2020-10-22 Thread yagyans...@gmail.com
Hi. I want to use a single job and a file for probing targets with 
different modules. My job currently looks like this.

  - job_name: 'blackbox_TestingAllInSameFile'
    metrics_path: /probe
    file_sd_configs:
      - files:
          - /root/test.yml
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - source_labels: [module]
        target_label: __param_module
      - target_label: __address__
        replacement: 172.20.10.99:9115
    scrape_interval: 10s

My target file looks like this:

- targets:
    - t1
    - t2
  labels:
    module: 'http_healthcheck'
    checkname: 'a'
    cluster: 'b'
    node: 'c'
    env: 'PROD'

My http_healthcheck module will be used for most of the targets. So, is 
there any way to make it as a fallback module and not define module label 
everytime for this particular module? I mean I want to define the module 
label only when it is anything other than http_healthcheck.
Is there any way we could achieve this using relabel_configs?

Thanks in advance!



[prometheus-users] Re: Get the list of a label based on value of another label.

2020-10-16 Thread yagyans...@gmail.com
Wow, Thanks a lot.
Worked like a charm.

On Friday, October 16, 2020 at 1:34:04 PM UTC+5:30 b.ca...@pobox.com wrote:

> This is really a Grafana question rather than a Prometheus question.
>
> Grafana lets you extract the value of a label from an arbitrary PromQL 
> expression result using a regex, e.g. /.*ifName="([^"]+)".*/ (screenshot of 
> the Grafana variable regex field omitted)
>
> The regex is matching against the entire metric line, i.e.
>
> metricname{label1="a",label2="b"} value timestamp
>
> To break the regex down:
>
> .*  match any zero or more characters at start
> ifName="match this text literally (ifName= and opening quote)
> (   start capture group
> [^"]+   match one or more characters which are not double quote
> )   end capture group
> "   match literal closing quote
> .*  match any zero or more characters at end
>
>
> Change this to use node instead of ifName.
>
> Then you should be able to write your "or" condition in the PromQL part, 
> e.g.
>
> query_result(foo{cluster="A"} or foo{cluster="B",node=~"MDM-.*"})
>
>



[prometheus-users] Get the list of a label based on value of another label.

2020-10-15 Thread yagyans...@gmail.com
Hi. For every Node Exporter job I have 3 labels - cluster, env and node. 
Now I want to extract the values of the "node" label based on certain 
conditions:
the values of "node" for which the "cluster" label is A, plus the values of 
"node" that start with the string "MDM-" and have the "cluster" label set 
to B.

Is this possible to do?

I want to create these variables in Grafana and I am using label_values to 
extract the label values, but I am not able to do the above filtering. 

Thanks in advance.



[prometheus-users] Re: Keeping Config Files in sync for HA Prometheus.

2020-10-14 Thread yagyans...@gmail.com
Thanks.

Can you explain the 1st point a little more? How would adding new targets 
or adding a new job be done entirely via Ansible? Is there an existing 
project for this?
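
A minimal Ansible sketch of the usual pattern, assuming the config and the
file_sd target files live in a git repo and are pushed to both instances;
hostnames, paths and the reload handler are assumptions (Prometheus only
reloads on SIGHUP or via /-/reload, so the unit needs an ExecReload, or you
restart instead):

  - hosts: prometheus_servers
    tasks:
      - name: Deploy main Prometheus config
        ansible.builtin.template:
          src: prometheus.yml.j2
          dest: /etc/prometheus/prometheus.yml
        notify: reload prometheus
      - name: Deploy file_sd target files
        ansible.builtin.copy:
          src: targets/
          dest: /etc/prometheus/targets/
        notify: reload prometheus
    handlers:
      - name: reload prometheus
        ansible.builtin.systemd:
          name: prometheus
          state: reloaded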

On Thursday, October 15, 2020 at 1:58:53 AM UTC+5:30 sayf.eddi...@gmail.com 
wrote:

> Hello
> 1- you can use a config management tool like Ansible or Salt to manage 
> the prometheus config and update it on both instances in parallel
> 2- alertmanager will take care of deduplicating the alerts
>
> On Wednesday, October 14, 2020 at 9:32:20 PM UTC+2 yagyans...@gmail.com 
> wrote:
>
>> Hi. I am moving from vanilla Prometheus setup to HA Prometheus setup with 
>> 2 instances scraping the same metrics. I'll be using either Thanos or 
>> VictoriaMetrics on top of that.
>> I have a couple of doubts:
>>
>> 1. How do I manage the config files and keep them in sync across both 
>> Prometheus instances? Will I have to manually update the config 
>> files on both instances every time I change something?
>>
>> 2. What happens to alerting? Will both instances have the same 
>> alerting rules and keep communicating with the Alertmanager, with 
>> de-duplication of alerts taken care of by Alertmanager itself? Or does 
>> something else have to be done?
>>
>> Thanks in advance. Sorry if the queries seem repeated. Didn't find the 
>> answer for these yet.
>>
>
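
To make the Ansible suggestion a bit more concrete, a minimal sketch is a single 
playbook that renders one shared config onto both instances and reloads them, so 
adding a target or a job is always a change to one templated file (the group 
name, paths and template are placeholders, not from this thread):

  - hosts: prometheus_ha          # inventory group containing both instances
    become: true
    tasks:
      - name: Render the shared Prometheus config on every instance
        ansible.builtin.template:
          src: templates/prometheus.yml.j2
          dest: /etc/prometheus/prometheus.yml
        notify: reload prometheus

      - name: Copy the shared rule files
        ansible.builtin.copy:
          src: files/rules/
          dest: /etc/prometheus/rules/
        notify: reload prometheus

    handlers:
      - name: reload prometheus
        ansible.builtin.systemd:
          name: prometheus
          state: reloaded

The cloudalchemy/ansible-prometheus role linked further down in this archive 
packages the same idea (config, rules and service management) behind role 
variables.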

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/360a136a-ffb9-48ec-bbee-a2cf02b7d2b7n%40googlegroups.com.


[prometheus-users] Keeping Config Files in sync for HA Prometheus.

2020-10-14 Thread yagyans...@gmail.com
Hi. I am moving from vanilla Prometheus setup to HA Prometheus setup with 2 
instances scraping the same metrics. I'll be using either Thanos or 
VictoriaMetrics on top of that.
I have a couple of doubts:

1. How do I manage the config files and keep them in sync across both 
Prometheus instances? Will I have to manually update the config 
files on both instances every time I change something?

2. What happens to alerting? Will both instances have the same alerting 
rules and keep communicating with the Alertmanager, with de-duplication of 
alerts taken care of by Alertmanager itself? Or does something else have to 
be done?

Thanks in advance. Sorry if the queries seem repeated. Didn't find the 
answer for these yet.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0c5804f0-da5a-41ce-9866-d6074ac38554n%40googlegroups.com.


Re: [prometheus-users] Prometheus HA Setup.

2020-10-13 Thread yagyans...@gmail.com
Thanks, Ben.

What happens to Alerting in case of HA Prometheus while using 
Thanos/VictoriaMetrics/Cortex on top of 2 Prometheus instances?

On Monday, October 12, 2020 at 12:57:17 AM UTC+5:30 sup...@gmail.com wrote:

> I don't think this is something you can or should be optimizing for. You 
> are on the edge of needing to shard, which means you will need to manage 
> many individual instance disks.
>
> But if you really want to have a single big disk for storage, you can use 
> Minio[0] for simple object storage if you aren't already on a platform that 
> provides object storage.
>
> [0]: https://min.io/
>
> On Sun, Oct 11, 2020, 18:29 yagyans...@gmail.com  
> wrote:
>
>> Thanks, Ben.
>>
>> One thing. I don't want to maintain 2 5TB disks for Prometheus HA, i.e 1 
>> 5TB disk on each instance, that is why I want to put a single huge disk in 
>> my VictoriaMetrics instance and maintain a single persistent disk rather 
>> than 2. Can Thanos also store the data in a persistent disk rather than 
>> Object disk? Because from the docs I have seen till now, I haven't found 
>> this feature in Thanos yet. This is the sole reason I am inclined towards 
>> VictoriaMetrics.
>>
>> On Sunday, October 11, 2020 at 6:46:56 PM UTC+5:30 sup...@gmail.com 
>> wrote:
>>
>>> On Sun, Oct 11, 2020 at 10:45 AM yagyans...@gmail.com <
>>> yagyans...@gmail.com> wrote:
>>>
>>>> On Saturday, October 10, 2020 at 2:32:14 PM UTC+5:30 sup...@gmail.com 
>>>> wrote:
>>>> 4.6TB for 50 days seems like a lot. How many metrics and how many 
>>>> samples per second are you collecting? Just estimating based on the data, 
>>>> it sounds like you might have more than 10 million series and 600-700 
>>>> thousand samples per second. This might be the time to start thinking about 
>>>> sharding.
>>>> You can check for sure with these queries:
>>>> prometheus_tsdb_head_series
>>>> rate(prometheus_tsdb_head_samples_appended_total[1h])
>>>> >>>>
>>>> Hi Ben, my time series collection hasn't touched 10 million yet, it's 
>>>> around 5.5 million as of now, but my sampling rate is quite steep, sitting 
>>>> at approximately 643,522 samples per second. Since my time series are quite manageable by a 
>>>> single Prometheus instance I am avoiding sharding as of now because it 
>>>> would complicate the entire setup. What is your thought on this? 
>>>>
>>>
>>> I usually recommend thinking about a sharding plan when you hit this 
>>> level. You don't need to yet, but it's worth thinking about how you would.
>>>  
>>>
>>>>
>>>> For handling HA clustering and sharding, I recommend looking into 
>>>> Thanos. It can be added to your existing Prometheus and rolled out 
>>>> incrementally.
>>>> >>>>
>>>> Yes, I looked at Thanos but my only problem is that Thanos will use 
>>>> Object Storage for long time retention which will have latency while 
>>>> extracting old data. That is why I am inclined towards VictoriaMetrics. 
>>>> What's your view on going with VictoriaMetrics?
>>>>
>>>
>>> You don't need to use Thanos for long-term storage. It works just fine 
>>> as a query-proxy only setup.  This is how we got into using Thanos. We had 
>>> an existing sharded fleet of Prometheus HA instances. We had been using 
>>> multiple Grafana data sources and simple nginx reverse proxy for HA 
>>> querying. We added Thanos Query/Sidecar just to provide a single query 
>>> interface. It wasn't until some time later that we started to use object 
>>> storage.
>>>
>>> Thanos object storage is optional, it can use Prometheus TSDB as the 
>>> backend.
>>>
>>> That said, Thanos object storage latency isn't a huge problem. It does 
>>> depend a bit on what object storage provider/software you use. But it works 
>>> just fine.
>>>
>>> I don't recommend VictoriaMetrics. I would go with Thanos or Cortex, as 
>>> these are maintained by core Prometheus community contributors.
>>>  
>>>
>>>>
>>>> > d) If we do use 2 separate disks for the 2 instances, how will we 
>>>> manage the config files?
>>>> If you don't have any configuration management, I recommend using 
>>>> https://github.com/cloudalchemy/ansible-prometheus. It's very easy to 
>>>> get going.
>>>> >>>>
>>>> Thanks. I'll check it out.
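
On the alerting question at the top of this message: as noted in the config-sync 
thread, the usual pattern is that both Prometheus replicas evaluate the same 
rules and send to the same Alertmanager (or Alertmanager cluster), which then 
deduplicates the notifications. A minimal prometheus.yml fragment that both 
instances would share might look like this (hostnames are placeholders):

  rule_files:
    - /etc/prometheus/rules/*.yml

  alerting:
    alertmanagers:
      - static_configs:
          - targets:
              - alertmanager-1:9093
              - alertmanager-2:9093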

Re: [prometheus-users] Prometheus HA Setup.

2020-10-11 Thread yagyans...@gmail.com
Thanks, Ben.

One thing. I don't want to maintain two 5TB disks for Prometheus HA, i.e. one 
5TB disk on each instance; that is why I want to put a single huge disk in 
my VictoriaMetrics instance and maintain a single persistent disk rather 
than two. Can Thanos also store the data on a persistent disk rather than 
object storage? Because from the docs I have seen till now, I haven't found 
this feature in Thanos yet. This is the sole reason I am inclined towards 
VictoriaMetrics.

On Sunday, October 11, 2020 at 6:46:56 PM UTC+5:30 sup...@gmail.com wrote:

> On Sun, Oct 11, 2020 at 10:45 AM yagyans...@gmail.com <
> yagyans...@gmail.com> wrote:
>
>> On Saturday, October 10, 2020 at 2:32:14 PM UTC+5:30 sup...@gmail.com 
>> wrote:
>> 4.6TB for 50 days seems like a lot. How many metrics and how many samples 
>> per second are you collecting? Just estimating based on the data, it sounds 
>> like you might have more than 10 million series and 600-700 thousand samples 
>> per second. This might be the time to start thinking about sharding.
>> You can check for sure with these queries:
>> prometheus_tsdb_head_series
>> rate(prometheus_tsdb_head_samples_appended_total[1h])
>> >>>>
>> Hi Ben, my time series collection hasn't touched 10 million yet, it's 
>> around 5.5 million as of now, but my sampling rate is quite steep, sitting 
>> at approximately 643,522 samples per second. Since my time series are quite manageable by a 
>> single Prometheus instance I am avoiding sharding as of now because it 
>> would complicate the entire setup. What is your thought on this? 
>>
>
> I usually recommend thinking about a sharding plan when you hit this 
> level. You don't need to yet, but it's worth thinking about how you would.
>  
>
>>
>> For handling HA clustering and sharding, I recommend looking into Thanos. 
>> It can be added to your existing Prometheus and rolled out incrementally.
>> >>>>
>> Yes, I looked at Thanos but my only problem is that Thanos will use 
>> Object Storage for long time retention which will have latency while 
>> extracting old data. That is why I am inclined towards VictoriaMetrics. 
>> What's your view on going with VictoriaMetrics?
>>
>
> You don't need to use Thanos for long-term storage. It works just fine as 
> a query-proxy only setup.  This is how we got into using Thanos. We had an 
> existing sharded fleet of Prometheus HA instances. We had been using 
> multiple Grafana data sources and simple nginx reverse proxy for HA 
> querying. We added Thanos Query/Sidecar just to provide a single query 
> interface. It wasn't until some time later that we started to use object 
> storage.
>
> Thanos object storage is optional, it can use Prometheus TSDB as the 
> backend.
>
> That said, Thanos object storage latency isn't a huge problem. It does 
> depend a bit on what object storage provider/software you use. But it works 
> just fine.
>
> I don't recommend VictoriaMetrics. I would go with Thanos or Cortex, as 
> these are maintained by core Prometheus community contributors.
>  
>
>>
>> > d) If we do use 2 separate disks for the 2 instances, how will we 
>> manage the config files?
>> If you don't have any configuration management, I recommend using 
>> https://github.com/cloudalchemy/ansible-prometheus. It's very easy to 
>> get going.
>> >>>>
>> Thanks. I'll check it out.
>>
>> -- 
>>
> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/6cfb736c-ba38-4e8d-8468-cdc84f2971f2n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/6cfb736c-ba38-4e8d-8468-cdc84f2971f2n%40googlegroups.com?utm_medium=email_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/cb4928de-bb1f-4769-828b-6d9e10a3f532n%40googlegroups.com.


Re: [prometheus-users] Prometheus HA Setup.

2020-10-11 Thread yagyans...@gmail.com
On Saturday, October 10, 2020 at 2:32:14 PM UTC+5:30 sup...@gmail.com wrote:
4.6TB for 50 days seems like a lot. How many metrics and how many samples 
per second are you collecting? Just estimating based on the data, it sounds 
like you might have more than 10 million series and 600-700 thousand samples 
per second. This might be the time to start thinking about sharding.
You can check for sure with these queries:
prometheus_tsdb_head_series
rate(prometheus_tsdb_head_samples_appended_total[1h])

Hi Ben, my time series collection hasn't touched 10 million yet, it's around 
5.5 million as of now, but my sampling rate is quite steep, sitting at 
approximately 643,522 samples per second. Since my time series are quite manageable by a single 
Prometheus instance I am avoiding sharding as of now because it would 
complicate the entire setup. What is your thought on this? 

For handling HA clustering and sharding, I recommend looking into Thanos. 
It can be added to your existing Prometheus and rolled out incrementally.

Yes, I looked at Thanos, but my only problem is that Thanos will use object 
storage for long-term retention, which will add latency when extracting 
old data. That is why I am inclined towards VictoriaMetrics. What's your 
view on going with VictoriaMetrics?

> d) If we do use 2 separate disks for the 2 instances, how will we 
manage the config files?
If you don't have any configuration management, I recommend using 
https://github.com/cloudalchemy/ansible-prometheus. It's very easy to get 
going.

Thanks. I'll check it out.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/6cfb736c-ba38-4e8d-8468-cdc84f2971f2n%40googlegroups.com.


Re: [prometheus-users] Re: Prometheus HA Setup.

2020-10-11 Thread yagyans...@gmail.com
Thanks a lot, Stuart.

What is your opinion on using VictoriaMetrics on top of an HA Prometheus setup 
for long-term storage? This would allow me to reduce the retention period on 
the individual Prometheus instances and keep the majority of the data on 
VictoriaMetrics' side. This way I'll have to manage only a single huge 
persistent storage disk.

On Saturday, October 10, 2020 at 1:24:50 PM UTC+5:30 Stuart Clark wrote:

> On 10/10/2020 08:43, yagyans...@gmail.com wrote:
>
> d) If we do use 2 separate disks for the 2 instances, how will we manage 
> the config files? I mean is there any way to make the changes on any one 
> instance and those get replicated to other instance automatically or will 
> we have to do that manually?
>
> On Saturday, October 10, 2020 at 12:36:25 PM UTC+5:30 yagyans...@gmail.com 
> wrote:
>
>> Hi. I have a vanilla Prometheus setup with 50 days retention and data 
>> size of around 4.6TB for this much retention. I want to move to HA set up 
>> to avoid a single point of failure. 
>> I'm a little confused on how to approach the below points:
>>
>> a) With a HA pair, does the Prometheus data necessarily be local to both 
>> the Prometheus instances? Because it would require me to provision 2 5TB 
>> disks, one for each instance.
>> Is it a good idea to have these 2 Prometheus instances write to an NFS 
>> disk?
>>
>> b) With both the HA pairs scraping the same targets, how do I build a 
>> global view of these local Prometheus instances? Is federation preferable 
>> or is there any other better way to approach this?
>>
>> c) Since, 2 instances will be scraping the same targets, does it add any 
>> overhead to the target side?
>>
>> Thanks in advance.
>>
> a) Yes you would have two disks. NFS is not recommended for a number of 
> reasons, including performance. Additionally, it would create a single point 
> of failure which could break both machines at once. Also, with NFS 
> it is easy to accidentally try to have both instances pointing to the same 
> storage location which would cause data corruption.
>
> b) There are a number of solutions, but one option with be to run promxy 
> in front of Prometheus (so things like Grafana would query promxy) which 
> will query both servers to create a single view, removing any gaps.
>
> c) Exporters and client libraries are designed to be low impact when used 
> correctly, so additional scraping should have minimal impact.
>
> d) Control of your config files is down to whatever container/configuration 
> management solution you use. Generally the only difference between the 
> servers might be external label settings.
>
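
For reference, promxy is configured with a list of "server groups", each using 
the familiar Prometheus service-discovery blocks; a sketch along these lines 
(hostnames are placeholders, and the exact field names should be checked against 
the promxy README) would sit in front of an HA pair:

  promxy:
    server_groups:
      - static_configs:
          - targets:
              - prometheus-a:9090
              - prometheus-b:9090
        anti_affinity: 10s

Grafana (or anything else speaking the Prometheus HTTP API) then queries promxy 
instead of the individual servers, which is what gives the single, gap-free view 
described in (b).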

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/2c04fa81-30f7-4c0b-8ec2-828048cb095bn%40googlegroups.com.


[prometheus-users] Re: Prometheus HA Setup.

2020-10-10 Thread yagyans...@gmail.com
d) If we do use 2 separate disks for the 2 instances, how will we manage 
the config files? I mean, is there any way to make the changes on any one 
instance and have them replicated to the other instance automatically, or will 
we have to do that manually?

On Saturday, October 10, 2020 at 12:36:25 PM UTC+5:30 yagyans...@gmail.com 
wrote:

> Hi. I have a vanilla Prometheus setup with 50 days retention and data size 
> of around 4.6TB for this much retention. I want to move to HA set up to 
> avoid a single point of failure.
> I'm a little confused on how to approach the below points:
>
> a) With a HA pair, does the Prometheus data necessarily be local to both 
> the Prometheus instances? Because it would require me to provision 2 5TB 
> disks, one for each instance.
> Is it a good idea to have these 2 Prometheus instances write to an NFS 
> disk?
>
> b) With both the HA pairs scraping the same targets, how do I build a 
> global view of these local Prometheus instances? Is federation preferable 
> or is there any other better way to approach this?
>
> c) Since, 2 instances will be scraping the same targets, does it add any 
> overhead to the target side?
>
> Thanks in advance.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/9bde724c-5f65-4006-8736-1ed56ab4cdf4n%40googlegroups.com.


[prometheus-users] Prometheus HA Setup.

2020-10-10 Thread yagyans...@gmail.com
Hi. I have a vanilla Prometheus setup with 50 days retention and data size 
of around 4.6TB for this much retention. I want to move to an HA setup to 
avoid a single point of failure.
I'm a little confused on how to approach the below points:

a) With an HA pair, does the Prometheus data necessarily have to be local to 
both Prometheus instances? Because it would require me to provision two 5TB 
disks, one for each instance.
Is it a good idea to have these 2 Prometheus instances write to an NFS disk?

b) With both the HA pairs scraping the same targets, how do I build a 
global view of these local Prometheus instances? Is federation preferable 
or is there any other better way to approach this?

c) Since 2 instances will be scraping the same targets, does it add any 
overhead to the target side?

Thanks in advance.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ad374ee9-c5fa-41a5-bb02-24d062366ffan%40googlegroups.com.


[prometheus-users] Re: Scrape the status of VMWare Tools.

2020-09-10 Thread yagyans...@gmail.com
Thanks a lot, Brian.
nrpe_exporter looks interesting. Will explore the same.

On Thursday, September 10, 2020 at 12:37:00 PM UTC+5:30 b.ca...@pobox.com 
wrote:

> exporter_exporter  
> has the ability to run scripts in response to scrapes.  You'd have to 
> install it on every node where you want to run scripts.
>
> nrpe_exporter  talks 
> to nrped so you can scrape actual nagios plugins, but you need to build 
> from master branch if you want SSL as there has been no tagged release for 
> ages - plus you need to build against an old version of OpenSSL.  You'd 
> only need to install one central instance.
>
> You can decide for yourself whether these are easier than creating a text 
> file from a cron job. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/66004595-e66a-42c6-a1e0-a5aecc847aacn%40googlegroups.com.


[prometheus-users] Scrape the status of VMWare Tools.

2020-09-09 Thread yagyans...@gmail.com
Hi. So, I want to scrape the status (Running/Not Running) of the vmware-tools 
service running on all my VMware servers. For CentOS 7 it's pretty 
straightforward using the systemd collector, but for CentOS 6 I am in a bit of 
a pickle. Since node exporter is running on all my servers, the easiest 
solution would be to create a small bash script and export its output for the 
textfile collector, but that would require a cron job to be added for the 
script to keep the data up to date.

Is there any way to achieve this without adding a cron job for the script? For 
example, with the Nagios-based Check_MK monitoring agent, any script that is 
put under the /etc/check_mk folder is evaluated at every scrape.

Is cron the best solution, or can something better be done here?

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/15b661b6-57fb-4627-98a4-6a06560ba4b5n%40googlegroups.com.


[prometheus-users] Querying Response Body via Blackbox Exporter.

2020-07-31 Thread yagyans...@gmail.com

Hi. I am using the below module to fail the probe whenever the response 
body is anything other than 'green', and it is working fine. But how 
do I get the response body when the probe fails? E.g. if the response body 
is 'red', then probe_success shows zero, but how do I query or 
retrieve the response body returned by the URL at the point when 
probe_success is zero?

Here is the module configuration I am using.
  http_healthcheck_es:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0", "HTTP/1.0"]
      fail_if_body_not_matches_regexp: ['green']
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: true
      preferred_ip_protocol: "ip4"
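
As Stuart notes in a later thread in this archive, Prometheus only supports 
numbers as time series values, so the response body itself is not exported; what 
the exporter does expose is a signal that the body check was the reason for the 
failure, e.g.:

  probe_success == 0 and probe_failed_due_to_regex == 1

which only returns a series when the probe failed and the failure was caused by 
the fail_if_body_not_matches_regexp check (i.e. the body was not 'green').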

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1616590e-ed7e-4c02-901a-7eb36aab4e60n%40googlegroups.com.


[prometheus-users] Re: Exact DateTime from Elapsed Time.

2020-07-31 Thread yagyans...@gmail.com
Thanks a lot.

On Wednesday, July 29, 2020 at 1:16:16 AM UTC+5:30 b.ca...@pobox.com wrote:

> node_boot_time_seconds by itself *is* the date and time at which it 
> booted - represented as the number of seconds since 1 Jan 1970.  Front-end 
> software like Grafana will be able to show this as a human-readable date + 
> time.
>
> Also, there are prometheus functions which can convert this to the 
> human-readable pieces, e.g. year(node_boot_time_seconds) will show 2020 
> if it last rebooted some time this year.
>
> See https://prometheus.io/docs/prometheus/latest/querying/functions/
>
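
Combining this with the original filter (600 hours = 25 days), an expression 
like the following returns the boot timestamp itself, but only for servers that 
rebooted within the last 25 days; Grafana can then format that value as a date:

  node_boot_time_seconds and ((node_time_seconds - node_boot_time_seconds) < 600*60*60)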

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/2d4eb31f-a765-4795-b6d8-5fc58ccfaec3n%40googlegroups.com.


[prometheus-users] Exact DateTime from Elapsed Time.

2020-07-28 Thread yagyans...@gmail.com

Hi. So, I am using the below expression to get the list of servers rebooted 
in the last 25 days. It returns how much time has elapsed between 
now and when the server last rebooted. From this elapsed time I want to get 
the date and time at which the server rebooted. Is it possible to get that 
in any way?

((node_time_seconds - node_boot_time_seconds) < 600*60*60) / 60


Thanks in advance!

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b7d13668-4406-438e-b1af-4fa41efd8c6an%40googlegroups.com.


Re: [prometheus-users] Response String from Blackbox Exporter.

2020-07-16 Thread yagyans...@gmail.com
It would be very helpful if you could shed a little light on how to proceed 
with that.
On Thursday, July 16, 2020 at 12:27:34 PM UTC+5:30 Stuart Clark wrote:

> Prometheus only supports using numbers as the values for time series.
>
> However it is possible to use modules in the blackbox exporter to check 
> the response of a request. 
>
> On 16 July 2020 07:44:46 BST, "yagyans...@gmail.com"  
> wrote:
>>
>>
>> Hi. Is it possible to get the response string given by the endpoint? I 
>> know we can get the RT, response code, etc., but I haven't been able to get the 
>> response string. Please help if this can be done.
>>
>> Thanks in advance!
>>
>>
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
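
As a concrete illustration of Stuart's point, a blackbox module can assert on 
the response body with a regex; the essential part looks like the 
http_healthcheck_es module that appears further up in this archive (the module 
name and pattern below are placeholders):

  modules:
    my_body_check:
      prober: http
      timeout: 5s
      http:
        fail_if_body_not_matches_regexp: ['expected-string']

The probe then fails (probe_success 0) whenever the body does not match, even 
though the body text itself is never exported as a metric value.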

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/f73a73fe-d228-4ef3-982d-09ddcc0f8f29n%40googlegroups.com.


[prometheus-users] Response String from Blackbox Exporter.

2020-07-16 Thread yagyans...@gmail.com

Hi. Is it possible to get the response string given by the endpoint? I know 
we can get the RT, response code, etc., but I haven't been able to get the 
response string. Please help if this can be done.

Thanks in advance!

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/22f754f6-e4f6-4548-a59d-a8b2f91511b1n%40googlegroups.com.


[prometheus-users] Re: Making TCP Checks Critical based on Response Time.

2020-07-07 Thread yagyans...@gmail.com
Oh yes. Thank you. Yes, I am using probe_success. Totally forgot that 
probe_duration_seconds is there for TCP too.

On Tuesday, July 7, 2020 at 6:53:48 PM UTC+5:30 b.ca...@pobox.com wrote:

> On Monday, 6 July 2020 18:59:16 UTC+1, yagyans...@gmail.com wrote:
>>
>> Hi. On some of my services I have setup some TCP Port Checks using 
>> blackbox. Now, I want those checks to be critical based on the Response 
>> Time of those TCP Checks. Is it possible to do so?
>>
>>
> Yes.
>
> You didn't show your alerting rules.  Are they using "probe_success" by 
> any chance?  There are other metrics you can use.  Just type "probe_" into 
> the Prometheus web UI and look for the auto-completions.
>
> Here is an example.  blackbox.yml:
>
> modules:
>   certificate:
>     prober: tcp
>     timeout: 5s
>     tcp:
>       tls: true
>       tls_config: {}
>
> curl 'localhost:9115/probe?module=certificate&target=www.google.com:443'
> # HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns 
> lookup in seconds
> # TYPE probe_dns_lookup_time_seconds gauge
> probe_dns_lookup_time_seconds 0.001341219
> # HELP probe_duration_seconds Returns how long the probe took to complete 
> in seconds
> # TYPE probe_duration_seconds gauge
> probe_duration_seconds 0.375832381
> # HELP probe_failed_due_to_regex Indicates if probe failed due to regex
> # TYPE probe_failed_due_to_regex gauge
> probe_failed_due_to_regex 0
> # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to 
> detect if the IP address changes.
> # TYPE probe_ip_addr_hash gauge
> probe_ip_addr_hash 1.255276657e+09
> # HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
> # TYPE probe_ip_protocol gauge
> probe_ip_protocol 6
> # HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry date
> # TYPE probe_ssl_earliest_cert_expiry gauge
> probe_ssl_earliest_cert_expiry 1.599661882e+09
> # HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL 
> chain expiry in unixtime
> # TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
> probe_ssl_last_chain_expiry_timestamp_seconds 1.599661882e+09
> # HELP probe_success Displays whether or not the probe was a success
> # TYPE probe_success gauge
> probe_success 1
> # HELP probe_tls_version_info Returns the TLS version used, or NaN when 
> unknown
> # TYPE probe_tls_version_info gauge
> probe_tls_version_info{version="TLS 1.3"} 1
>
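
Building on that, a minimal alerting rule against the TCP job from this thread 
could look like the following (the 2-second threshold and severity label are 
just placeholders to adapt):

  groups:
    - name: blackbox-tcp
      rules:
        - alert: TCPCheckSlow
          expr: probe_duration_seconds{job="blackbox_Service-TCPChecks"} > 2
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "TCP check {{ $labels.checkname }} on {{ $labels.instance }} took {{ $value }}s"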

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/493de2ba-3383-4edf-9e42-f8e60e0d2e6an%40googlegroups.com.


[prometheus-users] Making TCP Checks Critical based on Response Time.

2020-07-06 Thread yagyans...@gmail.com
Hi. On some of my services I have set up TCP port checks using 
blackbox. Now, I want those checks to go critical based on the response 
time of those TCP checks. Is it possible to do so?

My TCP job.
  - job_name: 'blackbox_Service-TCPChecks'
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files:
          - /etc/blackbox/HTTP_TCP-Targets/TCP_Targets.yml

This is how I am defining targets.
- targets:
    - x.x.x.x:80
  labels:
    checkname: 'myname'
    cluster: 'C1'
    node: 'N1'

Thanks in advance.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/2c31e6f3-e06c-479f-a960-ff95a3d12bdan%40googlegroups.com.


Re: [prometheus-users] Custom Threshold for a particular instance.

2020-07-03 Thread yagyans...@gmail.com
This seems like an interesting approach. If possible, can you please give 
some more insight into it?

On Friday, July 3, 2020 at 10:56:01 AM UTC+5:30 sayf.eddi...@gmail.com 
wrote:

> The other proper way is to dynamically generate alerts where you hardcode 
> the thresholds based on labels.
> Like using a combo of yaml/jinja to store the thresholds in a 
> maintainable format and have one command to regenerate everything.
> Every time you want to change a value you just regenerate the alerts.
>
> On Thu, Jul 2, 2020 at 8:38 PM Yagyansh S. Kumar  
> wrote:
>
>> Also, currently, I have only tried a single way to give a custom threshold, 
>> i.e. based on the component name. For example, all the targets under 
>> Comp-A have a threshold of 99.9 and all the targets under Comp-B have a 
>> threshold of 95.
>> But now, I have to give a common custom threshold, let's say 98, to 5 
>> different targets, each of which belongs to a different component. All 
>> 5 components have more than 1 target, but I want the custom threshold to 
>> be applied to only a single target from each component.
>>
>> On Fri, Jul 3, 2020 at 12:02 AM Yagyansh S. Kumar  
>> wrote:
>>
>>> Hi Christian,
>>>
>>> Actually, I want to know if there is any better way to define the 
>>> threshold for my 5 new servers that belong to 5 different components. Is 
>>> writing 5 different recording rules with the same name, and different 
>>> instance and component labels, the only way to proceed here? Won't that be a 
>>> little too dirty to maintain? What if it was 20 servers all belonging to a 
>>> different component?
>>>
>>> On Tue, Jun 30, 2020 at 11:43 AM Christian Hoffmann <
>>> ma...@hoffmann-christian.info> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 6/24/20 8:09 PM, yagyans...@gmail.com wrote:
>>>> > Hi. Currently I am using a custom threshold in case of my Memory 
>>>> alerts.
>>>> > I have 2 main labels for my every node exporter target - cluster and
>>>> > component.
>>>> > My custom threshold till now has been based on the component as I had 
>>>> to
>>>> > define that particular custom threshold for all the servers of the
>>>> > component. But now, I have 5 instances, all from different components
>>>> > and I have to set the threshold as 97. How do I approach this?
>>>> > 
>>>> > My typical node exporter job.
>>>> >   - job_name: 'node_exporter_JOB-A'
>>>> > static_configs:
>>>> > - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
>>>> >   labels:
>>>> > cluster: 'Cluster-A'
>>>> > env: 'PROD'
>>>> > component: 'Comp-A'
>>>> > scrape_interval: 10s
>>>> > 
>>>> > Recording rule for custom thresholds.
>>>> >   - record: abcd_critical
>>>> > expr: 99.9
>>>> > labels:
>>>> >   component: 'Comp-A'
>>>> > 
>>>> >   - record: xyz_critical
>>>> > expr: 95
>>>> > labels:
>>>> >   node: 'Comp-B'
>>>> > 
>>>> > The expression for Memory Alert.
>>>> > ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
>>>> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
>>>> > on(instance) group_left(nodename) node_uname_info > on(component)
>>>> > group_left() (abcd_critical or xyz_critical or on(node) count by
>>>> > (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
>>>> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 
>>>> 90)
>>>> > 
>>>> > Now, I have 5 servers with different components. How to include that 
>>>> in
>>>> > the most optimized manner?
>>>>
>>>> This looks almost like the pattern described here:
>>>> https://www.robustperception.io/using-time-series-as-alert-thresholds
>>>>
>>>> It looks like you already tried to integrate the two different ways to
>>>> specific thresholds, right? Is there any specific problem with it?
>>>>
>>>> Sadly, this pattern quickly becomes complex, especially if nested (like
>>>> you would need to do) and if combined with an already longer query (like
>>>> in your case).
>>>>
>>>> I 

[prometheus-users] Alert state on Alertmanager Reload.

2020-07-02 Thread yagyans...@gmail.com
Hi. Recently, I have started noticing that whenever I reload Alertmanager 
using the reload API, a resolved notification is sent out to the webhooks. 
Is this the expected behavior? I have only started noticing this behavior 
recently.
For most of my alert rules, the 'for' time is generally 5-10m.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/66b32828-f60d-40a4-8b4e-244bc2841937n%40googlegroups.com.


[prometheus-users] Alertmanager Queue Issue.

2020-07-02 Thread yagyans...@gmail.com
Hi. I have around 80 alert rules configured. These days I have started 
receiving a warning from Alertmanager saying "component=cluster 
msg="dropping messages because too many are queued" current=4130 limit=4096".
What exactly does this mean? Are 4130 alerts in the pending queue? How can 
so many alerts be generated? I currently have 430 alerts (Warning + Critical). 
How are this many alerts queued? And what can I do to overcome this issue?

Below are my global alertmanager configurations.

global:
  resolve_timeout: 5m
  slack_api_url: "blahblah"
route:
  group_by: ['cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-channel'

Also, out of the 80 alert rules, some of them are warnings, for which I 
don't need an alert notification, but I do need them for my visualization and 
dashboarding purposes. Is there any way to keep some of the alert rules from 
even going to the global receiver?

Can someone please help here? Some of my critical alerts are getting 
blocked.
Thanks in advance.
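
On the second point, one common pattern (assuming the alert rules carry a 
severity label, which is not shown in this thread) is to route the warning 
alerts to a receiver that has no notification integrations, so they never 
produce a Slack message but are still evaluated and still show up as ALERTS 
series for dashboards:

  route:
    group_by: ['cluster']
    receiver: 'slack-channel'
    routes:
      - match:
          severity: warning
        receiver: 'silent'
  receivers:
    - name: 'silent'            # no *_configs, so nothing is ever sent
    - name: 'slack-channel'
      slack_configs:
        - channel: '#alerts'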

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/01ccc40b-4c20-4140-8641-b6984da370f8n%40googlegroups.com.


[prometheus-users] Alertmanager Queue Issue.

2020-07-02 Thread yagyans...@gmail.com
Hi. So, I have around 80 alert rules configured, and these days I have 
started receiving a warning from Alertmanager saying "component=cluster 
msg="dropping messages because too many are queued" current=4130 limit=4096".
What exactly does this mean? I am wondering how 4130 alert messages are 
being generated; that is way too many. I have around 400 alerts (Warning + 
Critical) at present.

Also, of the 80 alert rules some are warnings, for which I don't need 
any notification, but I need them for my visualization and dashboards. How do I 
configure things so that the warning alerts don't even go to my global 
receiver? And how do I solve this queue issue?

Can someone help here? Some of my critical alerts are being blocked.
Thanks in advance :).

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/6c9a0653-8ddd-4da4-b96b-b038db39e77an%40googlegroups.com.


Re: [prometheus-users] Custom Threshold for Particular Instance.

2020-06-24 Thread yagyans...@gmail.com
Hi. Thanks for such a quick response.

Do you mean to say define 5 different recording rules with the same name 
for all those 5 servers?
Also, in the recording rule now, I would have to give 2 labels, correct? One 
is the instance itself and another is the component, because my whole memory 
expression is based on the component given in the recording rule.

On Wednesday, June 24, 2020 at 11:52:46 PM UTC+5:30 sayf.eddi...@gmail.com 
wrote:

> You can define another recording rule with the same metric name but a 
> different value and a different label; that way one recording rule can have 
> multiple values depending on labels
>
> On Wed, Jun 24, 2020, 20:11 yagyans...@gmail.com  
> wrote:
>
>> Hi. Currently I am using a custom threshold in case of my Memory alerts. 
>> I have 2 main labels for my every node exporter target - cluster and 
>> component. 
>> My custom threshold till now has been based on the component as I had to 
>> define that particular custom threshold for all the servers of the 
>> component. But now, I have 5 instances, all from different components and I 
>> have to set the threshold as 97. How do I approach this?
>>
>> My typical node exporter job.
>>   - job_name: 'node_exporter_JOB-A'
>> static_configs:
>> - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
>>   labels:
>> cluster: 'Cluster-A'
>> env: 'PROD'
>> component: 'Comp-A'
>> scrape_interval: 10s
>>
>> Recording rule for custom thresholds.
>>   - record: abcd_critical
>> expr: 99.9
>> labels:
>>   component: 'Comp-A'
>>
>>   - record: xyz_critical
>> expr: 95
>> labels:
>>   node: 'Comp-B'
>>
>> The expression for Memory Alert.
>> ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - 
>> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 
>> on(instance) group_left(nodename) node_uname_info > on(component) 
>> group_left() (abcd_critical or xyz_critical or on(component) count 
>> by (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - 
>> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
>>
>> Now, I have 5 servers with different components. How to include that in 
>> the most optimized manner?
>>
>> Thanks in advance.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/58c7d88e-6538-4039-a5ae-4bd092cd8087n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/58c7d88e-6538-4039-a5ae-4bd092cd8087n%40googlegroups.com?utm_medium=email_source=footer>
>> .
>>
>
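
To make the recording-rule suggestion concrete, a sketch for the five servers 
could add one instance-scoped record per server under a new shared name 
(memory_critical_override and the instance values below are placeholders):

  - record: memory_critical_override
    expr: 97
    labels:
      instance: 'x.x.x.1:9100'

  - record: memory_critical_override
    expr: 97
    labels:
      instance: 'x.x.x.2:9100'

  # ...and so on for the remaining three instances...

The alert expression then needs a second join, this time on(instance), that 
prefers memory_critical_override and falls back to the existing component-level 
records, along the lines of the "time series as alert thresholds" pattern 
referenced elsewhere in this archive. Generating these records from a small 
YAML/Jinja template, as suggested in the 2020-07-03 thread, keeps them 
maintainable when the list of special-cased servers grows.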

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/93d1ca48-0275-4d37-b6f0-bc4eb15ffd8fn%40googlegroups.com.


[prometheus-users] Custom Threshold for Particular Instance.

2020-06-24 Thread yagyans...@gmail.com
Hi. Currently I am using a custom threshold for my Memory alerts. I 
have 2 main labels for every node exporter target - cluster and 
component. 
My custom threshold till now has been based on the component as I had to 
define that particular custom threshold for all the servers of the 
component. But now, I have 5 instances, all from different components and I 
have to set the threshold as 97. How do I approach this?

My typical node exporter job.
  - job_name: 'node_exporter_JOB-A'
static_configs:
- targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
  labels:
cluster: 'Cluster-A'
env: 'PROD'
component: 'Comp-A'
scrape_interval: 10s

Recording rule for custom thresholds.
  - record: abcd_critical
expr: 99.9
labels:
  component: 'Comp-A'

  - record: xyz_critical
expr: 95
labels:
  node: 'Comp-B'

The expression for Memory Alert.
((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - 
node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 
on(instance) group_left(nodename) node_uname_info > on(component) 
group_left() (abcd_critical or xyz_critical or on(component) count by 
(component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - 
node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)

Now, I have 5 servers with different components. How to include that in the 
most optimized manner?

Thanks in advance.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/58c7d88e-6538-4039-a5ae-4bd092cd8087n%40googlegroups.com.