Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s

2023-05-18 Thread Daniel Swarbrick
I guess this relates to 
https://github.com/prometheus/blackbox_exporter/issues/1051

On Thursday, May 18, 2023 at 4:33:46 PM UTC+2 Ben Kochie wrote:

> Based on what little I can derive from the "data" presented, this is
> probably from Grafana, given that the timestamps are snapped to even
> 15-second boundaries.
>
> And based on this, I'm guessing this is a misunderstanding of the lookback
> / step interval, which defaults to 15s in Grafana.
>
> There's no actual evidence (logs) provided that the exporter is doing what
> they claim.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/577eb978-7e8d-4700-a85d-ef0ac3bbcfe7n%40googlegroups.com.


[prometheus-users] How to sense disk read/write errors

2023-05-18 Thread dubnq77 via Prometheus Users
For proper NVMe metrics monitoring you need an additional collector script,
for example:
https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/blob/master/nvme_metrics.sh

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a0f490e6-1b42-4c0b-afea-fb6964e35d44%40at.encryp.ch.


[prometheus-users] How to sense disk read/write errors

2023-05-18 Thread M Moore
I had two users come at me with "why didn't you...?" because of a machine
that had disk hardware failures but no alerts before the device died. They
pointed at these messages in the kernel dmesg output:

> [Wed May 17 06:07:05 2023] nvme nvme3: async event result 00010300
> [Wed May 17 06:07:25 2023] nvme nvme3: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10
> [Wed May 17 11:56:04 2023] print_req_error: I/O error, dev nvme3c33n1, sector 3125627392
> [Wed May 17 11:56:04 2023] print_req_error: I/O error, dev nvme3c33n1, sector 3125627392
> [Thu May 18 08:06:04 2023] Buffer I/O error on dev nvme3n1, logical block 390703424, async page read
> [Thu May 18 08:07:37 2023] print_req_error: I/O error, dev nvme3c33n1, sector 0
> [Thu May 18 08:07:37 2023] print_req_error: I/O error, dev nvme3c33n1, sector 256

I didn't find an "errors" counter in iostats [1], so I guess node_exporter
won't have it. I did find node_filesystem_device_error, but that was zero
the whole time. What would be the Prometheus-y way to sense these errors so
my users can have their alerts? I'm hoping to avoid
"logtail | grep -c 'error'" in a counter.

[1] https://www.kernel.org/doc/html/latest/admin-guide/iostats.html

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CACDZGiKxT-kKodJQe44TL5-DRKwZ5fpazPhvkb4FijGS8iWjsQ%40mail.gmail.com.


[prometheus-users] Use Kubernetes-nodes Metrics to determine kubelet cert expiry

2023-05-18 Thread Kevin Cameron
Hi There,
I have several older Kubernetes clusters, and I need a way to alert on the
kubelet/kubeadm certs before they expire.

From the kubernetes-nodes job we get several metrics:
apiserver_client_certificate_expiration_seconds_count
apiserver_client_certificate_expiration_seconds_sum
apiserver_client_certificate_expiration_seconds_bucket

But I am not really clear on how to use these values to determine either a
countdown to expiry or the actual expiration date. The bucket metrics have
an "le" label, but I'm not sure what that means.

Thanks,
Kevin

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAB%2BSi9_hL4NxT4w-ORbCnK8ABHdMeXmEqtUBZti--t3cYG9%3DLg%40mail.gmail.com.


Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s

2023-05-18 Thread Ben Kochie
Based on what little I can derive from the "data" presented, this is
probably from Grafana, given that the timestamps are snapped to even
15-second boundaries.

And based on this, I'm guessing this is a misunderstanding of the lookback
/ step interval, which defaults to 15s in Grafana.

There's no actual evidence (logs) provided that the exporter is doing what
they claim.
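To illustrate the step/lookback point: a Grafana panel issues a range query that evaluates the expression at every step (15s here), and each evaluation returns the newest raw sample within Prometheus's lookback window (5 minutes by default). So 60s scrapes can still yield a point every 15s, with each value repeated over several consecutive steps, much like the posted data. A minimal Python sketch of that lookup; the sample timestamps and values are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical raw samples, one every 60s as a 60s scrape_interval would
# produce. Timestamps and values are illustrative, not taken from the thread.
SAMPLES = [
    (datetime(2023, 5, 15, 9, 44, 50), 0.0624),
    (datetime(2023, 5, 15, 9, 45, 50), 0.0527),
    (datetime(2023, 5, 15, 9, 46, 50), 0.0465),
]

LOOKBACK = timedelta(minutes=5)  # Prometheus default lookback delta


def instant_value(at):
    """Newest sample at or before `at`, if it is within the lookback window."""
    in_window = [(ts, v) for ts, v in SAMPLES if at - LOOKBACK <= ts <= at]
    return max(in_window)[1] if in_window else None


def range_query(start, end, step):
    """Evaluate the lookup at every step, roughly as a Grafana range query does."""
    points, t = [], start
    while t <= end:
        value = instant_value(t)
        if value is not None:
            points.append((t, value))
        t += step
    return points


POINTS = range_query(
    datetime(2023, 5, 15, 9, 45, 0),
    datetime(2023, 5, 15, 9, 47, 45),
    timedelta(seconds=15),
)
for ts, value in POINTS:
    print(ts.strftime("%Y-%m-%d %H:%M:%S"), value)
```

Querying a range-vector selector (for example `<metric>[5m]`) in the Prometheus UI instead returns the raw samples with their real timestamps, which shows the actual scrape spacing.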

On Thu, May 18, 2023 at 2:47 PM Stuart Clark wrote:

> How are you producing that data?
>
> Did you remember to reload/restart Prometheus after making the change to
> your config?
>
> --
> Stuart Clark
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmpca_ujU23ARR980QR2NcwAHfJNapLE7p37W6emyWUBBQ%40mail.gmail.com.


Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s

2023-05-18 Thread Stuart Clark

On 2023-05-18 11:27, Paweł Błażejewski wrote:
> Samples still every 15s.
>
> 2023-05-15 09:45:00
> 0.0624
> 2023-05-15 09:45:15

How are you producing that data?

Did you remember to reload/restart Prometheus after making the change to 
your config?


--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/89897569de531dc500edf3645dc6bf1b%40Jahingo.com.


[prometheus-users] Blackbox exporter http module check interval - how to change default 15s

2023-05-18 Thread Paweł Błażejewski
 

Hello, 

Blackbox exporter's http module checks an https site every 15s by default.
Can you please tell me if it is possible to change this interval to 1
minute? I added the scrape_interval parameter to the Prometheus config, as
you can see below, but it doesn't change anything. Samples are still every
15s.

Can you please tell me how and where to change it?
I use Prometheus 2.37.7, blackbox_exporter 0.23.0.

Prometheus config:

  - job_name: 'blackbox-http-csci-prod'
    scrape_interval: 60s
    scrape_timeout: 50s
    metrics_path: /blackbox/probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://ci-jenkins***/login?from=%2F
        - https://ci-jenkins-***l/login?from=%2F
        - https://ci-**/login?from=%2F
        .
        .
        .
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
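For context on how such a job behaves: the blackbox exporter has no probe schedule of its own; it runs a probe each time Prometheus requests its probe endpoint, so the job's scrape_interval is the probe interval. A minimal Python sketch of what the three relabel rules do to one target, and roughly which URL Prometheus then requests. The exporter address is hypothetical (the posted rule that sets `__address__` shows no replacement value), and so is the target.

```python
from urllib.parse import urlencode

# Hypothetical exporter address: the posted config's last rule sets
# __address__, but its replacement value is not shown in the thread.
EXPORTER = "blackbox-exporter:9115"


def relabel(target, exporter=EXPORTER):
    """Apply the three relabel rules from the posted job to one target."""
    labels = {"__address__": target}
    labels["__param_target"] = labels["__address__"]  # rule 1
    labels["instance"] = labels["__param_target"]     # rule 2
    labels["__address__"] = exporter                  # rule 3 (replacement assumed)
    return labels


def probe_url(labels, metrics_path="/blackbox/probe", module="http_2xx"):
    """Approximate the URL Prometheus requests from the exporter on each scrape."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


# Hypothetical target standing in for the redacted ones.
LABELS = relabel("https://ci.example.com/login?from=%2F")
URL = probe_url(LABELS)
print(URL)
```

One probe happens per request to that URL, so with `scrape_interval: 60s` the exporter should see one request per target per minute.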
   
Samples still every 15s.

2023-05-15 09:45:00  0.0624
2023-05-15 09:45:15  0.0624
2023-05-15 09:45:30  0.0527
2023-05-15 09:45:45  0.0527
2023-05-15 09:46:00  0.0527
2023-05-15 09:46:15  0.0527
2023-05-15 09:46:30  0.0465
2023-05-15 09:46:45  0.0465
2023-05-15 09:47:00  0.0465
2023-05-15 09:47:15  0.0465
2023-05-15 09:47:30  0.0465
2023-05-15 09:47:45  0.0465

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d8d5c09c-96b1-4cdc-852e-4debcc64e33an%40googlegroups.com.


[prometheus-users] Re: Calculate Time Until Gauge Reaches Target

2023-05-18 Thread Brian Candler
Also remember to add a separate alert for "progress_current >= progress_goal"
if you need to know whether the goal has already been met. The expression I
gave only gives an expected time to reach the goal if the goal *hasn't*
already been met.
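Putting the two conditions side by side: a minimal Python sketch of how the ETA expression and the separate "goal already met" check cover a single series between them. The function and its arguments are hypothetical stand-ins for the PromQL values, not a Prometheus API.

```python
WEEK = 604800  # predict_linear horizon used in the thread, in seconds


def goal_status(current, goal, predicted):
    """Classify one series: goal already met, ETA in seconds, or no ETA.

    `predicted` stands in for predict_linear(progress_current[12h], 604800).
    """
    if current >= goal:
        # The ETA expression filters this case out (goal - current > 0 fails),
        # so it needs its own alert.
        return ("met", None)
    if predicted > goal:
        return ("eta", (goal - current) / (predicted - current) * WEEK)
    return ("no_eta", None)  # not expected to reach the goal within 7 days
```

For example, `goal_status(40, 100, 120)` gives an ETA of 453600 seconds (5.25 days), while `goal_status(120, 100, 150)` reports the goal as already met.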

On Thursday, 18 May 2023 at 09:07:24 UTC+1 Brian Candler wrote:

> Grr, I think I messed up the threshold detection ("> progress_goal") on 
> the denominator.
>
> Take 2:
>
> expr: |
>   ((progress_goal - progress_current) > 0) /
>   ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) > progress_goal) - progress_current) * 604800

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/155a5585-f191-4e19-b59e-fa853532dde5n%40googlegroups.com.


[prometheus-users] Re: Calculate Time Until Gauge Reaches Target

2023-05-18 Thread Brian Candler
Grr, I think I messed up the threshold detection ("> progress_goal") on the 
denominator.

Take 2:

expr: |
  ((progress_goal - progress_current) > 0) /
  ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) > progress_goal) - progress_current) * 604800
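Why the first simplification went wrong: PromQL comparison operators without the `bool` modifier filter out series for which the comparison is false, and arithmetic then produces nothing for a series missing on either side. The earlier simplified form compared the *difference* (predicted minus current) against progress_goal, while Take 2 compares the predicted value itself. A minimal Python sketch with hypothetical numbers, using None for a filtered-out series:

```python
WEEK = 604800  # seconds; the predict_linear horizon in the expression


def keep_if(value, condition):
    """Mimic a PromQL filter comparison: keep the value, or drop it (None)."""
    return value if condition else None


def eta(numerator, denominator):
    if numerator is None or denominator is None:
        return None  # no matching series on one side -> no result
    return numerator / denominator * WEEK


def take_2(current, goal, predicted):
    num = keep_if(goal - current, goal - current > 0)
    kept = keep_if(predicted, predicted > goal)
    return eta(num, None if kept is None else kept - current)


def first_simplification(current, goal, predicted):
    num = keep_if(goal - current, goal - current > 0)
    diff = predicted - current
    return eta(num, keep_if(diff, diff > goal))  # compares the difference to the goal


# Hypothetical values: current 40, goal 100, predicted 120 in 7 days.
print(take_2(40, 100, 120))                # 453600.0 s, i.e. 5.25 days
print(first_simplification(40, 100, 120))  # None, because 80 > 100 is false
```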

On Thursday, 18 May 2023 at 08:57:57 UTC+1 Brian Candler wrote:

> I see +progress_goal and -progress_goal in the denominator so it should be 
> possible to simplify it further, I think to this:
>
> expr: |
>   ((progress_goal - progress_current) > 0) /
>   ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) - progress_current) > progress_goal) * 604800

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/bcb15414-2ee4-4355-8302-b89913c2a22en%40googlegroups.com.


[prometheus-users] Re: Calculate Time Until Gauge Reaches Target

2023-05-18 Thread Brian Candler
> I have seen a bunch of examples for predicting when a disk will fill, but
> these all seem to rely on the assumption that a gauge is trending
> downwards, and we are predicting when it meets zero.

Here is a better recipe as a starting point, using predict_linear():
https://groups.google.com/g/prometheus-users/c/PCT4MJjFFgI/m/kVfOW069BQAJ

> How can we write an expression that will tell us the time in seconds
> until progress_current meets progress_goal?

In principle, I think you should just be able to replace the metric in that 
expression with (progress_goal - progress_current) and see when that hits 
zero.  Something like this:

expr: |
  ((progress_goal - progress_current) > 0) / ((progress_goal - progress_current) -
  ((progress_goal - predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800)) < 0)) * 604800

(That's syntactically valid, but otherwise untested. Obviously you should 
remove or change the label selectors on progress_current to suit the metric 
in question). 

I see +progress_goal and -progress_goal in the denominator so it should be 
possible to simplify it further, I think to this:

expr: |
  ((progress_goal - progress_current) > 0) /
  ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) - progress_current) > progress_goal) * 604800

The value of this expression is intended to return *the time until the goal
is reached*, using a linear regression over the past 12 hours to predict
where it will be in 7 days. If the goal isn't expected to be reached within
7 days, it will return no value.
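For reference, predict_linear() fits a simple least-squares regression line to the samples in the range and extrapolates it the given number of seconds past the evaluation time. A minimal Python sketch of that calculation on hypothetical, evenly spaced samples; it illustrates the idea rather than reproducing Prometheus's code path.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Least-squares fit of (timestamp, value) pairs, extrapolated ahead.

    Timestamps are shifted so the fitted intercept is the value at eval_time.
    """
    if len(samples) < 2:
        return None  # a line needs at least two points
    xs = [ts - eval_time for ts, _ in samples]
    ys = [value for _, value in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


# Hypothetical gauge rising 2 units per minute, sampled every 60s for 12h.
NOW = 43200
SAMPLES = [(t, 100 + 2 * t / 60) for t in range(0, NOW + 1, 60)]
PREDICTED = predict_linear(SAMPLES, NOW, 604800)  # value expected 7 days after NOW
```

With perfectly linear input the fit is exact, so the prediction is the value at NOW (1540) plus 7 days at 2 units per minute, i.e. 21700.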

I can attempt to justify that expression graphically:

    |                      C2  ^
    |                    /     |
    |                 /        |
  G +..^...........x...........|
    |  |        /              D
    |  N     /                 |
    |  |  /                    |
    |  v C1                    v
    |
    +--0d----------------------7d---> time

G is the progress_goal, which I'm assuming is constant
C1 is the value of progress_current now (time = 0d), which is less than G
C2 is the future predicted value of progress_current (at time = 7d), which 
is greater than G

N (numerator) is G - C1
D (denominator) is  C2 - C1

And geometrically, I think the ratio of the crossover time (0d...x) to the 
full time (0d...7d) is the same as the ratio of N to D.  Hence (N/D)*7d 
gives the time to crossover.
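The same result follows algebraically, assuming progress_current moves in a straight line from C1 now to C2 at T = 7d = 604800 s:

```latex
C(t) = C_1 + \frac{C_2 - C_1}{T}\,t, \qquad 0 \le t \le T

C(t_x) = G \;\Longrightarrow\; t_x = T\,\frac{G - C_1}{C_2 - C_1} = T\,\frac{N}{D}
```

With C1 < C2, the ratio N/D lies strictly between 0 and 1 exactly when C1 < G < C2, i.e. when the crossover falls inside the 7-day window.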

Give it a go. Note that if progress_goal is a metric, rather than a 
constant, then it will need to have exactly the same label set as 
progress_current; or you will need to add some on(...) or ignoring(...) 
clauses as appropriate.
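To illustrate the label-set requirement: PromQL's default one-to-one matching pairs a left and a right series only when their label sets (ignoring the metric name) are identical, and on(...) or ignoring(...) changes which labels are compared. A simplified Python sketch of that matching rule on hypothetical series; it leaves out group_left/group_right and only checks the right-hand side for duplicates.

```python
import operator


def signature(labels, on=None, ignoring=()):
    """Labels used for matching: metric name dropped, then on/ignoring applied."""
    items = {k: v for k, v in labels.items() if k != "__name__"}
    if on is not None:
        return tuple(sorted((k, v) for k, v in items.items() if k in on))
    return tuple(sorted((k, v) for k, v in items.items() if k not in ignoring))


def one_to_one(left, right, op, on=None, ignoring=()):
    """Apply `op` to pairs of (labels, value) series whose signatures match."""
    right_by_sig = {}
    for labels, value in right:
        sig = signature(labels, on, ignoring)
        if sig in right_by_sig:
            raise ValueError("duplicate series on the right-hand side")
        right_by_sig[sig] = value
    result = []
    for labels, value in left:
        sig = signature(labels, on, ignoring)
        if sig in right_by_sig:
            # Result keeps only the matching labels (simplified).
            result.append((dict(sig), op(value, right_by_sig[sig])))
    return result


# Hypothetical series: the goal lacks the fstype label that the current value has.
GOAL = [({"__name__": "progress_goal", "job": "demo"}, 100.0)]
CURRENT = [({"__name__": "progress_current", "job": "demo", "fstype": "ext4"}, 40.0)]

print(one_to_one(GOAL, CURRENT, operator.sub))                        # no match
print(one_to_one(GOAL, CURRENT, operator.sub, ignoring=("fstype",)))  # job=demo -> 60.0
```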

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1aa53b61-d37a-4ac3-85c5-10636bbb75acn%40googlegroups.com.