Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s
I guess this relates to https://github.com/prometheus/blackbox_exporter/issues/1051

On Thursday, May 18, 2023 at 4:33:46 PM UTC+2 Ben Kochie wrote:
> There's no actual evidence (logs) provided that the exporter is actually
> doing what they claim.

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/577eb978-7e8d-4700-a85d-ef0ac3bbcfe7n%40googlegroups.com.
[prometheus-users] How to sense disk read/write errors
For proper NVMe metrics monitoring you need an additional collector script, for example: https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/blob/master/nvme_metrics.sh
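Scripts like nvme_metrics.sh feed node_exporter through its textfile collector, which reads `*.prom` files from the directory passed to `--collector.textfile.directory`. A minimal sketch of the publishing side, assuming the per-device error counts have already been obtained somewhere else (for example from the NVMe SMART log); the metric name `example_disk_io_errors_total`, the file name, and the device names are hypothetical. The file is written to a temporary name and renamed into place so the collector never reads a half-written file.

```python
import os
import tempfile


def render_counter(metric, help_text, values_by_device):
    """Render one counter family in the Prometheus text exposition format.

    Assumes device names need no label-value escaping (no quotes,
    backslashes or newlines), which holds for names like "nvme3n1".
    """
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} counter"]
    for device, value in sorted(values_by_device.items()):
        lines.append(f'{metric}{{device="{device}"}} {value}')
    return "\n".join(lines) + "\n"


def write_atomically(directory, filename, text):
    """Write text to directory/filename via a temp file plus rename.

    The temp file lives in the same directory (so the rename is atomic on
    POSIX filesystems) and does not end in .prom, so the textfile
    collector ignores it until it has been renamed into place.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Run periodically (for example from cron) with `directory` set to whatever the node_exporter was given in `--collector.textfile.directory`; an alert can then fire on `increase()` of the counter.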
[prometheus-users] How to sense disk read/write errors
I had two users come at me with "why didn't you...?" because a machine had disk hardware failures but raised no alerts before the device died. They pointed at these messages in the kernel dmesg:

> [Wed May 17 06:07:05 2023] nvme nvme3: async event result 00010300
> [Wed May 17 06:07:25 2023] nvme nvme3: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10
> [Wed May 17 11:56:04 2023] print_req_error: I/O error, dev nvme3c33n1, sector 3125627392
> [Wed May 17 11:56:04 2023] print_req_error: I/O error, dev nvme3c33n1, sector 3125627392
> [Thu May 18 08:06:04 2023] Buffer I/O error on dev nvme3n1, logical block 390703424, async page read
> [Thu May 18 08:07:37 2023] print_req_error: I/O error, dev nvme3c33n1, sector 0
> [Thu May 18 08:07:37 2023] print_req_error: I/O error, dev nvme3c33n1, sector 256

I didn't find an "errors" counter in iostats[1], so I guess node_exporter won't have it. I did find node_filesystem_device_error, but that was zero the whole time.

What would be the Prometheus-y way to sense these errors so my users can have their alerts? I'm hoping to avoid "logtail | grep -c 'error'" in a counter.

[1: https://www.kernel.org/doc/html/latest/admin-guide/iostats.html ]
[prometheus-users] Use Kubernetes-nodes Metrics to determine kubelet cert expiry
Hi there,

I have several older Kubernetes clusters, and I need a way to alert on the kubelet/kubeadm certs before they expire.

From the kubernetes-nodes job we get several metrics:

  apiserver_client_certificate_expiration_seconds_count
  apiserver_client_certificate_expiration_seconds_sum
  apiserver_client_certificate_expiration_seconds_bucket

But I am not really clear on how to use these values to determine either a countdown to expiry or the actual expiration date. The bucket metrics have an "le" label, but I'm not sure what that means.

Thanks,
Kevin
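On the "le" question: `le` means "less than or equal". Each `_bucket` series counts the observations whose value was at most that bound, so the buckets are cumulative and the `le="+Inf"` bucket equals `_count`. As I understand it, this histogram records, for each authenticated request, the remaining lifetime in seconds of the client certificate presented, so the counts are requests rather than distinct certificates, and they accumulate since the process started. A minimal sketch of the linear interpolation that PromQL's `histogram_quantile()` applies to such buckets (simplified: it assumes the lowest bucket starts at 0); the bucket bounds and counts in the example are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (le, cumulative_count) pairs sorted by le,
    ending with the le=+Inf bucket. Observations are assumed to be spread
    evenly inside each bucket, and the lowest bucket is assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket and a +Inf bucket")
    total = buckets[-1][1]  # the +Inf bucket equals _count
    if total == 0:
        return math.nan
    rank = q * total
    lower_le, lower_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the +Inf bucket: report the largest finite bound.
                return lower_le
            if count == lower_count:
                return lower_le  # empty first bucket, only possible when q == 0
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_le + (le - lower_le) * fraction
        lower_le, lower_count = le, count
    return math.nan  # not reached when counts are cumulative


def observations_at_most(buckets, bound):
    """Cumulative count of observations <= bound (the bucket with that le)."""
    for le, count in buckets:
        if le == bound:
            return count
    raise KeyError(bound)


# Hypothetical buckets (seconds of remaining lifetime -> cumulative count).
WEEK = 7 * 24 * 3600
example = [(86400, 0), (WEEK, 0), (30 * 86400, 1), (365 * 86400, 4), (math.inf, 4)]
```

With these made-up numbers, `histogram_quantile(0.01, example)` lands inside the 7 to 30 day bucket, at roughly 7.9 days. In PromQL you would normally apply `rate()` to the `_bucket` series before `histogram_quantile()`, so that a low quantile reflects the soonest-expiring certificates seen recently rather than since the process started.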
Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s
Based on what little I can derive from the "data" presented, this is probably from Grafana, given that the timestamps are snapped to an even 15 seconds.

Based on this, I'm guessing this is a misunderstanding of the lookback / step interval, which defaults to 15s in Grafana.

There's no actual evidence (logs) provided that the exporter is actually doing what they claim.

On Thu, May 18, 2023 at 2:47 PM Stuart Clark wrote:
> How are you producing that data?
>
> Did you remember to reload/restart Prometheus after making the change to
> your config?
>
> --
> Stuart Clark
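The lookback/step explanation can be illustrated with a small simulation. A range query is evaluated at every step, and each evaluation takes the newest raw sample within the lookback window (5 minutes by default, set by Prometheus' `--query.lookback-delta`). If the probe is scraped every 60s but the dashboard queries with a 15s step, each raw sample shows up at several consecutive steps. This is a simplified sketch, not Prometheus' actual code; the raw sample timestamps are hypothetical, chosen so the output lines up with the samples Paweł posted.

```python
from datetime import datetime, timedelta

# Prometheus' default lookback window (--query.lookback-delta).
LOOKBACK = timedelta(minutes=5)


def instant_value(samples, at, lookback=LOOKBACK):
    """Value of the newest raw sample no older than `lookback` at time `at`,
    or None if there is none (simplified instant-vector selection)."""
    candidates = [(ts, v) for ts, v in samples if at - lookback <= ts <= at]
    if not candidates:
        return None
    return max(candidates, key=lambda sample: sample[0])[1]


def range_query(samples, start, end, step, lookback=LOOKBACK):
    """Evaluate the selector at start, start + step, ..., end."""
    points = []
    t = start
    while t <= end:
        points.append((t, instant_value(samples, t, lookback)))
        t += step
    return points


# Hypothetical raw samples: one scrape every 60s, at hh:mm:20.
first_scrape = datetime(2023, 5, 15, 9, 44, 20)
raw_samples = [
    (first_scrape + timedelta(minutes=i), value)
    for i, value in enumerate([0.0624, 0.0527, 0.0465, 0.0465])
]

points = range_query(
    raw_samples,
    start=datetime(2023, 5, 15, 9, 45, 0),
    end=datetime(2023, 5, 15, 9, 47, 45),
    step=timedelta(seconds=15),
)

if __name__ == "__main__":
    for ts, value in points:
        print(ts.strftime("%Y-%m-%d %H:%M:%S"), value)
```

Four raw scrapes produce twelve 15s-spaced rows, with each value repeated until the next scrape arrives, which is the pattern in the posted table.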
Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s
On 2023-05-18 11:27, Paweł Błażejewski wrote:
> Samples still every 15s.
>
> 2023-05-15 09:45:00
> 0.0624
> 2023-05-15 09:45:15

How are you producing that data?

Did you remember to reload/restart Prometheus after making the change to your config?

-- 
Stuart Clark
[prometheus-users] Blackbox exporter http module check interval - how to change default 15s
Hello,

The blackbox exporter http module checks an https site every 15s by default. Is it possible to change this interval to 1 minute? I added the scrape_interval parameter to the Prometheus config, as you can see below, but it doesn't change anything. Samples are still every 15s.

Can you please tell me how and where to change it? I use Prometheus 2.37.7 and blackbox_exporter 0.23.0.

Prometheus config:

  - job_name: 'blackbox-http-csci-prod'
    scrape_interval: 60s
    scrape_timeout: 50s
    metrics_path: /blackbox/probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://ci-jenkins***/login?from=%2F
        - https://ci-jenkins-***l/login?from=%2F
        - https://ci-**/login?from=%2F
        .
        .
        .
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__

Samples still every 15s:

  2023-05-15 09:45:00  0.0624
  2023-05-15 09:45:15  0.0624
  2023-05-15 09:45:30  0.0527
  2023-05-15 09:45:45  0.0527
  2023-05-15 09:46:00  0.0527
  2023-05-15 09:46:15  0.0527
  2023-05-15 09:46:30  0.0465
  2023-05-15 09:46:45  0.0465
  2023-05-15 09:47:00  0.0465
  2023-05-15 09:47:15  0.0465
  2023-05-15 09:47:30  0.0465
  2023-05-15 09:47:45  0.0465
[prometheus-users] Re: Calculate Time Until Gauge Reaches Target
Also remember to add a separate alert for "progress_current >= progress_goal" if you need to know whether the goal has already been met. The expression I gave only gives an expected time to reach the goal if the goal *hasn't* already been met.

On Thursday, 18 May 2023 at 09:07:24 UTC+1 Brian Candler wrote:
> Grr, I think I messed up the threshold detection ("> progress_goal") on
> the denominator.
[prometheus-users] Re: Calculate Time Until Gauge Reaches Target
Grr, I think I messed up the threshold detection ("> progress_goal") on the denominator.

Take 2:

  expr: |
    ((progress_goal - progress_current) > 0)
    /
    ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) > progress_goal) - progress_current)
    * 604800

On Thursday, 18 May 2023 at 08:57:57 UTC+1 Brian Candler wrote:
> > I have seen a bunch of examples for predicting when a disk will fill,
> but these all seem to rely on the assumption that a gauge is trending
> downwards, and we are predicting when it meets zero.
>
> Here is a better recipe as a starting point, using predict_linear():
> https://groups.google.com/g/prometheus-users/c/PCT4MJjFFgI/m/kVfOW069BQAJ
[prometheus-users] Re: Calculate Time Until Gauge Reaches Target
> I have seen a bunch of examples for predicting when a disk will fill, but these all seem to rely on the assumption that a gauge is trending downwards, and we are predicting when it meets zero.

Here is a better recipe as a starting point, using predict_linear(): https://groups.google.com/g/prometheus-users/c/PCT4MJjFFgI/m/kVfOW069BQAJ

> How can we write an expression that will tell us the time in seconds until progress_current meets progress_goal?

In principle, I think you should just be able to replace the metric in that expression with (progress_goal - progress_current) and see when that hits zero. Something like this:

  expr: |
    ((progress_goal - progress_current) > 0)
    /
    ((progress_goal - progress_current)
      - ((progress_goal - predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800)) < 0))
    * 604800

(That's syntactically valid, but otherwise untested. Obviously you should remove or change the label selectors on progress_current to suit the metric in question.)

I see +progress_goal and -progress_goal in the denominator, so it should be possible to simplify it further, I think to this:

  expr: |
    ((progress_goal - progress_current) > 0)
    /
    ((predict_linear(progress_current{fstype!~"fuse.*|nfs.*"}[12h], 604800) - progress_current) > progress_goal)
    * 604800

The value of this expression is intended to return *the time until the goal is reached*, using a linear regression over the past 12 hours to predict where it will be in 7 days. If the goal isn't expected to be reached within 7 days, it will return no value.
I can attempt to justify that expression graphically:

                                  C2
   |                               /  ^
   |                           /      |
 G +.........^..............x.........|....
   |         |          /             |
   |         N      /                 D
   |         |   /                    |
C1 +.........+........................v....
   |
   +---------0d-----------------------7d---> time

G is the progress_goal, which I'm assuming is constant.
C1 is the value of progress_current now (time = 0d), which is less than G.
C2 is the future predicted value of progress_current (at time = 7d), which is greater than G.

N (numerator) is G - C1.
D (denominator) is C2 - C1.

And geometrically, I think the ratio of the crossover time (0d...x) to the full time (0d...7d) is the same as the ratio of N to D. Hence (N/D)*7d gives the time to crossover.

Give it a go. Note that if progress_goal is a metric, rather than a constant, then it will need to have exactly the same label set as progress_current; or you will need to add some on(...) or ignoring(...) clauses as appropriate.
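The geometric argument is the solution of C1 + (C2 - C1) * t / 7d = G for t, which gives t = (G - C1) / (C2 - C1) * 7d = (N/D) * 7d. A minimal numeric check, as a sketch only: the least-squares fit below follows the idea behind predict_linear (fit a line over the window, extrapolate from the evaluation time) but is not Prometheus' code, and the sample series is hypothetical. `time_to_goal` applies the filters in the "Take 2" shape (keep N > 0 and C2 > G); `first_simplification` applies the earlier D > G filter for comparison.

```python
WEEK = 7 * 24 * 3600  # 604800 s, the horizon used in the thread


def predict_linear(samples, eval_time, horizon):
    """Least-squares line through (timestamp, value) samples, evaluated
    `horizon` seconds after `eval_time`. Returns None without a usable slope."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    fitted_now = mean_v + slope * (eval_time - mean_t)
    return fitted_now + slope * horizon


def time_to_goal(goal, current, predicted, horizon=WEEK):
    """((G - C1) > 0) / ((C2 > G) - C1) * horizon.
    None means the series would be filtered out (no value)."""
    if predicted is None:
        return None
    numerator = goal - current          # N = G - C1
    if numerator <= 0:                  # goal already met
        return None
    if predicted <= goal:               # not expected to cross within horizon
        return None
    denominator = predicted - current   # D = C2 - C1
    return numerator / denominator * horizon


def first_simplification(goal, current, predicted, horizon=WEEK):
    """Earlier variant that filters on (C2 - C1) > G instead of C2 > G."""
    numerator = goal - current
    denominator = predicted - current
    if numerator <= 0 or denominator <= goal:
        return None
    return numerator / denominator * horizon


if __name__ == "__main__":
    # Hypothetical series: 12h of hourly samples rising by 0.001 per second.
    samples = [(t, 100 + 0.001 * t) for t in range(0, 12 * 3600 + 1, 3600)]
    now, c1 = samples[-1]
    c2 = predict_linear(samples, now, WEEK)
    print(round(time_to_goal(500, c1, c2)))  # 356800 s, a little over 4 days
```

With this series (C1 = 143.2, C2 = 748), a goal of 700 is crossed in about 556800 s, yet the first simplification drops it because D = 604.8 is not greater than 700. That is the threshold mistake "Take 2" corrects.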