[prometheus-users] Re: failed to start prometheus on centos9

2023-05-31 Thread Daniel Swarbrick
This is a simple filesystem permissions issue, as you can see from these two 
log messages:

May 26 07:09:35 grafana prometheus[43908]: ts=2023-05-26T07:09:35.475Z 
caller=query_logger.go:113 level=error component=activeQueryTracker 
msg="Failed to create directory for logging active queries"
May 26 07:09:35 grafana prometheus[43908]: ts=2023-05-26T07:09:35.475Z 
caller=query_logger.go:91 level=error component=activeQueryTracker 
msg="Error opening query log file" file=/var/lib/prometheus/queries.active 
err="open /var/lib/prometheus/queries.active: no such file or directory"

Make sure that either /var/lib/prometheus already exists and is writable by 
the process, or that /var/lib is writable by the process (so that it can 
create the prometheus directory itself).
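
For example, something along these lines usually does the trick - a sketch 
only, assuming the service runs as a dedicated "prometheus" user (check the 
User= line in your unit file and adjust accordingly):

  # create the data directory and hand it over to the service user
  mkdir -p /var/lib/prometheus
  chown prometheus:prometheus /var/lib/prometheus
  chmod 750 /var/lib/prometheus
  systemctl restart prometheus
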
On Wednesday, May 31, 2023 at 12:39:16 PM UTC+2 Gharbi abdelaziz wrote:

> × prometheus.service - Prometheus
>  Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; 
> preset: disabled)
>  Active: failed (Result: exit-code) since Fri 2023-05-26 07:09:35 UTC; 
> 21min ago
>Duration: 45ms
> Process: 43908 ExecStart=/usr/local/bin/prometheus --config.file 
> /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ 
> --web.console.templates=/etc/prometheus/consoles 
> --web.console.libraries=/etc/prometheus/console_libraries (code=exited, 
> status=2)
>Main PID: 43908 (code=exited, status=2)
> CPU: 32ms
>
> May 26 07:09:35 grafana prometheus[43908]: ts=2023-05-26T07:09:35.475Z 
> caller=query_logger.go:113 level=error component=activeQueryTracker 
> msg="Failed to create directory for logging active queries"
> May 26 07:09:35 grafana prometheus[43908]: ts=2023-05-26T07:09:35.475Z 
> caller=query_logger.go:91 level=error component=activeQueryTracker 
> msg="Error opening query log file" file=/var/lib/prometheus/queries.active 
> err="open /var/lib/prometheus/queries.active: no such file or directory"
> May 26 07:09:35 grafana prometheus[43908]: panic: Unable to create mmap-ed 
> active query log
> May 26 07:09:35 grafana prometheus[43908]: goroutine 1 [running]:
> May 26 07:09:35 grafana prometheus[43908]: 
> github.com/prometheus/prometheus/promql.NewActiveQueryTracker({0x7ffe15a37e80 
> ,
>  
> 0x14}, 0x14, {0x38169e0, 0xc000805a90})
> May 26 07:09:35 grafana prometheus[43908]: 
> /app/promql/query_logger.go:121 +0x3cd
> May 26 07:09:35 grafana prometheus[43908]: main.main()
> May 26 07:09:35 grafana prometheus[43908]: 
> /app/cmd/prometheus/main.go:597 +0x6713
> May 26 07:09:35 grafana systemd[1]: prometheus.service: Main process 
> exited, code=exited, status=2/INVALIDARGUMENT
> May 26 07:09:35 grafana systemd[1]: prometheus.service: Failed with result 
> 'exit-code'.
>



Re: [prometheus-users] Blackbox exporter http module check interval - how to change default 15s

2023-05-18 Thread Daniel Swarbrick
I guess this relates to 
https://github.com/prometheus/blackbox_exporter/issues/1051
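
A quick way to check the real scrape cadence, independently of Grafana's 
query step, is to run a range selector in the Prometheus expression browser 
(not Grafana) and look at the raw sample timestamps - for example, using the 
job name from the config quoted below:

  probe_success{job="blackbox-http-csci-prod"}[10m]

If the timestamps in the Table view are 60 seconds apart, the scrape_interval 
is being honoured and the 15-second spacing is just Grafana's default step.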

On Thursday, May 18, 2023 at 4:33:46 PM UTC+2 Ben Kochie wrote:

> Based on what little I can derive from the "data" presented, this is 
> probably from Grafana, given that the timestamps are snapped to an even 15 
> seconds.
>
> And based on this, I'm guessing this is a misunderstanding of the lookback 
> / step interval, which will default to 15s in Grafana.
>
> There's no actual evidence (logs) provided that the exporter is actually 
> doing what they claim.
>
> On Thu, May 18, 2023 at 2:47 PM Stuart Clark  
> wrote:
>
>> On 2023-05-18 11:27, Paweł Błażejewski wrote:
>> > Hello,
>> > 
>> > The Blackbox exporter http module checks an https site every 15s by
>> > default. Can you please tell me, is it possible to change this interval
>> > to 1 minute? I added the scrape_interval parameter in the Prometheus
>> > config as you can see below, but it doesn't change anything. Samples are
>> > still every 15s.
>> > 
>> > Can you please tell me how and where to change it?
>> > I use Prometheus 2.37.7 and blackbox_exporter 0.23.0.
>> > 
>> > prometheus config:
>> > 
>> > - job_name: 'blackbox-http-csci-prod'
>> >   scrape_interval: 60s
>> >   scrape_timeout: 50s
>> >   metrics_path: /blackbox/probe
>> >   params:
>> >     module: [http_2xx]
>> >   static_configs:
>> >     - targets:
>> >         - https://ci-jenkins***/login?from=%2F
>> >         - https://ci-jenkins-***l/login?from=%2F
>> >         - https://ci-**/login?from=%2F
>> >         .
>> >         .
>> >         .
>> >   relabel_configs:
>> >     - source_labels: [__address__]
>> >       target_label: __param_target
>> >     - source_labels: [__param_target]
>> >       target_label: instance
>> >     - target_label: __address__
>> > 
>> > Samples still every 15s.
>> > 
>> > 2023-05-15 09:45:00
>> > 0.0624
>> > 2023-05-15 09:45:15
>>
>>
>> How are you producing that data?
>>
>> Did you remember to reload/restart Prometheus after making the change to 
>> your config?
>>
>> -- 
>> Stuart Clark
>>
>



[prometheus-users] Re: Hex-STRING instead of DisplayString = trailing NULL

2023-05-03 Thread Daniel Swarbrick
You can override the type of an OID in your generator.yml to force 
snmp_exporter to always handle it as e.g. a DisplayString, even when dodgy 
SNMP engines return it as a different type than the MIB specifies. Read the 
"overrides" section of 
https://github.com/prometheus/snmp_exporter/tree/main/generator#file-format
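
As a rough sketch, reusing the metric name from your example below (adapt the 
module and metric names to whatever your generator.yml already walks):

  modules:
    if_mib:
      walk:
        - ifDescr
      overrides:
        ifDescr:
          type: DisplayString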

However, that won't prevent the null bytes from still appearing in the label 
value. The only way (currently) to remove them is with metric relabelling, as 
you have already discovered. You might want to subscribe to this GitHub 
issue: https://github.com/prometheus/snmp_exporter/issues/615
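
Until that is resolved, the mop-up could look something like the following in 
the scrape config - a sketch only, assuming ifDescr is the affected label, as 
in your example query:

  metric_relabel_configs:
    - source_labels: [ifDescr]
      regex: '(.*?)\x00+'
      target_label: ifDescr
      replacement: '$1'

Label values without trailing nulls are left untouched, because the anchored 
regex simply does not match them.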

On Sunday, April 30, 2023 at 11:25:25 AM UTC+2 Jonathan Tougas wrote:

> I'm looking for a way to deal with a situation where we end up with null 
> characters trailing some label values: `count({ifDescr=~".*\x00"}) != 0`.
>
> The source of the problem seems to be with `ifDescr` returned as a 
> `Hex-String` instead of what the MIB says should be a `DisplayString`... 
> for __some__ servers.
>
> # Good,  99% of servers:
> $ snmpget -v 2c -c $creds 172.21.34.10 1.3.6.1.2.1.2.2.1.2.1
> iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0"
>
> # Bad, Cisco CVP tsk tsk tsk...
> $ snmpget -v 2c -c $creds 172.20.220.88 1.3.6.1.2.1.2.2.1.2.1
> iso.3.6.1.2.1.2.2.1.2.1 = Hex-STRING: 53 6F 66 74 77 61 72 65 20 4C 6F 6F 
> 70 62 61 63
> 6B 20 49 6E 74 65 72 66 61 63 65 20 31 00
>
> I'm currently planning on using `metric_relabel_configs` to clean up the 
> trailing nulls in these and other similar situations I uncovered. 
> Is there a better way than mopping up like that? Perhaps snmp_exporter can 
> deal with these and convert them somehow? I'm not familiar enough with it 
> to figure out whether it can or not.
>



[prometheus-users] Re: HA Prometheus instances use different amount of storage

2023-05-03 Thread Daniel Swarbrick
This sounds like you might have run into the Go timer jitter bug. Try 
enabling timestamp tolerance to mitigate the effect of timer jitter: 
https://promlabs.com/blog/2021/09/14/whats-new-in-prometheus-2-30/#improving-storage-efficiency-by-tuning-timestamp-tolerances

I have so far not found a satisfactory resolution to this bug, and even 
though enabling timestamp jitter tolerance helps, I still occasionally see 
instances inexplicably using up to 2.5x their previous average bytes per 
sample after a restart. It seems to be a lucky dip. Restarting the instance 
again usually settles it back down to the original, expected bytes per sample.
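
For anyone wanting to compare instances, bytes per sample can be read straight 
off the promtool output below (SIZE / NUM SAMPLES - roughly 2.2 for the first 
18h block of instance "0" versus roughly 1.5 on instance "1"), or watched over 
time with a query along these lines:

    rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[6h])
  /
    rate(prometheus_tsdb_compaction_chunk_samples_sum[6h])

That only accounts for chunk data, not the index, but it is close enough to 
spot the kind of jump described above.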

On Wednesday, April 12, 2023 at 9:55:10 AM UTC+2 Per Carlson wrote:

> Hi.
>
> We have a pair of Prometheus instances that consume significantly 
> different amounts of storage. The instances are created by the same 
> StatefulSet (created by Prometheus-operator), so they are using the same 
> configuration.
>
> Both instances have similar number of samples and series, but instance 
> "0" consume up to ~50% more storage than instance "1".
>
> $ kubectl exec prometheus-prometheus-0 -- /bin/sh -c "promtool tsdb list ."
> BLOCK ULID                  MIN TIME       MAX TIME   DURATION       NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
> 01GWY0R4N3QG1QJS957XZ0SYP7  168026400      168032880  18h0m0s        3296299059   26900037    931315      7259935610
> 01GX05E79R6WNQ4F6MMB068WJ7  168032883      168039360  17h59m59.997s  3312300492   27012299    892364      7265602468
> 01GX237SWZYZ6X5XXMENGMQ1YM  168039362      168045840  17h59m59.998s  3315540127   27036907    894595      7247593445
> 01GX410BDAPBMPKP100300C7DW  168045841      168052320  17h59m59.999s  3320458065   27130364    987454      7328750825
> 01GX5YTZVD5W97D497JA11CATT  168052327      168058800  17h59m59.993s  3318443269   27135815    1007206     7380926789
> 01GX7WMF1FNJ6MGT0TJY2A5KEM  168058801      168065280  17h59m59.999s  3331999517   27259726    1028363     7364976990
> 01GX9TDYMKVWJ9WYYY7CD8BCWH  168065285      168071760  17h59m59.995s  3327868238   27186293    981912      7288127305
> 01GXBR7FYPSRWA6N6313MMR9BM  168071769      168078240  17h59m59.991s  3327937718   27125975    896286      7199443835
> 01GXDP01QKKXC137B7JJZ6706W  168078241      168084720  17h59m59.999s  037262       27172805    897459      7194002011
> 01GXFKTGVZN6RXRM74PXB5E61Q  168084721      168091200  17h59m59.999s  3329211104   27134065    879001      7202044230
> 01GXHHM1JST118SYQNX8Z5W8PX  168091204      168097680  17h59m59.996s  3329464442   27131788    876881      7192136400
> 01GXKFCM51YQFDAWTGP6BBXGQZ  168097683      168104160  17h59m59.997s  3329134675   27127804    875877      7197030123
> 01GXND71ZF62MK8M1DP5E7345M  1681041600011  168110640  17h59m59.989s  3327555787   27119184    887763      7216837469
> 01GXQB0QJX2T55FBFFSZC9PC4D  168110645      168117120  17h59m59.995s  3324035858   27084455    871653      7195109123
> 01GXS8T0EJXHH2C1B0CBAEPHQB  1681171200011  168123600  17h59m59.989s  3315573555   26493111    989655      6235040678
> 01GXSXCRNTNDJC359R7160ZEDX  168123601      168125760  5h59m59.999s   1107306526   9028997     828578      1830084344
> 01GXSPFKRCVXHRSD2WAFEFM0WD  168125765      168126480  1h59m59.995s   369706839    3015597     808854      671597409
> 01GXSXBRED7JT7WJY9318QYKKZ  168126482      168127200  1h59m59.998s   369661386    3012000     805553      668951473
> 01GXT47FQYC712E0M6XPSP41FF  168127201      168127920  1h59m59.999s   369740628    3021714     823966      673649781
>
> $ kubectl exec prometheus-prometheus-1 -- /bin/sh -c "promtool tsdb list ."
> BLOCK ULID                  MIN TIME       MAX TIME   DURATION       NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
> 01GWY0RDK93D2RYJBHJRDMS100  168026400      168032880  18h0m0s        3296396516   26926127    957040      4831014683
> 01GX05ETVBDXMQH0KW9NX7RCPC  168032883      168039360  17h59m59.997s  3312324642   27036260    917296      4807892522
> 01GX2383F7YPDX400MN4DQ9CSX  168039362      168045840  17h59m59.998s  3315587751   27059963    918166      4832761551
> 01GX410PJXX52PKFVH1H205385  168045843      168052320  17h59m59.997s  3320397897   27157090    1014022     4890962085
> 01GX5YVKEAQ6D1NZM1AQW0YJ90  168052323      168058800  17h59m59.997s  3318472581   27171422    1042831     4854062752
> 01GX7WMWV41M3PW3BFV62P0M32  168058801      168065280  17h59m59.999s  3331918609   27288755    1056267     4861196239
> 01GX9TECS126QJM1A1F61GW0ZT  168065283      168071760  17h59m59.997s  3328065112   27214643    1008335     4831609465
> 01GXBR7NZ3RXVSP50V5J2QE4HQ  168071763      168078240  17h59m59.997s  3327954927   27159515    929150      4800273178
> 01GXDP1BCZF7THHQTSK4YGAYX7  168078241  

Re: [prometheus-users] Alerts resolved upon prometheus crash

2020-03-05 Thread Daniel Swarbrick
By default, Alertmanager will consider alerts resolved if 5 minutes or more 
elapses without the alert firing (the resolve_timeout config option).

If your Prometheus instance crashes and takes more than 5 minutes to 
restart, it's highly likely that any previously firing alerts will be 
"resolved". If the alerting rule conditions still exist after the restart, 
new alerts will be fired.
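
For reference, that timeout lives in the global section of alertmanager.yml 
(shown here with its default value):

  global:
    resolve_timeout: 5m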

On Wednesday, March 4, 2020 at 12:45:11 PM UTC+1, Julien Pivotto wrote:
>
> On 04 Mar 12:39, Julien Pivotto wrote: 
> > On 04 Mar 12:38, Julien Pivotto wrote: 
> > > Hello there, 
> > > 
> > > We are running a pair of HA prometheis and HA alertmanagers. 
> > > 
> > > One prometheus server OOM'd; and restarted. When it was down, we 
> > > received alert resolution notifications from the alertmanager: 
> > > 
> > > > resolved (duration: 115h45m0s) 
> > > 
> > > But a few seconds after: 
> > > 
> > > > firing (duration: 115h52m16s) 
> > > 
> > > I would have expected that the second prometheus, which had the alert 
> > > all the time and was working as expected, would have prevented the 
> alert 
> > > to disappear. 
> > > 
> > > Note that the alert does NOT have a `for` clause. 
> > > 
> > > There is an entry at 9:44:39, then the server drops, and the alert is 
> > > firing again at 9:53. Note: We received the new "firing" at 9:52, with 
> included 115h52m16s of duration. 
> > > 
> > > Both Prometheis servers send alerts to both alertmanagers. 
> > > 
> > > 
> > > What can have happened here? 
> > > 
> > > Our evaluation_interval is 1m, and resend-delay is default. 
> > > 
> > > -- 
> > >  (o-Julien Pivotto 
> > >  //\Open-Source Consultant 
> > >  V_/_   Inuits - https://www.inuits.eu 
> > > 
>
> > 
> > Note: alertmanagers are 0.20.0 pulled from GH releases and both 
> > prometheus are 2.16.0 pulled from GH releases too. 
>
>
> When I look at the metrics, it looks like 
> rate(alertmanager_alerts_received_total[5m]) is showing a lot of 
> 'resolved' at that time. Is it possible that Prometheus somehow sends 
> resolved alerts when the TSDB is not yet ready? And because those rules were 
> running for a long time, we tried to restore them? 
>
> regards, 
>
>
> -- 
>  (o-Julien Pivotto 
>  //\Open-Source Consultant 
>  V_/_   Inuits - https://www.inuits.eu 
>



[prometheus-users] Re: Snmp exporter - value preprocessing

2020-03-03 Thread Daniel Swarbrick
Have you considered using recording rules?

https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
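
For the conversions described in the quoted question below (mW to dBm is 
10 * log10(mW), and some devices report values scaled by a factor of 10), 
recording rules could look something like this - the source metric names here 
are made up purely for illustration:

  groups:
    - name: optical-ddm
      rules:
        # D-Link reports receive power in mW; convert to dBm
        - record: ddm:rx_power_dbm
          expr: 10 * log10(dlink_ddm_rx_power_mw)
        # devices that report dBm scaled by a factor of 10
        - record: ddm:rx_power_dbm
          expr: other_vendor_ddm_rx_power_dbm_x10 / 10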

On Monday, March 2, 2020 at 2:53:05 AM UTC+1, Виталий Ковалев wrote:
>
> Hello. Is there any way to do value preprocessing after lookup?
> On my network I have D-Link and Huawei switches. I use snmp_exporter to 
> get optical signal levels (DDM).
> The main issue is that Huawei switches give values in dBm and D-Link 
> switches give values in mW.
> Also I have some devices for which the values should be divided by 10.
>
