Re: [prometheus-users] Prometheus collect the metric fail

2021-11-12 Thread Mike Spreitzer


Would it be so hard to make the error message include the position where
the problem was?

Thanks,
Mike
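Until the error message includes a position, one workaround is to fetch the endpoint's output and scan it yourself. The Python sketch below is a hypothetical, simplified checker for sample lines of the text exposition format. It is not Prometheus's own parser. It skips comment lines, checks only metric names, label pairs and the sample value, and reports a 1-based line and column. As I understand the message, "expected equal, got INVALID" means the parser read a label name and then found something other than `=`. A common cause is a label name containing a character such as `-` or `.`. The checker reports that case as "expected '=' after label name".

```python
import re
from typing import List, Optional, Tuple

# Simplified token patterns for the text exposition format (hypothetical checker).
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
LABEL_VALUE = re.compile(r'"(?:[^"\\\n]|\\.)*"')  # quoted, backslash escapes allowed


def find_problem(line: str) -> Optional[Tuple[int, str]]:
    """Return (0-based column, message) for the first problem in a sample line."""
    m = METRIC_NAME.match(line)
    if not m:
        return 0, "expected metric name"
    pos = m.end()
    if line.startswith("{", pos):
        pos += 1
        while not line.startswith("}", pos):
            m = LABEL_NAME.match(line, pos)
            if not m:
                return pos, "expected label name"
            pos = m.end()
            if not line.startswith("=", pos):
                return pos, "expected '=' after label name"
            m = LABEL_VALUE.match(line, pos + 1)
            if not m:
                return pos + 1, "expected quoted label value"
            pos = m.end()
            if line.startswith(",", pos):
                pos += 1
            elif not line.startswith("}", pos):
                return pos, "expected ',' or '}' after label value"
        pos += 1  # step past the closing brace
    fields = line[pos:].split()
    if len(fields) not in (1, 2):
        return pos, "expected a value and an optional timestamp"
    try:
        float(fields[0])  # also accepts NaN, +Inf and -Inf
    except ValueError:
        return pos, "invalid sample value"
    return None


def check_exposition(text: str) -> List[Tuple[int, int, str]]:
    """Return (1-based line, 1-based column, message) for each suspicious line."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.lstrip()
        if not line or line.startswith("#"):
            continue  # blank lines, HELP/TYPE lines and comments are skipped
        result = find_problem(line.rstrip())
        if result is not None:
            col, msg = result
            problems.append((lineno, len(raw) - len(line) + col + 1, msg))
    return problems


scraped = 'http_requests_total{method="GET"} 10\nbad_metric{method "GET"} 1\n'
print(check_exposition(scraped))
```

For this input the sketch reports `[(2, 18, "expected '=' after label name")]`, which points at the space after `method` on the second line.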

prometheus-users@googlegroups.com wrote on 11/12/2021 01:06:11 AM:

> From: "易Richard" 
> To: "Prometheus Users" 
> Date: 11/12/2021 01:06 AM
> Subject: [EXTERNAL] [prometheus-users] Prometheus collect the metric fail
> Sent by: prometheus-users@googlegroups.com
>
> Prometheus fails to collect metrics from an endpoint. The error message
> is "expected equal, got INVALID".
> I checked the stderr output but didn't find any clue about the error.
>
> [image removed]
>
> The metric content from the endpoint that fails to be collected seems
> normal. Do I need to upload the whole metric content? It's very long.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/OF7B5A36C9.6EBD5867-ON8525878B.0058760F-8525878B.005880EE%40ibm.com.


Re: [prometheus-users] Re: my counters start at zero

2020-07-31 Thread Mike Spreitzer
I have a specific scenario.  I have counters that start at zero when the 
scraped process starts; they are counting something that happens in the 
scraped process.  If a counter first appears with a non-zero value, I know 
all those counts happened since the previous scrape.  I am not asserting 
that `rate()` should be changed for everybody.  Is there a PromQL query I 
can write that will behave similarly to `rate()` but will recognize that 
an initial non-zero count is due to increments since the previous scrape 
of the same process (yes, restricted to the situations where the process 
has been scraped before)?
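Setting PromQL aside, the semantics asked for here can be stated precisely as an offline computation. The Python sketch below is illustrative only. The function name and its `process_scrape_times` input (for example, timestamps taken from the target's `up` series) are hypothetical. It ignores rate()'s extrapolation. It is valid only under the assumption stated above: the counter really starts at zero in a process that was already being scraped.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase_assuming_zero_start(samples: List[Sample],
                                 process_scrape_times: List[float],
                                 start: float, end: float) -> float:
    """Counter increase over (start, end] for one series (hypothetical helper).

    If the series first appears after an earlier scrape of the same process,
    a synthetic 0 is assumed at that earlier scrape. The first non-zero value
    then counts as increments made since that scrape.
    """
    if not samples:
        return 0.0
    points = list(samples)
    first_t = points[0][0]
    earlier = [t for t in process_scrape_times if t < first_t]
    if earlier:
        points.insert(0, (max(earlier), 0.0))
    total = 0.0
    for (_, prev), (t, cur) in zip(points, points[1:]):
        if start < t <= end:
            # A decrease is treated as a counter reset: the new value counts from 0.
            total += cur - prev if cur >= prev else cur
    return total


# The process is scraped every 10 s from t=0; foo first appears at t=20 with value 10.
scrapes = [0.0, 10.0, 20.0, 30.0]
foo = [(20.0, 10.0), (30.0, 10.0)]
print(increase_assuming_zero_start(foo, scrapes, 10.0, 30.0))
```

This prints 10.0, so the ten increments are attributed to the interval between the scrapes at t=10 and t=20. Dividing by `end - start` turns the result into a per-second rate.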

Thanks,
Mike

prometheus-users@googlegroups.com wrote on 07/29/2020 03:27:03 AM:

> From: Brian Candler 
> To: Prometheus Users 
> Date: 07/29/2020 03:27 AM
> Subject: [EXTERNAL] [prometheus-users] Re: my counters start at zero
> Sent by: prometheus-users@googlegroups.com
> 
> rate() calculates the rate between the first and last available
> samples in the given time window, as long as there are at least two
> samples.
> 
> irate() calculates the rate between the last two samples in the 
> given time window.
> 
> On Wednesday, 29 July 2020 05:25:04 UTC+1, Mike Spreitzer wrote:
> > Now suppose instead that foo first shows up in a scrape at time t0
> > with a value of 10, and in every scrape after that the value of foo
> > is also 10.  What will `rate(foo[60s])` give me?  If I understand
> > correctly, it will give me nothing until time t0+60s, and from then
> > on it will give me zero.  Have I got this right?
> 
> It will show a rate of 0 as soon as two values are available, that 
> is, from t0+10s onwards.
> 
> If a new counter appears with value 10, it tells you nothing about the
> rate just before the counter appeared.  It may be that scraping was
> broken, and the counter had value 10 for the last year.  It could be
> that the counter had been going 1-2-3-4-5-6-7-8-9-10 at intervals
> of 10 seconds.  Or at intervals of 1 week.
> 
> As a real-world example, it is very common to start polling an SNMP 
> device and find its interface byte counters already at huge values, 
> reflecting how much traffic has been carried in total by that 
> interface since the device was powered on.  It would be completely 
> wrong to have an enormous blip which effectively compresses months 
> or years of traffic into one sample interval.
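Brian's description of rate() and irate() can be sketched as follows. This is a simplified model, not PromQL's implementation. The real rate() also extrapolates toward the window boundaries and compensates for counter resets. Here the window is assumed to be (t - window, t], and the scrape interval is assumed to be 10 s, matching the t0+10s in the answer above.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def _in_window(samples: List[Sample], t: float, window: float) -> List[Sample]:
    # Assumption: the range selector covers the half-open interval (t - window, t].
    return [(ts, v) for ts, v in samples if t - window < ts <= t]


def simple_rate(samples: List[Sample], t: float, window: float) -> Optional[float]:
    """Slope between the first and last samples in the window (None if fewer than 2)."""
    pts = _in_window(samples, t, window)
    if len(pts) < 2:
        return None
    (t_a, v_a), (t_b, v_b) = pts[0], pts[-1]
    return (v_b - v_a) / (t_b - t_a)


def simple_irate(samples: List[Sample], t: float, window: float) -> Optional[float]:
    """Slope between the last two samples in the window (None if fewer than 2)."""
    pts = _in_window(samples, t, window)
    if len(pts) < 2:
        return None
    (t_a, v_a), (t_b, v_b) = pts[-2], pts[-1]
    return (v_b - v_a) / (t_b - t_a)


# Second scenario: foo first appears at t0 = 0 with value 10 and stays there.
foo = [(float(t), 10.0) for t in range(0, 120, 10)]
for t in (0, 10, 60, 100):
    print(f"t0+{t}s: rate={simple_rate(foo, t, 60)} irate={simple_irate(foo, t, 60)}")
```

The loop prints None at t0 and 0.0 from t0+10s onward. Nothing in these samples records the ten increments that happened before t0, which is the point of the SNMP example.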



[prometheus-users] my counters start at zero

2020-07-28 Thread Mike Spreitzer
Suppose I have a counter metric; let's name it `foo`.  Suppose foo first 
shows up with a value of 0 in a scrape at time t0, shows up with a value of 
10 in a scrape at time t0+10s, and has value 10 in all subsequent scrapes.  
What will the PromQL expression `rate(foo[60s])` get me?  I suppose nothing 
until time t0+60s; some non-zero value from t0+60s to t0+70s; and zero from 
t0+70s onward.  Is that right?  If not, what will I get?
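As a rough check on this first scenario, the sketch below applies the first-to-last-sample approximation of rate() with an assumed 10 s scrape interval. It deliberately ignores rate()'s extrapolation to the window edges. Whether a sample exactly 60 s old falls inside the window may also differ between Prometheus versions, so treat the output as indicative only.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample], t: float, window: float) -> Optional[float]:
    """Slope from the first to the last sample in (t - window, t].

    Simplified: PromQL rate() also extrapolates toward the window edges.
    """
    pts = [(ts, v) for ts, v in samples if t - window < ts <= t]
    if len(pts) < 2:
        return None
    (t_a, v_a), (t_b, v_b) = pts[0], pts[-1]
    return (v_b - v_a) / (t_b - t_a)


# foo: 0 at t0 = 0, 10 at t0+10s, then 10 in every later scrape (10 s interval assumed).
foo = [(0.0, 0.0)] + [(float(t), 10.0) for t in range(10, 200, 10)]

for t in (0, 10, 30, 50, 59, 60, 70):
    print(f"t0+{t}s: {simple_rate(foo, t, 60)}")
```

Under these assumptions the result is empty at t0, non-zero from t0+10s until the t0 sample drops out of the window at t0+60s, and zero after that. In other words, the blip appears as soon as two samples exist rather than only between t0+60s and t0+70s.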

Now suppose instead that foo first shows up in a scrape at time t0 with a 
value of 10, and in every scrape after that the value of foo is also 10.  
What will `rate(foo[60s])` give me?  If I understand correctly, it will 
give me nothing until time t0+60s, and from then on it will give me zero.  
Have I got this right?  That is a rather disappointing answer.  This 
counter really did start at zero, and got 10 increments before the first 
scrape.  It would be gratifying to have a PromQL query that shows this blip 
of activity.  Can I write a different PromQL query that will get this 
result?  While retaining all the other smarts of `rate`?

Thanks,
Mike



[prometheus-users] Re: No joy from Prometheus snapshot

2020-07-06 Thread Mike Spreitzer
Interestingly, the Prometheus server running against the new snapshot 
logged some compactions after about a minute.

...
level=info ts=2020-07-07T01:57:08.694Z caller=main.go:646 msg="Server is ready to receive web requests."
level=info ts=2020-07-07T01:58:18.185Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=159401520 maxt=159403680 ulid=01ECKFX8HKYEP8Y2YPX2WB0PC3 sources="[01ECHNMH8MRPZYHM8KXKBE60ZA 01ECHWG8GCAZWMW1JDZB9ERQ0N 01ECJ3BZRM3SP7XQAZRJ867C30]" duration=9.494006165s
level=info ts=2020-07-07T01:58:27.860Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=159403680 maxt=159405840 ulid=01ECKFXHV47D51X0GREQM6QZE3 sources="[01ECJA7Q18RT0VW9S5HKSVDP0J 01ECJH3E8ENEWVK8E41G5XWM71 01ECJQZ5GB6TGV21QMHCWZ3JFY]" duration=9.64782793s
level=info ts=2020-07-07T01:58:37.619Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=159405840 maxt=159408000 ulid=01ECKFXV9PSG91VCTAMAVKZ1YR sources="[01ECJYTWRJZG6YSGH3XBTJK6CX 01ECK5PQETDE8VDRSR6NZ6FGXQ 01ECKCJEPAJ19AJ7DXJY6K6P6B]" duration=9.725054346s
level=info ts=2020-07-07T01:58:53.953Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=159401520 maxt=159408000 ulid=01ECKFY4TFCY2J3KQ30VSZFRKS sources="[01ECKFX8HKYEP8Y2YPX2WB0PC3 01ECKFXHV47D51X0GREQM6QZE3 01ECKFXV9PSG91VCTAMAVKZ1YR]" duration=16.305728942s

This server logged finding 10 blocks when it started (see my previous 
email); those compactions compacted the first 9 blocks into one.
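The block arithmetic above can be checked with a toy model of leveled compaction. As far as I know, Prometheus's default block ranges grow by a factor of 3 starting at 2h (2h, 6h, 18h, ...). The log agrees: each compaction output spans three times the range of its inputs. The sketch below is not Prometheus's compaction planner. It uses hypothetical relative times instead of the logged ones, plus a hypothetical short 10th block standing in for the newest one, and it assumes blocks do not overlap.

```python
from typing import Dict, List, Tuple

HOUR = 3600 * 1000  # TSDB block times are in milliseconds
Block = Tuple[int, int]  # (mint, maxt)


def plan_level(blocks: List[Block], span: int) -> List[List[Block]]:
    """Toy planner: group blocks by aligned window of `span` ms and return the
    groups of two or more blocks that exactly fill their window."""
    windows: Dict[int, List[Block]] = {}
    for mint, maxt in blocks:
        start = mint - mint % span  # align windows to multiples of span
        if maxt <= start + span:  # ignore blocks straddling a window edge
            windows.setdefault(start, []).append((mint, maxt))
    ready = []
    for start, group in sorted(windows.items()):
        covered = sum(maxt - mint for mint, maxt in group)
        if len(group) > 1 and covered == span:
            ready.append(sorted(group))
    return ready


def merge(groups: List[List[Block]]) -> List[Block]:
    """Replace each group with one block covering the same time range."""
    return [(g[0][0], g[-1][1]) for g in groups]


# Nine consecutive 2h blocks (relative times), plus a hypothetical short 10th block.
blocks = [(i * 2 * HOUR, (i + 1) * 2 * HOUR) for i in range(9)]
blocks.append((18 * HOUR, 19 * HOUR))

six_h = plan_level(blocks, 6 * HOUR)
leftover = [b for b in blocks if not any(b in g for g in six_h)]
eighteen_h = plan_level(merge(six_h) + leftover, 18 * HOUR)
print([len(g) for g in six_h], [len(g) for g in eighteen_h])
```

The sketch prints `[3, 3, 3] [3]`: three 6h merges followed by one 18h merge. These are the same four compactions as in the log, and the short 10th block is left alone.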



[prometheus-users] Re: No joy from Prometheus snapshot

2020-07-06 Thread Mike Spreitzer
I suspected the relative directory and then noticed the double equal in the 
prometheus command line.  So I erased the old snapshot and tried again.

sysop@r26data0:/var/lib/prometheus/snapshots$ ls
sysop@r26data0:/var/lib/prometheus/snapshots$ curl -X POST http://localhost:30909/api/v1/admin/tsdb/snapshot
{"status":"success","data":{"name":"20200707T015331Z-b7fbfbcafd915bb"}}
sysop@r26data0:/var/lib/prometheus/snapshots$ 
sysop@r26data0:/var/lib/prometheus/snapshots$ ls -la
total 12
drwxr-xr-x  3 nobody nogroup 4096 Jul  7 01:53 .
drwxr-xr-x 14 nobody   65533 4096 Jul  7 00:59 ..
drwxr-xr-x 12 nobody nogroup 4096 Jul  7 01:53 20200707T015331Z-b7fbfbcafd915bb

Next I run the server again.  This time it logs messages about finding data 
blocks.

sysop@r26data0:/var/lib/prometheus/snapshots/20200707T015331Z-b7fbfbcafd915bb$ sudo -u nobody ~/prometheus --storage.tsdb.path=$PWD --web.enable-admin-api --config.file=$HOME/prom-config/config.yaml
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.19.1, branch=HEAD, revision=eba3fdcbf0d378b66600281903e3aab515732b39)"
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:338 build_context="(go=go1.14.4, user=root@62700b3d0ef9, date=20200618-16:35:26)"
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:339 host_details="(Linux 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 r26data0 (none))"
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:340 fd_limits="(soft=65535, hard=65535)"
level=info ts=2020-07-07T01:57:08.662Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-07-07T01:57:08.664Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2020-07-07T01:57:08.664Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159401520 maxt=159402240 ulid=01ECHNMH8MRPZYHM8KXKBE60ZA
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159402240 maxt=159402960 ulid=01ECHWG8GCAZWMW1JDZB9ERQ0N
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159402960 maxt=159403680 ulid=01ECJ3BZRM3SP7XQAZRJ867C30
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159403680 maxt=159404400 ulid=01ECJA7Q18RT0VW9S5HKSVDP0J
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159404400 maxt=159405120 ulid=01ECJH3E8ENEWVK8E41G5XWM71
level=info ts=2020-07-07T01:57:08.664Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159405120 maxt=159405840 ulid=01ECJQZ5GB6TGV21QMHCWZ3JFY
level=info ts=2020-07-07T01:57:08.665Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159405840 maxt=159406560 ulid=01ECJYTWRJZG6YSGH3XBTJK6CX
level=info ts=2020-07-07T01:57:08.665Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159406560 maxt=159407280 ulid=01ECK5PQETDE8VDRSR6NZ6FGXQ
level=info ts=2020-07-07T01:57:08.665Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159407280 maxt=159408000 ulid=01ECKCJEPAJ19AJ7DXJY6K6P6B
level=info ts=2020-07-07T01:57:08.665Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159408000 maxt=1594086829277 ulid=01ECKFMSZ5XDWD5296R3C8ZXZ6
level=info ts=2020-07-07T01:57:08.688Z caller=head.go:645 component=tsdb msg="Replaying WAL and on-disk memory mappable chunks if any, this may take a while"
level=info ts=2020-07-07T01:57:08.688Z caller=head.go:706 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-07-07T01:57:08.688Z caller=head.go:709 component=tsdb msg="WAL replay completed" duration=365.54µs
level=info ts=2020-07-07T01:57:08.690Z caller=main.go:694 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-07-07T01:57:08.690Z caller=main.go:695 msg="TSDB started"
level=info ts=2020-07-07T01:57:08.690Z caller=main.go:799 msg="Loading configuration file" filename=/home/sysop/prom-config/config.yaml
level=info ts=2020-07-07T01:57:08.694Z caller=main.go:827 msg="Completed loading of configuration file" filename=/home/sysop/prom-config/config.yaml
level=info ts=2020-07-07T01:57:08.694Z caller=main.go:646 msg="Server is ready to receive web requests."

The last data block ends about 5 minutes ago.

sysop@r26data0:~$ date --date @1594086829
Tue Jul  7 01:53:49 UTC 2020
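Block `mint`/`maxt` values are milliseconds since the Unix epoch, which is why the `date` call above uses only the first ten digits. The same conversion in Python, keeping the milliseconds, might look like this. The `maxt_ms` value is copied from the log, and `EPOCH` is just a local name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# maxt of the newest block in the startup log above, in milliseconds.
maxt_ms = 1594086829277

# timedelta(milliseconds=...) keeps the conversion exact (no float rounding).
last_sample = EPOCH + timedelta(milliseconds=maxt_ms)
print(last_sample.isoformat())
```

This prints `2020-07-07T01:53:49.277000+00:00`, which agrees with the `date` output.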

But still no data.

sysop@r26data0:~$ curl http://localhost:9090/api/v1/metadata
{"status":"success","data":{}}sysop@r26data0:~$ 
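One caveat about this check: as far as I know, `/api/v1/metadata` reports metadata for metrics currently scraped from targets. It can therefore be empty even when the TSDB holds data, for example when nothing is being scraped yet. A more direct check is to ask the server which metric names it has stored, using the `__name__` label values endpoint. The Python sketch below assumes the server from the thread at `localhost:9090`. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

PROM = "http://localhost:9090"  # the server started on the snapshot above


def metric_names_from_body(body: str) -> List[str]:
    """Extract metric names from a /api/v1/label/__name__/values response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]


def fetch_metric_names(base: str = PROM) -> List[str]:
    """Ask a running Prometheus server for every stored metric name."""
    url = f"{base}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url) as resp:
        return metric_names_from_body(resp.read().decode("utf-8"))
```

Calling `fetch_metric_names()` against the snapshot server and getting a non-empty list would suggest that the blocks were loaded. An empty list would point at the data itself rather than at the metadata endpoint.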

The web UI shows the following for "/status".

Runtime