Re: [prometheus-users] Counter metric resets

2022-04-07 Thread Yaron Bialik
Thanks!

On Thu, Apr 7, 2022, 19:04 Julius Volz  wrote:

> On Thu, Apr 7, 2022 at 4:52 PM Stuart Clark 
> wrote:
>
>> On 07/04/2022 14:04, Yaron B wrote:
>> > Hello,
>> > we have a counter metric that counts each time a pod performs a
>> > specific action.
>> > I need to count how many times the pod (actually the sum of all the
>> > pods from a certain deployment) performed the action over 24 hours.
>> > The problem is that the pod runs on spot instances, and when it gets
>> > restarted, the counter resets, so the metric might be 20 at 1:00 but
>> > 3 at 2:00, so when I try to use delta or sum over time, I get wrong
>> > results.
>> > Any ideas how I can get the real delta for the action over a 24-hour
>> > range?
>>
>> Look at using rate(), which handles counter resets. If you multiply the
>> value it produces by the length of the time range it covers (in seconds),
>> you get the number of actions that occurred.
>>
>
> That sounds equivalent to just using increase(): increase() is identical
> to rate(), except that it does not convert the result to a per-second
> value but keeps it per whatever time range you specified.
>
> But yep, with metrics and resets, this is only ever going to be an
> estimate, and both rate() and increase() do some extrapolation. See also
> https://promlabs.com/blog/2021/01/29/how-exactly-does-promql-calculate-rates
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAPe6rojb4mfFEeL0y_%2BmM-tu0wnSSuk_syfD%3DaDXqifYUkjNug%40mail.gmail.com.


Re: [prometheus-users] CMDB config

2022-04-07 Thread Julius Volz
Are you talking about config management databases in general, or some
specific one?

In either case, you likely will have to use the http_sd or file_sd custom
service discovery integrations to build your own discovery based on your
CMDB (unless you find something on Google for it already).

See:

* https://prometheus.io/docs/guides/file-sd/
* https://prometheus.io/blog/2018/07/05/implementing-custom-sd/
* https://prometheus.io/docs/prometheus/latest/http_sd/
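As a rough illustration of the file_sd route, here is a minimal Python sketch that turns CMDB records into the target-group JSON that file_sd (and http_sd) consume: a list of objects, each with a `targets` list and a `labels` map. The CMDB record fields (`hostname`, `port`, `env`, `team`) and the sample hosts are hypothetical; adapt them to whatever your CMDB actually exports.

```python
import json
import os
from collections import defaultdict

# Hypothetical CMDB export: one dict per host. Field names are assumptions.
CMDB_RECORDS = [
    {"hostname": "web01.example.com", "port": 9100, "env": "prod", "team": "web"},
    {"hostname": "web02.example.com", "port": 9100, "env": "prod", "team": "web"},
    {"hostname": "db01.example.com", "port": 9100, "env": "staging", "team": "dba"},
]


def to_target_groups(records):
    """Group CMDB records into file_sd/http_sd target groups.

    Hosts sharing the same label set end up in the same group.
    """
    groups = defaultdict(list)
    for rec in records:
        labels = (("env", rec["env"]), ("team", rec["team"]))
        groups[labels].append(f'{rec["hostname"]}:{rec["port"]}')
    return [
        {"targets": sorted(targets), "labels": dict(labels)}
        for labels, targets in sorted(groups.items())
    ]


def write_file_sd(path, records):
    """Write the target groups to path for a file_sd_configs entry."""
    # Write to a temporary file first, then rename it into place, so
    # Prometheus never reads a half-written file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(to_target_groups(records), f, indent=2)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    print(json.dumps(to_target_groups(CMDB_RECORDS), indent=2))
```

You would then list the generated file under `files` in a `file_sd_configs` entry in prometheus.yml and re-run the script whenever the CMDB changes (for example from cron). For http_sd, the same JSON would be served from an HTTP endpoint instead of written to disk.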

On Thu, Apr 7, 2022 at 12:18 PM ritesh patel 
wrote:

> Hello Team,
>
> Does anyone have a CMDB config in prometheus.yml for targets?
>
> How can I add targets from the CMDB directly?
>
> Thanks and regards
> Ritesh patel
>


-- 
Julius Volz
PromLabs - promlabs.com



Re: [prometheus-users] Counter metric resets

2022-04-07 Thread Julius Volz
On Thu, Apr 7, 2022 at 4:52 PM Stuart Clark 
wrote:

> On 07/04/2022 14:04, Yaron B wrote:
> > Hello,
> > we have a counter metric that counts each time a pod performs a
> > specific action.
> > I need to count how many times the pod (actually the sum of all the
> > pods from a certain deployment) performed the action over 24 hours.
> > The problem is that the pod runs on spot instances, and when it gets
> > restarted, the counter resets, so the metric might be 20 at 1:00 but
> > 3 at 2:00, so when I try to use delta or sum over time, I get wrong
> > results.
> > Any ideas how I can get the real delta for the action over a 24-hour
> > range?
>
> Look at using rate(), which handles counter resets. If you multiply the
> value it produces by the length of the time range it covers (in seconds),
> you get the number of actions that occurred.
>

That sounds equivalent to just using increase(): increase() is identical
to rate(), except that it does not convert the result to a per-second
value but keeps it per whatever time range you specified.

But yep, with metrics and resets, this is only ever going to be an
estimate, and both rate() and increase() do some extrapolation. See also
https://promlabs.com/blog/2021/01/29/how-exactly-does-promql-calculate-rates
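To make the "estimate" and "extrapolation" points concrete, here is a simplified Python sketch modelled on the extrapolation logic Prometheus 2.x uses for rate() and increase(): correct for counter resets, then extrapolate the observed increase towards the edges of the range window. It is an illustration under that assumption, not the actual implementation, and edge-case details may differ between Prometheus versions. Samples are `(unix_seconds, value)` pairs and all numbers are made up.

```python
def extrapolated_increase(samples, range_start, range_end, is_rate=False):
    """Approximate PromQL increase() (or rate() with is_rate=True).

    samples: (timestamp_seconds, value) pairs inside [range_start, range_end],
    sorted by timestamp. Returns None with fewer than two samples, mirroring
    the empty result PromQL gives in that case.
    """
    if len(samples) < 2:
        return None

    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]

    # Raw increase, adding back the pre-reset value whenever the counter drops.
    result = last_v - first_v
    prev = first_v
    for _, v in samples[1:]:
        if v < prev:
            result += prev
        prev = v

    sampled_interval = last_t - first_t
    avg_step = sampled_interval / (len(samples) - 1)
    to_start = first_t - range_start
    to_end = range_end - last_t

    # A counter cannot go below zero, so do not extrapolate further back
    # than the point where it would have reached zero.
    if result > 0 and first_v >= 0:
        to_zero = sampled_interval * (first_v / result)
        to_start = min(to_start, to_zero)

    # Extrapolate all the way to a window edge if the gap is small (under
    # 1.1x the average sample spacing), otherwise only half a spacing.
    threshold = avg_step * 1.1
    extrapolate_to = sampled_interval
    extrapolate_to += to_start if to_start < threshold else avg_step / 2
    extrapolate_to += to_end if to_end < threshold else avg_step / 2

    result *= extrapolate_to / sampled_interval
    if is_rate:
        result /= range_end - range_start
    return result
```

For instance, `extrapolated_increase([(30, 1), (90, 2), (150, 4), (210, 5), (270, 7)], 0, 300)` sees the counter move by 6 between its first and last sample but extrapolates that to 7.5 over the full 300 s window, which is why increase() results are usually not integers. With `is_rate=True` it returns that value divided by 300, so rate() multiplied by the range length gives back the increase.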



Re: [prometheus-users] Counter metric resets

2022-04-07 Thread Stuart Clark

On 07/04/2022 14:04, Yaron B wrote:

Hello,
we have a counter metric that counts each time a pod performs a 
specific action.
I need to count how many times the pod (actually the sum of all the pods 
from a certain deployment) performed the action over 24 hours.
The problem is that the pod runs on spot instances, and when it gets 
restarted, the counter resets, so the metric might be 20 at 1:00 but 3 at 
2:00, so when I try to use delta or sum over time, I get wrong results.

Any ideas how I can get the real delta for the action over a 24-hour range?


Look at using rate(), which handles counter resets. If you multiply the 
value it produces by the length of the time range it covers (in seconds), 
you get the number of actions that occurred. Note that this will only ever 
be an estimate (for example, you might not scrape a pod before it is 
destroyed, missing some actions) and it will most likely not be an integer 
(due to the way extrapolation happens).
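Both caveats can be illustrated with a small Python sketch; all pod names and values are hypothetical. It applies only the counter-reset correction (no extrapolation) to each pod's series and then sums across pods, which is roughly what summing increase() over 24 hours across a deployment's pods aims at. Increments that happen after a pod's last successful scrape never reach Prometheus, so they are simply missing from the total.

```python
def reset_corrected_increase(values):
    """Total increase of a counter series, treating any drop as a reset.

    After a reset the counter is assumed to have restarted from zero, so
    the new value itself counts as increase.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical scraped values per pod over the day (one value per scrape).
pods = {
    # Restarted between the 2nd and 3rd scrape: 20 at 1:00, then 3 at 2:00.
    "worker-a": [5, 20, 3, 9],
    # Destroyed after its last scrape; anything counted afterwards is lost.
    "worker-b": [0, 7, 12],
}

per_pod = {name: reset_corrected_increase(v) for name, v in pods.items()}
deployment_total = sum(per_pod.values())
print(per_pod, deployment_total)
```

Here worker-a's naive last-minus-first delta would be 9 - 5 = 4, while the reset-aware sum is 24; together with worker-b's 12 the deployment total is 36.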


--
Stuart Clark



[prometheus-users] Counter metric resets

2022-04-07 Thread Yaron B
Hello, 
we have a counter metric that counts each time a pod performs a specific 
action.
I need to count how many times the pod (actually the sum of all the pods from a 
certain deployment) performed the action over 24 hours.
The problem is that the pod runs on spot instances, and when it gets restarted, 
the counter resets, so the metric might be 20 at 1:00 but 3 at 2:00, so when I 
try to use delta or sum over time, I get wrong results.
Any ideas how I can get the real delta for the action over a 24-hour range?



[prometheus-users] Re: Calculate 95th for network traffic per month

2022-04-07 Thread Brian Candler
Yes, see quantile_over_time().

Here are some example recording rules, to calculate this continuously:

groups:
- name: bandwidth_percentiles_daily
  interval: 5m
  rules:
  - record: interface:in_octets:rate5m_95th_24h
expr: quantile_over_time(0.95, 
rate(ifHCInOctets{instance="",ifName=""}[10m])[24h:5m])
  - record: interface:out_octets:rate5m_95th_24h
expr: quantile_over_time(0.95, 
rate(ifHCOutOctets{instance="",ifName=""}[10m])[24h:5m])

- name: bandwidth_percentiles_monthly
  interval: 1h
  rules:
  - record: interface:in_octets:rate5m_95th_30d
expr: quantile_over_time(0.95, 
rate(ifHCInOctets{instance="",ifName=""}[10m])[30d:5m])
  - record: interface:out_octets:rate5m_95th_30d
expr: quantile_over_time(0.95, 
rate(ifHCOutOctets{instance="",ifName=""}[10m])[30d:5m])

Each metric gives the 95th-percentile of the rate taken at 5 minute 
intervals, over the preceding 24 hours or 30 days respectively.
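For intuition about what these rules compute, here is a minimal Python sketch. It derives per-interval rates from raw counter samples (ignoring resets and extrapolation for brevity) and then applies the interpolating quantile that PromQL's quantile functions use: rank = q × (n − 1), with linear interpolation between the two neighbouring values. The traffic data is hypothetical. Note that this interpolated value may differ from the "sort, discard the top 5%, take the highest remaining sample" method some providers use for 95th-percentile billing.

```python
import math


def promql_quantile(q, values):
    """Quantile with linear interpolation, as PromQL's quantile_over_time does."""
    if not values:
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def per_interval_rates(samples):
    """Per-second rates between consecutive (timestamp, counter) samples."""
    return [(v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in zip(samples, samples[1:])]


# Hypothetical octet-counter readings every 5 minutes: a steady 1000 bytes/s,
# except for one 5-minute burst at 100000 bytes/s.
samples = []
counter = 0
for i in range(21):
    samples.append((i * 300, counter))
    counter += (100_000 if i == 10 else 1_000) * 300

rates = per_interval_rates(samples)
p95 = promql_quantile(0.95, rates)
print(len(rates), p95, p95 * 8)  # 95th percentile in bytes/s and bits/s
```

Here 19 of the 20 intervals run at 1,000 bytes/s and one bursts to 100,000 bytes/s. The interpolated 95th percentile comes out at about 5,950 bytes/s, whereas discarding the single highest sample (the top 5% of 20) would give 1,000 bytes/s. Since ifHCInOctets and ifHCOutOctets count octets, multiply by 8 for bits per second.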

You can of course use these expressions directly in the PromQL browser for 
ad-hoc queries. You should be able to use the new "@ <time>" modifier 
to perform the query at a given instant in time, so it will calculate over the 
24 hours or 30 days before that time.
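If you want the window to line up with a calendar month rather than a fixed 30d, one option is to generate the query: evaluate the subquery at the end of the month with the @ modifier and size its range to the month's length. This Python sketch only builds the PromQL string; the helper name `monthly_p95_query` and the example selector are hypothetical, and it assumes a Prometheus version where the @ modifier is available (older 2.x releases required enabling it with a feature flag).

```python
import calendar


def monthly_p95_query(selector, year, month, q=0.95, step="5m"):
    """Build a PromQL expression for the q-quantile of 10m rates over one
    calendar month (UTC), pinned to the month's end with the @ modifier."""
    days_in_month = calendar.monthrange(year, month)[1]
    month_start = calendar.timegm((year, month, 1, 0, 0, 0))
    month_end = month_start + days_in_month * 86400
    return (
        f"quantile_over_time({q}, "
        f"rate({selector}[10m])[{days_in_month}d:{step}] @ {month_end})"
    )


print(monthly_p95_query('ifHCInOctets{ifName="ether1"}', 2022, 4))
```

For April 2022 this yields a `[30d:5m]` subquery evaluated `@ 1651363200` (2022-05-01T00:00:00Z). Because the evaluation time is pinned, re-running the query later should give the same answer, as long as the data is still within retention.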

On Thursday, 7 April 2022 at 08:49:45 UTC+1 romu...@gmail.com wrote:

> Hello.
> Is it possible to calculate the 95th percentile of network traffic in Prometheus?
> Thank you!
>



[prometheus-users] CMDB config

2022-04-07 Thread ritesh patel
Hello Team,

Does anyone have a CMDB config in prometheus.yml for targets?

How can I add targets from the CMDB directly?

Thanks and regards
Ritesh patel



[prometheus-users] Calculate 95th for network traffic per month

2022-04-07 Thread Roman Melnyk
Hello.
Is it possible to calculate the 95th percentile of network traffic in Prometheus?
Thank you!



Re: [prometheus-users] Re: Big spike when using rate() - doesn't seem to be a counter reset

2022-04-07 Thread Brian Candler
There are at least three different Prometheus servers running, then?
- 2 x Prometheus in Kubernetes that you deployed yourself
- 1 x AWS managed Prometheus

Which of those are you querying from Grafana, and which were you querying 
for the direct queries you showed?
How do you get data from the 2 x Prometheus into the 1 x AWS managed 
Prometheus? For example, remote write, or federation?

If the two local Kubernetes servers are both scraping the same targets and 
both writing into the same AWS instance, then you need to set different 
"external_labels" on them, so that they create two distinct timeseries in 
AWS. If you don't, you'll get duplicate data points with different 
timestamps, which sounds very likely to cause the problem you describe.
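Here is a minimal Python sketch of why this matters; the sample values and the `replica` label name are hypothetical. If samples from two replicas land in one series and are ever slightly out of step (for example through clock skew between the replicas), any dip looks like a counter reset, and the reset correction adds back the whole previous value. Keeping the replicas apart with a distinguishing external label avoids that.

```python
def increase_with_resets(values):
    """Raw increase of a counter series as PromQL computes it before
    extrapolation: last - first, plus the pre-drop value for every drop."""
    result = values[-1] - values[0]
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            result += prev
    return result


# Hypothetical scrapes of the same counter by two replicas, offset by 30 s.
# Replica B's values lag slightly behind replica A's.
replica_a = [(0, 225201), (60, 225226), (120, 225249)]
replica_b = [(30, 225199), (90, 225224), (150, 225260)]

# No distinguishing label: both replicas' samples end up in one series.
merged = [v for _, v in sorted(replica_a + replica_b)]
print("merged:", increase_with_resets(merged))

# With e.g. external_labels {replica: "a"} and {replica: "b"}: two series.
for name, series in {"a": replica_a, "b": replica_b}.items():
    print(name, increase_with_resets([v for _, v in series]))
```

The merged series reports an increase of 450486 over 150 s, while each replica on its own reports 48 and 61, which is the kind of spike that interleaved duplicate samples can produce.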

On Wednesday, 6 April 2022 at 20:23:21 UTC+1 he...@samwho.dev wrote:

> I appreciate your time. I’ve logged off for the day but will get back to 
> you tomorrow with more data.
>
> To answer the question I can: we aren’t using any proxy software to my 
> knowledge. We use the 
> https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus
> Helm chart (version 13.8.0) hooked up to store data in AWS’s managed
> Prometheus product.
>
> That said, we do run it in StatefulSet mode with 2 replicas. I wonder if 
> that’s causing problems.
>
> On Wed, 6 Apr 2022, at 19:49, Brian Candler wrote:
>
> Are you going through any middleware or proxy, like promxy?
>
> rate(foo[1m]) should definitely give no answer at all, when the timeseries 
> data is sampled at 1 minute intervals.
>
> Here is a working query_range for rate[1m] where the scrape interval is 
> 15s:
>
> # curl -Ssg 'http://localhost:9090/api/v1/query_range?query=rate(ifHCInOctets{instance="gw1",ifName="ether1"}[60s])&start=1649264340&end=1649264640&step=60' | python3 -m json.tool
> {
>     "status": "success",
>     "data": {
>         "resultType": "matrix",
>         "result": [
>             {
>                 "metric": {
>                     "ifIndex": "16",
>                     "ifName": "ether1",
>                     "instance": "gw1",
>                     "job": "snmp",
>                     "module": "mikrotik_secret",
>                     "netbox_type": "device"
>                 },
>                 "values": [
>                     [
>                         1649264340,
>                         "578.6"
>                     ],
>                     [
>                         1649264400,
>                         "651.42221"
>                     ],
>                     [
>                         1649264460,
>                         "135.18"
>                     ],
>                     [
>                         1649264520,
>                         "1699.4"
>                     ],
>                     [
>                         1649264580,
>                         "441.5"
>                     ],
>                     [
>                         1649264640,
>                         "39768.088"
>                     ]
>                 ]
>             }
>         ]
>     }
> }
>
> But if I make exactly the same query but with rate[15s] then there are no 
> answers:
>
> # curl -Ssg 'http://localhost:9090/api/v1/query_range?query=rate(ifHCInOctets{instance="gw1",ifName="ether1"}[15s])&start=1649264340&end=1649264640&step=60' | python3 -m json.tool
> {
>     "status": "success",
>     "data": {
>         "resultType": "matrix",
>         "result": []
>     }
> }
>
> I think the real reason for your problem is hidden; you're obfuscating the 
> query and metric names, and I suspect it's hidden behind that.  Sorry, I 
> can't help you any further given what I can see, but hopefully you have an 
> idea where you can look further.
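The reason rate() returns nothing when the range is no longer than the scrape interval is that it needs at least two samples inside the window. This small Python sketch uses hypothetical timestamps with a 15 s scrape interval offset by 7 s from the minute, and counts how many samples a [15s] and a [60s] window would see at each 1-minute evaluation step. The window here includes both ends, but with the 7 s offset no sample sits exactly on a boundary, so that choice does not affect the example.

```python
def samples_in_window(scrape_times, eval_time, range_seconds):
    """Timestamps a range selector [range_seconds] would select at eval_time."""
    return [t for t in scrape_times if eval_time - range_seconds <= t <= eval_time]


# Hypothetical scrape timestamps: every 15 s, offset by 7 s from the minute.
scrapes = [7 + 15 * i for i in range(40)]  # 7 .. 592

for range_s in (15, 60):
    counts = [len(samples_in_window(scrapes, t, range_s)) for t in range(60, 600, 60)]
    # rate() needs at least two samples; fewer means no output at that step.
    print(f"[{range_s}s]:", counts, "-> output" if min(counts) >= 2 else "-> no output")
```

Every [15s] window holds a single sample, so there is nothing to compute a rate from, while every [60s] window holds four. The same reasoning applies to a 1-minute scrape interval with a [1m] range.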
>
> On Wednesday, 6 April 2022 at 18:45:10 UTC+1 he...@samwho.dev wrote:
>
> Hey Brian,
>
> In the original post I put the output of the raw time series as gathered 
> the way you suggest. I'll copy it again below:
>
> {
>     "data": {
>         "result": [
>             {
>                 "metric": {/* redacted */},
>                 "values": [
>                     [
>                         1649239253.4,
>                         "225201"
>                     ],
>                     [
>                         1649239313.4,
>                         "225226"
>                     ],
>                     [
>                         1649239373.4,
>                         "225249"
>                     ],
>                     [
>                         1649239433.4,
>                         "225262"
>                     ],
>                     [
>                         1649239493.4,
>                         "225278"
>                     ],
>                     [
>                         1649239553.4,
>                         "225310"
>                     ],
>                     [
>                         1649239613.4,
>                         "225329"
>                     ],
>