Re: [prometheus-users] can node_exporter expose aggregated node_cpu_seconds_total?

2023-02-02 Thread Ben Kochie
The node_exporter exposes per-CPU metrics because that's what most users
want. Knowing about per-core saturation, single-core I/O wait, and the like
covers extremely useful and common use cases.

Using a recording rule is recommended.
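
For example, something along these lines (rule name illustrative, following
the level:metric:operations convention) keeps one pre-aggregated series per
instance and mode:

groups:
  - name: node_cpu_aggregation
    rules:
      # One series per instance and mode; the cpu label is summed away.
      - record: instance_mode:node_cpu_seconds:sum_rate5m
        expr: sum without (cpu) (rate(node_cpu_seconds_total[5m]))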

On Thu, Feb 2, 2023 at 10:05 AM koly li wrote:

> If I use a recording rule to aggregate the data, then I have to store both
> the per-core samples and the aggregated samples in the same Prometheus,
> which costs a lot of memory.
>
> After some investigation of the node_exporter source code, I found:
> 1. The updateStat function (cpu_linux.go) reads /proc/stat and generates
> the per-core node_cpu_seconds_total samples.
> 2. updateStat calls c.fs.Stat() to read and parse /proc/stat.
> 3. fs.Stat() parses /proc/stat and stores the aggregate CPU statistics in
> Stat.CPUTotal (stat.go).
> 4. However, updateStat ignores Stat.CPUTotal; it only uses Stat.CPU, which
> contains the per-core data.
>
> So the question is: why don't the node_exporter developers use CPUTotal to
> expose aggregate CPU statistics? Should new metrics for total usage be
> added to node_exporter?
>
>
> On Thursday, February 2, 2023 at 2:40:34 PM UTC+8 Stuart Clark wrote:
> On 02/02/2023 06:26, koly li wrote:
> Hi,
>
> Currently, node_exporter exposes time series for each CPU core (examples
> below), which generates a lot of data in a large cluster (10k nodes).
> However, we only care about total CPU usage, not usage per core. So is
> there a way for node_exporter to expose only an aggregated
> node_cpu_seconds_total?
>
> We also noticed there is a discussion about this (reduce cardinality of
> node_cpu_seconds_total), but it seems to have reached no conclusion.
>
> node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="system",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 9077.24 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="user",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 19298.57 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="idle",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 1.060892164e+07 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="iowait",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 4.37 1675059665571
>
> You can't remove it as far as I'm aware, but you can use a recording rule
> to aggregate that data to give you a single metric representing overall
> CPU usage (not broken down by core or mode).
> -- Stuart Clark



Re: [prometheus-users] can node_exporter expose aggregated node_cpu_seconds_total?

2023-02-02 Thread koly li
If I use a recording rule to aggregate the data, then I have to store both
the per-core samples and the aggregated samples in the same Prometheus,
which costs a lot of memory.
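
(For reference, a two-tier setup where a global server federates only a
recorded aggregate would keep the per-core series out of the global
Prometheus, at the cost of running extra servers. A sketch, assuming a rule
named instance_mode:node_cpu_seconds:sum_rate5m on a local server reachable
at prometheus-local:9090:

scrape_configs:
  - job_name: 'federate-cpu'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - 'instance_mode:node_cpu_seconds:sum_rate5m'
    static_configs:
      - targets: ['prometheus-local:9090']  # hypothetical local server
)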

After some investigation of the node_exporter source code, I found:
1. The updateStat function (cpu_linux.go) reads /proc/stat and generates
the per-core node_cpu_seconds_total samples.
2. updateStat calls c.fs.Stat() to read and parse /proc/stat.
3. fs.Stat() parses /proc/stat and stores the aggregate CPU statistics in
Stat.CPUTotal (stat.go).
4. However, updateStat ignores Stat.CPUTotal; it only uses Stat.CPU, which
contains the per-core data (see the sketch just below).
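
For illustration, a minimal standalone sketch of what fs.Stat() returns,
using github.com/prometheus/procfs directly (only the User and Idle fields
shown; this runs outside node_exporter):

package main

import (
    "fmt"
    "log"

    "github.com/prometheus/procfs"
)

func main() {
    fs, err := procfs.NewFS("/proc")
    if err != nil {
        log.Fatal(err)
    }
    stat, err := fs.Stat()
    if err != nil {
        log.Fatal(err)
    }
    // Per-core entries ("cpu0", "cpu1", ... lines in /proc/stat):
    // this is what updateStat iterates to emit node_cpu_seconds_total.
    for cpu, s := range stat.CPU {
        fmt.Printf("cpu%d: user=%.2f idle=%.2f\n", cpu, s.User, s.Idle)
    }
    // The aggregate "cpu" line: parsed into CPUTotal but never exported.
    fmt.Printf("total: user=%.2f idle=%.2f\n", stat.CPUTotal.User, stat.CPUTotal.Idle)
}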

So the question is: why don't the node_exporter developers use CPUTotal to
expose aggregate CPU statistics? Should new metrics for total usage be
added to node_exporter? A sketch of what that could look like follows.
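
To make the idea concrete, here is a minimal standalone collector that
exposes only the aggregate (the metric name node_cpu_aggregate_seconds_total,
the port, and the subset of modes are made up for illustration; this is a
sketch, not a proposed patch):

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "github.com/prometheus/procfs"
)

// cpuDesc describes the hypothetical aggregate metric, labeled by mode only.
var cpuDesc = prometheus.NewDesc(
    "node_cpu_aggregate_seconds_total", // hypothetical name
    "Aggregate seconds the CPUs spent in each mode.",
    []string{"mode"}, nil,
)

type totalCPUCollector struct{ fs procfs.FS }

func (c totalCPUCollector) Describe(ch chan<- *prometheus.Desc) { ch <- cpuDesc }

func (c totalCPUCollector) Collect(ch chan<- prometheus.Metric) {
    stat, err := c.fs.Stat()
    if err != nil {
        return // real code would report the error
    }
    // Read the aggregate "cpu" line that node_exporter currently ignores.
    t := stat.CPUTotal
    for mode, v := range map[string]float64{
        "user": t.User, "system": t.System, "idle": t.Idle, "iowait": t.Iowait,
    } {
        ch <- prometheus.MustNewConstMetric(cpuDesc, prometheus.CounterValue, v, mode)
    }
}

func main() {
    fs, err := procfs.NewFS("/proc")
    if err != nil {
        log.Fatal(err)
    }
    prometheus.MustRegister(totalCPUCollector{fs: fs})
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9101", nil)) // hypothetical port
}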


On Thursday, February 2, 2023 at 2:40:34 PM UTC+8 Stuart Clark wrote:
On 02/02/2023 06:26, koly li wrote:
Hi, 

Currently, node_exporter exposes time series for each CPU core (examples
below), which generates a lot of data in a large cluster (10k nodes).
However, we only care about total CPU usage, not usage per core. So is
there a way for node_exporter to expose only an aggregated
node_cpu_seconds_total?

We also noticed there is a discussion about this (reduce cardinality of
node_cpu_seconds_total), but it seems to have reached no conclusion.

node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="system",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 9077.24 1675059665571

node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="user",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 19298.57 1675059665571

node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="idle",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 1.060892164e+07 1675059665571

node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="iowait",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 4.37 1675059665571

You can't remove it as far as I'm aware, but you can use a recording rule
to aggregate that data to give you a single metric representing overall
CPU usage (not broken down by core or mode).
-- Stuart Clark
