[prometheus-users] Prometheus server increases CPU usage beyond 200%

Isabel Noronha Thu, 14 May 2020 01:45:44 -0700

Hi,

Server where Prometheus is running:
160 CPU cores
500 GB RAM
2 TB hard disk


Prometheus version: 2.18.0
cAdvisor version: 0.36.0

Prometheus is running inside a container.
I have already applied relabeling.
The retention period is 15 days.

I am using cAdvisor to collect metrics from around 4K containers,
and I have applied relabeling to the container metrics as well.

The scrape interval is 40s (for the prometheus and cadvisor jobs).
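As a rough way to reason about the load these numbers imply: every active series yields one sample per scrape, so a job ingests about (series / scrape interval) samples per second. The sketch below is a minimal estimate under stated assumptions. The 4K containers and the 40s interval come from this post, but the 30 series per container is a hypothetical placeholder, not a measured value (the real series count can be read from Prometheus's own `prometheus_tsdb_head_series` metric).

```python
from dataclasses import dataclass


@dataclass
class ScrapeJob:
    name: str
    series: int        # active series kept after relabeling (HYPOTHETICAL input)
    interval_s: float  # the job's scrape_interval, in seconds


def samples_per_second(jobs):
    """Sum series / interval over all jobs: each series yields one sample per scrape."""
    return sum(job.series / job.interval_s for job in jobs)


if __name__ == "__main__":
    # 4000 containers and a 40s interval are from the post;
    # 30 series per container is an ASSUMED placeholder.
    cadvisor = ScrapeJob("cadvisor", series=4000 * 30, interval_s=40)
    print(samples_per_second([cadvisor]))  # 3000.0
```

Adding the node-exporter (15s) and docker (5s) jobs contributes their own terms; halving a job's interval doubles its share of the rate.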

I use the top command to check CPU usage, and to my surprise Prometheus
was exceeding 200% CPU. The server where Prometheus runs hosts around
2K containers, and another target hosts another 2K containers, so about
4K containers overall.
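In top's default (Irix) mode, CPU is reported per core, so 200% means roughly two of the 160 cores are busy. One way to cross-check top from inside Prometheus is to query the server's own `process_cpu_seconds_total` counter through the HTTP API (`/api/v1/query`); its per-second rate is the number of cores in use. The sketch below assumes the server listens on `localhost:9010` (the target of the `prometheus` job in the config) and only builds the query URL and converts a response into top-style percentages; the sample response body is hypothetical.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: the Prometheus server itself is the 'prometheus' job target on localhost:9010.
PROM_URL = "http://localhost:9010"
# Per-second rate of CPU seconds = cores in use, averaged over 5 minutes.
CPU_QUERY = 'rate(process_cpu_seconds_total{job="prometheus"}[5m])'


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def top_style_percent(response_body):
    """Turn an instant-vector response (cores in use) into top-style percentages (100% = one core)."""
    payload = json.loads(response_body)
    # Each vector element carries "value": [<unix timestamp>, "<value as string>"].
    return [float(sample["value"][1]) * 100 for sample in payload["data"]["result"]]


if __name__ == "__main__":
    # Hypothetical response body, for illustration only.
    body = ('{"status":"success","data":{"resultType":"vector","result":'
            '[{"metric":{"job":"prometheus"},"value":[1589445944,"2.5"]}]}}')
    print(top_style_percent(body))  # [250.0]
    print(query_url(PROM_URL, CPU_QUERY))
```

Against a live server, the body would come from `urllib.request.urlopen(query_url(PROM_URL, CPU_QUERY)).read()`.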

Could anyone help me understand what could be causing Prometheus's CPU
usage to increase?

prometheus.yml:
# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus-monitor'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - 'alert.rules'
  # - "first.rules"
  - "alert_rules.yml"

# alert
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "server:9093"

# Scrape configurations. The first job scrapes Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any time series scraped from this config.

  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 40 seconds.
    scrape_interval: 40s
    scrape_timeout: 40s

    static_configs:
        - targets: ['localhost:9010']

  - job_name: 'cadvisor'

    # Override the global default and scrape targets from this job every 40 seconds.
    scrape_interval: 40s
    scrape_timeout: 40s

    static_configs:
      - targets: ['server1:8080', 'server2:8080', 'server3:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(container_fs_writes_total|container_fs_reads_total|container_tasks_state|container_cpu_user_seconds_total|container_last_seen|container_memory_usage_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|container_memory_rss|container_network_receive_bytes_total|container_memory_cache|cadvisor_version_info)'
        action: keep

  - job_name: 'node-exporter'

    # Override the global default and scrape targets from this job every 15 seconds.
    scrape_interval: 15s
    scrape_timeout: 15s
    static_configs:
      - targets: ['server1:9100', 'server2:9100', 'server3:9100']

    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(process_start_time_seconds|node_load1|node_exporter_build_info|node_uname_info|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapCached_bytes|node_memory_PageTables_bytes|node_memory_VmallocUsed_bytes|node_memory_SwapTotal_bytes|node_memory_Committed_AS_bytes|node_memory_Active_bytes|node_memory_Mapped_bytes|node_memory_Inactive_bytes|node_cpu_seconds_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_filesystem_free_bytes)'
        action: keep

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scrape_interval: 5s

    static_configs:
      - targets: ['172.17.0.1:9999']
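The `metric_relabel_configs` keep rules above act on `__name__` after each scrape: Prometheus still fetches and parses the full cAdvisor and node-exporter output, and only then drops series whose name does not match. Relabel regexes are fully anchored, which Python's `re.fullmatch` approximates for a plain alternation like this one. The sketch below checks which metric names the cadvisor keep list would retain; the sample names are illustrative only.

```python
import re

# The cadvisor job's keep list from the config above (duplicate entry removed).
CADVISOR_KEEP = "(" + "|".join([
    "container_fs_writes_total",
    "container_fs_reads_total",
    "container_tasks_state",
    "container_cpu_user_seconds_total",
    "container_last_seen",
    "container_memory_usage_bytes",
    "container_cpu_usage_seconds_total",
    "container_network_transmit_bytes_total",
    "container_memory_rss",
    "container_network_receive_bytes_total",
    "container_memory_cache",
    "cadvisor_version_info",
]) + ")"


def kept_metrics(names, pattern):
    """Return the names an `action: keep` rule on __name__ would retain.

    Prometheus anchors relabel regexes at both ends; re.fullmatch does the same here.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


if __name__ == "__main__":
    scraped = [
        "container_cpu_usage_seconds_total",
        "container_cpu_system_seconds_total",
        "container_memory_rss",
        "container_memory_swap",
        "cadvisor_version_info",
    ]
    print(kept_metrics(scraped, CADVISOR_KEEP))
    # ['container_cpu_usage_seconds_total', 'container_memory_rss', 'cadvisor_version_info']
```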

Thank you!

