Re: [prometheus-users] Prometheus getting slow on about 400 node_exporter instances

Julien Pivotto Sun, 01 Mar 2020 01:02:41 -0800

On 29 Feb 18:13, Nur Kholis Majid wrote:
> Hi,
> 
> On Sunday, March 1, 2020 at 7:55:30 AM UTC+7, Julien Pivotto wrote:
> >
> > On 29 Feb 16:40, Nur Kholis Majid wrote: 
> > > Hi Julien, 
> > > 
> > > On Sunday, March 1, 2020 at 6:44:34 AM UTC+7, Julien Pivotto wrote: 
> > > > 
> > > > On 29 Feb 15:38, Nur Kholis Majid wrote: 
> > > > > Hi, 
> > > > > 
> > > > > I've tested Prometheus monitoring node_exporter on 400 instances. With
> > > > > the default configuration, in just two months the TSDB size reached
> > > > > about 450GB and memory usage about 135GB. Queries have become slow
> > > > > and unusable.
> > > > > 
> > > > > [image: photo_2020-03-01_06-33-51.jpg] 
> > > > > 
> > > > > [image: photo_2020-03-01_06-34-00.jpg] 
> >
> > Hi, 
> >
> > Can you tell us what is in your data directory? Are compactions
> > happening, etc.?
> >
> > e.g. the command 
> > tree data 
> >
> > or ls -Rl data 
> >
> Too long to copy here. Please see https://paste.ee/p/ayBlq
> 
> Thanks



You have a lot of failed compactions in the past, and a lot of leftover
.tmp directories.

What is strange is that compaction does succeed in the end.
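
To put numbers on the leftover .tmp directories, something like the
following could help. It is only a minimal sketch: it assumes the temporary
directories sit at the top level of the data directory and have names ending
in ".tmp", as the listing suggests, and the function names are made up for
illustration (they are not part of Prometheus).

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so nothing outside the tree is counted.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def find_tmp_dirs(data_dir):
    """Return (name, size_bytes) for each top-level directory ending in '.tmp'."""
    found = []
    for entry in sorted(os.listdir(data_dir)):
        full = os.path.join(data_dir, entry)
        if entry.endswith(".tmp") and os.path.isdir(full):
            found.append((entry, dir_size_bytes(full)))
    return found
```

Calling find_tmp_dirs("data") returns one (name, size_bytes) pair per
leftover directory, which shows how much of the disk usage comes from them
rather than from finished blocks.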

I have a few follow-up questions to help narrow this down:

- What is your prometheus version?
- Can you share the logs of prometheus?
- Are you using the node_exporter textfile_collector?
- Do you have any metric_relabel_configs?

There are a few known bugs out there, but none of them would explain why
the WAL is compacted correctly in the end.
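
As a rough sanity check on the 450GB figure, here is a minimal sketch of the
disk-sizing rule of thumb from the Prometheus storage documentation
(retention time in seconds x ingested samples per second x bytes per sample).
The series count and the 15s scrape interval are the ones quoted below. The
15-day retention (the Prometheus default) and the 2 bytes per sample are
assumptions, not values read from your server, and the function name is
hypothetical.

```python
def estimated_disk_bytes(head_series, scrape_interval_s, retention_days,
                         bytes_per_sample):
    """Rule-of-thumb TSDB size: retention seconds * samples/s * bytes/sample."""
    samples_per_second = head_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# 771651 head series scraped every 15s; 15d retention and 2 bytes per
# sample are assumed values, see above.
estimate = estimated_disk_bytes(771651, 15, 15, 2.0)
print(f"{estimate / 1e9:.1f} GB")  # prints 133.3 GB
```

If the retention really is the default, 450GB would be several times this
estimate, which might be consistent with the leftover .tmp directories taking
up space. That is only a guess until we see the logs.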

>  
> 
> > Thanks 
> >
> >
> > > > 
> > > > 
> > > > Can we know what you mean by default configuration? Is it default or 
> > > > documented one? What are your startup parameters? 
> > > > 
> > > I mean I just added a minimal configuration in prometheus.yml:
> > > $ cat prometheus.yml 
> > > # my global config 
> > > global: 
> > >   scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
> > >   evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
> > >   # scrape_timeout is set to the global default (10s). 
> > > 
> > > # Alertmanager configuration 
> > > alerting: 
> > >   alertmanagers: 
> > >   - static_configs: 
> > >     - targets: 
> > >       # - alertmanager:9093 
> > > 
> > > # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
> > > rule_files: 
> > >   # - "first_rules.yml" 
> > >   # - "second_rules.yml" 
> > > 
> > > # A scrape configuration containing exactly one endpoint to scrape: 
> > > # Here it's Prometheus itself. 
> > > scrape_configs: 
> > >   # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
> > >   - job_name: 'prometheus' 
> > > 
> > >     # metrics_path defaults to '/metrics' 
> > >     # scheme defaults to 'http'. 
> > > 
> > >     static_configs: 
> > >     - targets: ['localhost:9090'] 
> > > 
> > >   - job_name: 'node' 
> > >     static_configs: 
> > >     - targets: ['10.10.10.1:9100', '10.10.10.2:9100', etc until 400 
> > nodes] 
> > > 
> > > On the node_exporter side, no additional configuration was made.
> > >   
> > > 
> > > > How many series do you have? 
> > > > max_over_time(prometheus_tsdb_head_series[1d]) 
> > > > 
> > > 771651
> > >   
> > > 
> > > > Do you have lots of different disks/devices per machine? Lots of
> > > > network interfaces?
> > > > 
> > > Yes. Each node has 2 NICs in bonding mode and 12 disks.
> > >   
> > > 
> > > > 
> > > > I recommend you read
> > > > https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
> > > > to better understand this.
> > > > 
> > > > > 
> > > > > 
> > > > > Question: 
> > > > > 1. What is the maximum number of node_exporter instances Prometheus
> > > > > can handle with acceptable query duration?
> > > > > 2. Is there any special Prometheus configuration for a huge number of
> > > > > instances?
> > > > > 
> > > > > Thank you 
> > > > > 
> > > > > -- 
> > > > > You received this message because you are subscribed to the Google 
> > > > Groups "Prometheus Users" group. 
> > > > > To unsubscribe from this group and stop receiving emails from it, 
> > send 
> > > > an email to [email protected].
> > > > > To view this discussion on the web visit 
> > > > 
> > https://groups.google.com/d/msgid/prometheus-users/7da6b213-02d0-4beb-83fb-e943701b2422%40googlegroups.com.
> >  
> >
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > >  (o-    Julien Pivotto 
> > > >  //\    Open-Source Consultant 
> > > >  V_/_   Inuits - https://www.inuits.eu 
> > > > 
> > > 
> >
> >
> >
> > -- 
> >  (o-    Julien Pivotto 
> >  //\    Open-Source Consultant 
> >  V_/_   Inuits - https://www.inuits.eu 
> >
> 


-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

