Re: [prometheus-users] Increasing scrape_interval causing missing data point for metrics

chuanjia xing Mon, 22 Mar 2021 17:25:35 -0700

Thanks Stuart. I'll need to think about whether it's feasible in my case to run
node_exporter on each EC2 instance. I am on an infra team, and doing that would
have a lot of impact that I need to evaluate. But thanks for your
suggestions.

One more question regarding the CloudWatch exporter: in my case, another
option (actually my first choice) for getting cluster/service-level CPU
metrics is to collect AutoScalingGroup (ASG) CPU metrics instead of querying
EC2 instance metrics. That should be faster, since the number of ASGs is
much smaller than the number of EC2 instances. Unfortunately, the
CloudWatch exporter doesn't support ASG metrics directly, because the AWS
API it uses (the Resource Groups Tagging API) doesn't support
ASG:
https://docs.aws.amazon.com/resourcegroupstagging/latest/APIReference/supported-services.html

I am wondering whether I can get around this limitation by using a
different AWS API. Do you know if this is doable? (I can ask this
question in a separate conversation if needed.)
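One candidate "different AWS API" is CloudWatch's own GetMetricData, which filters by metric dimensions rather than by resource tags. The sketch below queries the `AWS/EC2` `CPUUtilization` metric using the `AutoScalingGroupName` dimension through boto3. It is a minimal sketch, not what the CloudWatch exporter does internally. It assumes that CloudWatch is publishing the ASG-level aggregate for your groups (you can verify this with ListMetrics) and that boto3 has credentials and a region configured. The helper names and the 300 s period are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# GetMetricData accepts at most 500 MetricDataQueries per request.
MAX_QUERIES_PER_CALL = 500


def build_asg_cpu_queries(asg_names, period=300, stat="Average"):
    """Build one GetMetricData query per ASG for AWS/EC2 CPUUtilization."""
    queries = []
    for i, name in enumerate(asg_names):
        queries.append({
            "Id": f"cpu{i}",  # Ids must start with a lowercase letter
            "Label": name,    # echoed back in MetricDataResults
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "AutoScalingGroupName", "Value": name}],
                },
                "Period": period,  # 300 s matches basic (5-minute) monitoring
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def chunked(items, size=MAX_QUERIES_PER_CALL):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_asg_cpu(asg_names, minutes=15):
    """Return {asg_name: [(timestamp, value), ...]} for the last `minutes`."""
    import boto3  # imported lazily so the builders above work without boto3

    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    results = {}
    paginator = cw.get_paginator("get_metric_data")
    for batch in chunked(build_asg_cpu_queries(asg_names)):
        for page in paginator.paginate(MetricDataQueries=batch,
                                       StartTime=start, EndTime=end):
            for r in page["MetricDataResults"]:
                results.setdefault(r["Label"], []).extend(
                    zip(r["Timestamps"], r["Values"]))
    return results
```

With a few dozen ASGs, a single GetMetricData request covers everything. By contrast, querying per instance needs one query per instance.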

thanks.

On Monday, March 22, 2021 at 5:02:43 PM UTC-7 Stuart Clark wrote:

> On 22/03/2021 23:30, chuanjia xing wrote:
>
> I have one more question about node_exporter: if I want to get EC2
> instance CPU metrics for *lots* of clusters, do I need to run
> node_exporter on every node in all clusters? From the node_exporter docs,
> it looks like one exporter only collects metrics for the node it's
> running on, which means in my case I would need to install node_exporter on
> every node in every cluster.
> If that is the case, then node_exporter might not work for me, since I
> can't run node_exporter on every node. The CloudWatch exporter can do
> this, since I only need one exporter instance to collect all EC2 instance
> CPU metrics in a region, but it's just slow.
>
> Yes, you would install the node exporter on each EC2 instance. A common way
> to do that is to build it into the AMIs you are using or to use cloud-init
> to add it on startup. In addition to CPU, you get many more metrics that
> CloudWatch isn't able to supply: full details about networking, memory,
> disk, systemd, etc.
>
> CloudWatch is known to be slow, not just the actual API calls but also the
> time it takes for metrics to become available (a value returned by the API
> might be comparatively old rather than real-time). Using the node
> exporter is also likely to be cheaper, as the only cost is network
> bandwidth rather than the various API calls.
>
> -- 
> Stuart Clark
>
>
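The cloud-init route suggested in the quoted reply could look roughly like the following sketch, which renders a bash user-data script that installs node_exporter as a systemd service. It is only a sketch, not an official install method. It assumes a Linux amd64 AMI with bash, curl and systemd, and the version passed in is an example; check the node_exporter GitHub releases page for a current one. The function name is hypothetical. cloud-init executes user data that starts with `#!` as a shell script, so the returned string can be supplied as EC2 user data, for example in a launch template.

```python
import textwrap


def render_node_exporter_user_data(version: str, arch: str = "amd64") -> str:
    """Return a bash user-data script that installs node_exporter under systemd."""
    name = f"node_exporter-{version}.linux-{arch}"
    url = ("https://github.com/prometheus/node_exporter/releases/download/"
           f"v{version}/{name}.tar.gz")
    # dedent strips the common indentation so "EOF" ends up in column 0,
    # which the shell heredoc requires.
    return textwrap.dedent(f"""\
        #!/bin/bash
        set -euo pipefail
        cd /tmp
        curl -fsSLO {url}
        tar xzf {name}.tar.gz
        install -m 0755 {name}/node_exporter /usr/local/bin/node_exporter
        useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter || true
        cat > /etc/systemd/system/node_exporter.service <<'EOF'
        [Unit]
        Description=Prometheus node_exporter
        After=network-online.target

        [Service]
        User=node_exporter
        ExecStart=/usr/local/bin/node_exporter
        Restart=on-failure

        [Install]
        WantedBy=multi-user.target
        EOF
        systemctl daemon-reload
        systemctl enable --now node_exporter
        """)
```

Baking the same steps into an AMI build instead moves the download out of instance startup, which avoids a runtime dependency on GitHub.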

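To make the API-cost point in the quoted reply concrete, here is a back-of-the-envelope sketch that also connects to the scrape_interval question in the subject line. It assumes that every scrape requests every CloudWatch series exactly once. The price per 1,000 metrics is a placeholder parameter that you must take from current CloudWatch pricing; it is not a quoted figure. The function name and the instance/ASG counts below are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def monthly_cloudwatch_usage(num_series: int, scrape_interval_s: int,
                             price_per_1000_metrics: float, days: int = 30):
    """Estimate metrics requested from CloudWatch per month and their cost.

    Assumes each scrape requests each series once; the price is a
    caller-supplied placeholder, not an actual AWS rate.
    """
    scrapes = days * SECONDS_PER_DAY // scrape_interval_s
    metrics_requested = num_series * scrapes
    cost = metrics_requested / 1000 * price_per_1000_metrics
    return metrics_requested, cost


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 instances versus the same fleet grouped into 50 ASGs.
    for label, series in [("per-instance", 1000), ("per-ASG", 50)]:
        n, cost = monthly_cloudwatch_usage(series, 60, price_per_1000_metrics=0.01)
        print(f"{label}: {n:,} metrics/month, ~${cost:,.2f} at the assumed price")
```

With 1,000 instance series at a 60 s interval, this gives 43,200,000 metrics requested per 30-day month, versus 2,160,000 for 50 ASGs. Doubling scrape_interval halves both figures. Scraping the node exporter incurs none of these per-request charges.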