Thanks for your quick response, Stuart! The reason I increased the scrape_interval to more than 2 minutes is that I query several AWS regions for EC2 CPUUtilization metrics, and for some regions the Exporter takes ~3 minutes to return the CloudWatch metrics. If it takes 3 minutes, then on the Prometheus side I have to set scrape_timeout to more than 3 minutes, otherwise the scrape times out. And scrape_interval cannot be less than scrape_timeout, otherwise Prometheus rejects the configuration. So I have to set a longer scrape_interval here.
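The timing constraints in the paragraph above can be checked with a small Python sketch. The function name `check_scrape_timing` and the latency figures are hypothetical. The 300s limit assumes Prometheus's default query lookback (`--query.lookback-delta`, 5m), and the "one failed scrape" rule is the 2-2.5 minute recommendation from Stuart's reply quoted below.

```python
# Hypothetical helper: checks scrape timing against the constraints in this thread.
# Assumption: queries use Prometheus's default 5m lookback (--query.lookback-delta).
LOOKBACK_SECONDS = 300


def check_scrape_timing(exporter_latency_s, scrape_timeout_s, scrape_interval_s):
    """Return a list of problems with the given timing (empty list = OK)."""
    problems = []
    # The scrape must be allowed to run at least as long as the exporter needs.
    if scrape_timeout_s < exporter_latency_s:
        problems.append("scrape_timeout shorter than exporter latency: scrapes time out")
    # Prometheus refuses a config whose timeout exceeds its interval.
    if scrape_interval_s < scrape_timeout_s:
        problems.append("scrape_interval shorter than scrape_timeout: config rejected")
    # Simplified model: samples older than the lookback are invisible to queries.
    if scrape_interval_s > LOOKBACK_SECONDS:
        problems.append("scrape_interval longer than the lookback window: gaps expected")
    elif 2 * scrape_interval_s > LOOKBACK_SECONDS:
        problems.append("a single failed scrape leaves a gap longer than the lookback window")
    return problems


print(check_scrape_timing(50, 60, 120))    # original timing, with an assumed 50s latency
print(check_scrape_timing(180, 200, 320))  # timing from this thread, 3-minute latency
```

With an exporter latency of 180s, no combination passes every check: scrape_timeout must be at least 180s, so scrape_interval must be at least 180s, and twice that (360s) already exceeds the 300s lookback. In this simplified model, any exporter latency above 150s rules out the recommended margin.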
So, in my case, where the CloudWatch Exporter takes a long time to return the metrics, is there any way to work around the missing-data issue on your side? Thanks!

On Monday, March 22, 2021 at 3:07:08 PM UTC-7 Stuart Clark wrote:
> On 22/03/2021 21:48, chuanjia xing wrote:
> > Hi there,
> >
> > I recently hit a missing data point issue using Prometheus and would like
> > some help here. Thanks.
> >
> > *Issue:*
> >
> > Increasing scrape_interval in Prometheus resulted in missing data points.
> >
> > *My scenario:*
> >
> > I am using the Prometheus CloudWatch Exporter
> > <https://github.com/prometheus/cloudwatch_exporter> plus Prometheus to
> > fetch AWS CloudWatch CPUUtilization metrics for EC2 instances. The key
> > configs for the Exporter and Prometheus were initially as follows:
> >
> > Config                          Value
> > scrape_interval (Prometheus)    120s
> > scrape_timeout (Prometheus)     60s
> > delay_seconds (Exporter)        600s
> > range_seconds (Exporter)        600s
> > period_seconds (Exporter)       60s
> >
> > It worked fine with this set of configs, meaning the metrics I got
> > from CloudWatch had no missing data points.
> >
> > Later on, I increased the Prometheus scrape_interval to 320s and kept all
> > other configs the same. I need to do this for another reason, which I am
> > not explaining here. After this change, the same metrics started to show
> > some missing values, as shown below:
> >
> > (attached graph)
> >
> > You can see the missing data around 11:30 and between 12:30 and 13:00.
> >
> > There are more of these gaps in the metrics. Something I noticed is
> > that the length of each gap seems to match the scrape_interval config.
> > For example, the first gap above is from 11:24:26 to 11:30:08; the
> > second gap is from 12:44:14 to 12:50:53. Both gaps are close to, but not
> > exactly, the scrape_interval of 320s.
> >
> > Is this a known issue? It makes my graphs look bad. The Prometheus
> > logs don't provide much useful information as far as I can find.
> >
> > Any pointers on how to investigate this issue? Thanks!
>
> The maximum scrape interval is 5 minutes (otherwise time series will be
> marked as stale); however, it is recommended to have a maximum of 2-2.5
> minutes to allow for a single scrape failure (which can happen due to a
> timeout or slight network issue) without staleness. Is there a reason you
> are trying to increase the scrape interval above 2 minutes?
>
> --
> Stuart Clark
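Stuart's 5-minute limit and 2-2.5 minute recommendation can be illustrated with a simplified Python model of the query lookback. The function `missing_steps` is hypothetical. It assumes samples are timestamped at scrape time (the CloudWatch Exporter may attach its own, older timestamps instead), a 5-minute lookback window that is open on the left, and no staleness markers. It only shows where evaluation steps find no sample; it does not try to reproduce the exact gaps in the graph above.

```python
# Simplified model of the instant-query lookback, not Prometheus's actual code.
# Assumptions: samples carry the scrape time as timestamp, lookback is 300s,
# and staleness markers are ignored.


def missing_steps(scrape_interval_s, duration_s, step_s=15, lookback_s=300,
                  failed_scrapes=()):
    """Return evaluation times (seconds) at which no sample is inside the lookback."""
    # Scrape times, skipping the scrapes listed (by index) as failed.
    samples = [
        t for i, t in enumerate(range(0, duration_s + 1, scrape_interval_s))
        if i not in failed_scrapes
    ]
    missing = []
    for now in range(0, duration_s + 1, step_s):
        # A query at `now` can only use samples in (now - lookback_s, now].
        if not any(now - lookback_s < t <= now for t in samples):
            missing.append(now)
    return missing


print(missing_steps(120, 1200, failed_scrapes={3}))  # 2m interval, one failure
print(missing_steps(180, 900, failed_scrapes={2}))   # 3m interval, one failure
print(missing_steps(320, 1280))                      # 320s interval, no failures
```

Under these assumptions, a 120s or 150s interval survives a single failed scrape with no empty steps, a 180s interval does not, and a 320s interval leaves empty steps even when every scrape succeeds, because the interval is longer than the 300s lookback.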