Thanks for your quick response, Stuart!
The reason I increased the scrape_interval beyond 2 minutes is that I have 
several AWS regions to query for EC2 CPUUtilization metrics, and for some 
regions the exporter takes ~3 minutes to return the CloudWatch metrics. If 
it takes 3 minutes, then on the Prometheus side I have to set 
scrape_timeout to more than 3 minutes, otherwise the scrape will time out; 
and scrape_interval must be no less than scrape_timeout, otherwise 
Prometheus will raise an error, so I have to set a longer scrape_interval.
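
To spell out the two constraints I am working with, here is a minimal Python sketch. The function name and the example numbers (a 200s timeout, a 180s exporter response) are only illustrative assumptions, not anything taken from Prometheus or the exporter:

```python
def check_scrape_config(exporter_seconds, scrape_timeout, scrape_interval):
    """Return a list of problems with a proposed scrape config (values in seconds).

    Hypothetical helper: it only encodes the two constraints described above.
    """
    problems = []
    # The scrape must be allowed to run longer than the exporter needs to respond.
    if scrape_timeout <= exporter_seconds:
        problems.append("scrape_timeout must exceed the exporter response time")
    # Prometheus refuses a config whose timeout is larger than its interval.
    if scrape_timeout > scrape_interval:
        problems.append("scrape_timeout must not exceed scrape_interval")
    return problems


if __name__ == "__main__":
    # Original settings (60s timeout) with an exporter that needs ~180s.
    print(check_scrape_config(180, 60, 120))
    # Raising only the timeout breaks the interval constraint.
    print(check_scrape_config(180, 200, 120))
    # Raising both satisfies the two constraints.
    print(check_scrape_config(180, 200, 320))
```

With a ~3 minute exporter response, only the last combination passes both checks, which is why I ended up with a scrape_interval above 2 minutes.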

So in my case, where the CloudWatch exporter takes a long time to return 
the metrics, do you know of any way I can work around the missing data 
issue? Thanks!

On Monday, March 22, 2021 at 3:07:08 PM UTC-7 Stuart Clark wrote:

> On 22/03/2021 21:48, chuanjia xing wrote:
>
> Hi there, 
>
> I recently hit a missing data point issue using Prometheus and would like 
> to get some help here. Thanks. 
>
> *Issue:*
>
> Increasing scrape_interval in prometheus resulted in missing data points.
>
> *My scenario:*
>
> I am using the Prometheus CloudWatch Exporter 
> <https://github.com/prometheus/cloudwatch_exporter> plus Prometheus to 
> fetch AWS CloudWatch CPUUtilization metrics for EC2 instances. The key 
> configs for the exporter and Prometheus were initially as follows:
>
> Config                           Value
> scrape_interval (Prometheus)     120s
> scrape_timeout (Prometheus)      60s
> delay_seconds (Exporter)         600s
> range_seconds (Exporter)         600s
> period_seconds (Exporter)        60s
>
> It works fine with this set of configs, meaning the metrics I get from 
> CloudWatch have no missing data points.
>
> Later on, I increased the Prometheus scrape_interval to 320s and left all 
> other configs the same. I need to do this for other reasons that I won't 
> go into here. After this change, the same metrics started to show some 
> missing values, as shown below:
>
> (attached graph)
>
> You can see the missing data around 11:30 and between 12:30 and 13:00.
>
> There are more of these data gaps in the metrics. Something I noticed is 
> that the length of each gap seems to match the scrape_interval config. 
> For example, the first data gap above is from 11:24:26 to 11:30:08 
> (342s); the second is from 12:44:14 to 12:50:53 (399s). Both gaps are 
> close to, but not exactly, the scrape_interval of 320s. 
>
> Is this a known issue? It is making my graphs look bad. The Prometheus 
> logs don't provide much useful information as far as I can tell. 
>
> Any pointers on how to investigate this issue? Thanks!
>
> The maximum scrape interval is 5 minutes (otherwise time series will be 
> marked as stale), however it is recommended to have a maximum of 2-2.5 
> minutes to allow for a single scrape failure (which can happen due to a 
> timeout or slight network issue) without staleness. Is there a reason you 
> are trying to increase the scrape interval above 2 minutes?
>
> -- 
> Stuart Clark
>
>
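
The rule of thumb quoted above (5 minute maximum, 2-2.5 minutes recommended) can be sketched in Python as follows. This is a simplified model that assumes only a fixed 300s lookback window, as in the 5 minutes mentioned in the quote; it ignores other details such as staleness markers and exporter-supplied timestamps, and the function names are made up for illustration:

```python
LOOKBACK_SECONDS = 300  # assumed: the 5-minute window mentioned in the quoted reply


def uncovered_seconds(scrape_interval, lookback=LOOKBACK_SECONDS):
    """Seconds in each scrape cycle with no sample inside the lookback window."""
    return max(0, scrape_interval - lookback)


def tolerated_failures(scrape_interval, lookback=LOOKBACK_SECONDS):
    """Consecutive failed scrapes that can be absorbed without a visible gap.

    After k failures the spacing between good samples is (k + 1) * interval,
    which must still fit inside the lookback window.
    """
    return max(0, lookback // scrape_interval - 1)


if __name__ == "__main__":
    for interval in (120, 320):
        print(interval, uncovered_seconds(interval), tolerated_failures(interval))
```

Under these assumptions, a 120s interval leaves no uncovered time and absorbs one failed scrape, while a 320s interval leaves 20s uncovered in every cycle and absorbs none.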
