On 2021-02-09 14:11, Natraj Rams wrote:
I have prometheus server installed in each environment. Say, I have 5
environments. I have one federated prometheus which scrapes specific
metric from all those 5 prometheus servers. I have retention period of
metrics as 10 days in each individual prometheus servers, while the
federated prometheus has the retention period of 1 year.

Let the individual prometheus servers name be:

  1) Prometheus-1
  2) Prometheus-2
  3) Prometheus-3
  4) Prometheus-4
  5) Prometheus-5

Till a date, say 01/02/2021, I have all those metrics visible in my
federated cluster. Let the metric value as of 01/02/2021 be "x" from
all the 5 prometheus servers.

Now one of the 5 prometheus servers, say prometheus-3 server, lost
all its data on 02/02/2021 for some reason, and the prometheus
server itself stopped. These are my questions:

  1) If I query for a metric in federated prometheus, before the
prometheus-3 server becomes active, will I get any data of
prometheus-3 server?
  2) As I mentioned, prometheus-3 server has lost all its data. Now
if the prometheus-3 server becomes active on 03/02/2021 with metric
value "y" (which will be very less, since all historical data are lost)
and I query the data of prometheus-3 in the federated cluster, what
metric value will I get? Will it be x+y or just y?


The server that is federating metrics from the 5 servers has its own TSDB and isn't dependent on those servers in any way for queries. Normally you would federate only certain metrics (not everything), so the central server wouldn't have all the details and you would still want to query the 5 servers directly as needed.
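
To make "certain metrics (not everything)" concrete, here is a rough Python sketch of the request the central server issues against each source server's /federate endpoint, with one match[] parameter per series selector. The host name and the two selectors are made-up examples, not taken from your setup.

```python
from urllib.parse import urlencode

def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    # doseq=True repeats the match[] key once for every selector in the list.
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base_url.rstrip('/')}/federate?{query}"

# Hypothetical host and selectors, for illustration only.
selectors = ['{job="app"}', '{__name__=~"job:.*"}']
url = federate_url("http://prometheus-3:9090", selectors)
print(url)
```

Only series matching one of the selectors are returned, which is why the central server ends up with a subset of what each source server holds.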

If you stopped scraping one of the servers (e.g. because it failed), nothing would change regarding the data the central server has already ingested. From that point onward the scrape would fail for the missing server, so any queries covering that period would have a gap. Once the failed server returns, the scrapes would work again and the gap would end.
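
As a toy illustration of that behaviour (a simplified model, not Prometheus code; the function name, the scrape times and the values are all invented), the central server's store can be pictured as a list that only grows when a scrape succeeds:

```python
def federate(source_values, down_at):
    """Return the (time, value) samples the central server would store.

    source_values: value the source would report at each scrape time 0, 1, 2, ...
    down_at: set of scrape times at which the source is unreachable.
    """
    stored = []
    for t, value in enumerate(source_values):
        if t in down_at:
            continue  # scrape failed: nothing is written, earlier samples are untouched
        stored.append((t, value))
    return stored

# The source reports 10, 11, is down for scrapes 2 and 3, then comes back
# having lost its data and reports small values again.
samples = federate([10, 11, None, None, 2, 3], down_at={2, 3})
print(samples)
```

The samples from before the outage are still present, there is nothing stored for scrapes 2 and 3, and the values after the outage are simply whatever the source reported at that moment.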

So for (1), if you query the central server before server 3 is back, what you get depends on your query: if the query is for a time period before server 3 failed then you get the full data, but for any period after server 3 failed its data would be missing.

For (2) it depends what you mean by "which will be very less, since all historical data are lost". Federation fetches the current value of the matched metrics each time the central server makes a scrape of the 5 servers. Historical data is never queried (Prometheus will look back for a maximum of 5 minutes to find the latest value for each metric). So the central server stores whatever value server 3 reports at scrape time, which is y and not x+y. If the metric is a gauge it is totally normal for the value to fluctuate. If the metric is a counter then you will get occasional counter resets, but that is down to the metric source and not the Prometheus server - counters reset when an application restarts or start from 0 if a new pod is created.
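
Counter resets are usually harmless at query time because functions such as rate() and increase() treat a drop in a counter's value as a reset. The Python sketch below is a deliberately simplified version of that idea (no extrapolation or range handling, so it will not match Prometheus output exactly); the function name and sample values are invented for illustration.

```python
def increase_with_resets(samples):
    """Total increase across counter samples, treating any drop as a reset to 0.

    samples: counter values in time order; None marks a failed scrape.
    """
    total = 0.0
    prev = None
    for value in samples:
        if value is None:
            continue  # failed scrape: no sample exists for this point
        if prev is not None:
            if value < prev:
                total += value  # counter restarted from 0 and has since grown to value
            else:
                total += value - prev
        prev = value
    return total

# Counter grows from 100 to 120, the source is down for two scrapes,
# then it comes back having restarted from 0.
result = increase_with_resets([100, 110, 120, None, None, 5, 15])
print(result)  # 35.0
```

Here the drop from 120 to 5 is counted as an increase of 5 rather than a decrease of 115, so the total stays meaningful across the restart.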

So in summary, the only impact of server 3 breaking would be a gap in your query results (or lower than expected aggregate values) while it was unavailable. There is no impact on historical data from before that time or on data collected once the server is back.

--
Stuart Clark
