On 2021-02-09 14:11, Natraj Rams wrote:
> I have prometheus server installed in each environment. Say, I have 5
> environments. I have one federated prometheus which scrapes specific
> metric from all those 5 prometheus servers. I have retention period of
> metrics as 10 days in each individual prometheus servers, while the
> federated prometheus has the retention period of 1 year.
> Let the individual prometheus servers name be:
> 1) Prometheus-1
> 2) Prometheus-2
> 3) Prometheus-3
> 4) Prometheus-4
> 5) Prometheus-5
> Till a date, say 01/02/2021, I have all those metrics visible in my
> federated cluster. Let the metric value as of 01/02/2021 be "x" from
> all the 5 prometheus servers.
> Now one of the 5 prometheus servers, say prometheus-3 server, has lost
> all its data on 02/02/2021, due to some reasons and the prometheus
> server itself has got stopped. Below are the doubts I have
> 1) If I query for a metric in federated prometheus, before the
> prometheus-3 server becomes active, will I get any data of
> prometheus-3 server?
> 2) As I mentioned , prometheus-3 server has lost all its data, now
> if the prometheus-3 server becomes active on 03/02/2021 with metric
> value "y"(which will be very less, since all historical data are lost)
> and when I query data of prometheus-3 in federated cluster, what will
> be the metric value I will be getting? Will it be x+y or just y?

The server that is federating metrics from the 5 servers has its own
TSDB and isn't dependent on those servers in any way for queries.
Normally you would be federating certain metrics (not everything) so the
central server wouldn't have all the details, so you would still want to
query the 5 servers as needed.
If you stopped scraping one of the servers (e.g. because it failed)
nothing would change regarding the data the central server has already
ingested. From that point onward the scrape would fail for the missing
server, so any queries would have a gap. Once the failed server returns,
the scrapes would work again and the gap would end.
So for (1) if you query the central server before server 3 is back what
you get depends on your query - if the query is for a time period before
server 3 failed then you get the full data, but after server 3 failed it
would be missing.
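
The gap behaviour for (1) can be sketched in Python. This is only a toy
model under stated assumptions, not how Prometheus is implemented:
federate_scrapes and query_range are hypothetical helpers, the timestamps
are plain day numbers, and the values 100.0 and 4.0 stand in for "x" and
"y".

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, float]


def federate_scrapes(source_values: Dict[int, Optional[float]]) -> List[Sample]:
    """Return the samples the central server would end up storing.

    source_values maps a scrape time to the value the source exposes at that
    moment, or None when the scrape fails because the source is down.
    A failed scrape stores nothing, and earlier samples are left untouched.
    """
    return [(t, v) for t, v in sorted(source_values.items()) if v is not None]


def query_range(samples: List[Sample], start: int, end: int) -> List[Sample]:
    """Return stored samples whose timestamp lies in [start, end]."""
    return [(t, v) for t, v in samples if start <= t <= end]


# Hypothetical timeline: day 1 healthy ("x"), day 2 down, day 3 back ("y").
source = {1: 100.0, 2: None, 3: 4.0}
central = federate_scrapes(source)

print(query_range(central, 1, 1))  # data from before the failure is still there
print(query_range(central, 2, 2))  # nothing was ingested during the outage
print(query_range(central, 3, 3))  # the value exposed after recovery
```

In this model the central server keeps its own copy of everything it has
scraped, so losing the source's local data changes nothing that was
already ingested.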
For (2) it depends what you mean by "which will be very less, since all
historical data are lost". Federation fetches the current value of the
matched metrics each time the central server makes a scrape of the 5
servers. Historical data is never queried (Prometheus will look back for
a maximum of 5 minutes to find the latest value for each metric). If the
metric is a gauge it is totally normal for the value to fluctuate. If
the metric is a counter then you will get occasional counter resets, but
that is down to the metric source and not the Prometheus server -
counters reset when an application restarts or start from 0 if a new pod
is created.
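
To illustrate the counter case: the central server simply stores whatever
value it scrapes, so after a reset it records "y", not "x+y". Functions
such as rate() and increase() then treat a drop in value as a reset. The
Python below is a simplified sketch of that idea, assuming the counter
restarts from 0 and ignoring the extrapolation Prometheus applies at the
edges of the range; increase_with_resets is a hypothetical helper, not a
Prometheus API, and the sample values are made up.

```python
from typing import Sequence


def increase_with_resets(values: Sequence[float]) -> float:
    """Total increase of a counter across consecutive samples.

    A drop between two samples is treated as a counter reset: the counter is
    assumed to have restarted from 0, so the whole new value counts as
    increase.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Reset detected: everything counted since the restart is new.
            total += cur
    return total


# Made-up samples: the counter reaches 100.0 ("x"), the source restarts,
# and the next scrapes see 4.0 ("y") and then 10.0.
samples = [90.0, 100.0, 4.0, 10.0]
print(increase_with_resets(samples))  # 10 + 4 + 6 = 20.0
```

The raw stored series would show the drop from 100.0 to 4.0, while a
reset-aware calculation over the same samples still gives a sensible
total.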
So in summary, the only impact of server 3 breaking would be a gap in
your query (or lower than expected aggregate values) while it was
unavailable. There is no impact for any historical data before that time
or data once the server is back.
--
Stuart Clark