Re: [prometheus-users] Use remote-write instead of federation

2022-09-08 Thread Brian Candler
> Federation endpoint is just another scraping target - in case of network 
> failure (or any other failure) I will get an alert that federation endpoint 
> is down

This is true.  However, the flip side is that remote_write buffers metrics 
while the network is down, whereas federation will not back-fill any 
historical data when the network comes back up.

You can alert on a remote_write endpoint going away, as described here:
https://groups.google.com/g/prometheus-users/c/ur9Tu1kRu6w/m/Q81qPxqQAAAJ

I think you can make a generic alert against loss of *any* remote write 
sender - something like this (untested):
    up{prometheus_agent="true"} offset 1h unless up

(i.e. "alert if the given metric/timeseries was present one hour ago but 
isn't present now")
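
To make the matching behaviour of that expression a bit more concrete, here is a rough Python model of what "unless" does. It is only a sketch: series are reduced to their label sets, sample values are ignored, and the instance names are made up for illustration.

```python
# Toy model of PromQL's "unless" operator (not the real engine).
# A series is identified by its label set; a frozenset of (name, value)
# pairs stands in for that identity. Sample values are ignored.

def series(**labels):
    """Build a hashable label-set identity for one time series."""
    return frozenset(labels.items())

def unless(left, right):
    """Keep the series in `left` that have no identically labelled series in `right`."""
    return left - right

# Hypothetical data: what `up{prometheus_agent="true"} offset 1h` would see.
one_hour_ago = {
    series(instance="agent-a", prometheus_agent="true"),
    series(instance="agent-b", prometheus_agent="true"),
}

# Hypothetical data: what `up` sees now (agent-b has stopped sending).
now = {
    series(instance="agent-a", prometheus_agent="true"),
}

# Series that existed an hour ago but are absent now would fire the alert.
missing = unless(one_hour_ago, now)
```

Here "missing" ends up holding only the agent-b series, which is the one the alert would fire for. Note that in this model a sender whose labels change (rather than disappear) would also show up as missing.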

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/bd7a1300-05f0-4377-ae0c-050c80571acan%40googlegroups.com.


Re: [prometheus-users] /metrics lifecycle

2022-09-08 Thread Stuart Clark

On 31/08/2022 02:16, Vasiliy B wrote:

> Folks,
>   I am researching a use case where we collect /metrics data with 
> Prometheus only when needed to do some investigation. During normal 
> operating hours, we would flush the /metrics endpoint on a time 
> schedule which is greater than the scrape_interval?
>
> From the documentation, it is clear that the Prometheus server will honor 
> the scrape_interval config setting. But what about on the service 
> side?  Do the metrics have the ability to reset to zero after a 
> predefined time, e.g. 1 minute?
>
> Looking for feedback on whether this is feasible, any gotchas, or 
> clarification on how metrics are stored on the client side.


A call to /metrics should always return the "current" value of every 
metric. Counters (which are generally recommended) only ever increase, 
and therefore reset to zero only if the application itself is 
restarted. For gauges, the ideal situation is that the scrape request 
returns the live value of the metric. Some metrics (for example, those 
that need a call to an external system) can be quite resource intensive 
to produce, so you want to limit how often they are generated. In that 
case the common method is to have a separate timed process which updates 
the metric, with the call to /metrics just returning the latest values 
calculated. While this reduces the impact of such "heavy" metric 
generation, it does mean that you may be getting old values (depending 
on how often the generation process runs), and there is a risk of that 
process breaking without you noticing.
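
As a minimal sketch of that "separate timed process" pattern, something like the following could work. It uses only the Python standard library rather than a real Prometheus client library, and the names CachedGauge, run_refresher, render and queue_depth are all hypothetical. The extra last-refresh timestamp is one way to notice when the background process has stopped working.

```python
import threading
import time

class CachedGauge:
    """Holds the latest value of an expensive metric plus its last refresh time."""

    def __init__(self, compute):
        self._compute = compute      # the expensive call, e.g. to an external system
        self._lock = threading.Lock()
        self._value = None
        self._last_success = None

    def refresh(self):
        """Recompute the value; on failure keep the old value and timestamp."""
        try:
            value = self._compute()
        except Exception:
            return False             # last_success stops advancing, which is detectable
        with self._lock:
            self._value = value
            self._last_success = time.time()
        return True

    def snapshot(self):
        """Return (value, last_success) without triggering any computation."""
        with self._lock:
            return self._value, self._last_success

def run_refresher(gauge, interval_seconds, stop_event):
    """Timed loop, intended to run in a background thread until stop_event is set."""
    while not stop_event.is_set():
        gauge.refresh()
        stop_event.wait(interval_seconds)

def render(gauge, name):
    """What a /metrics handler would return: cached values only, in text format."""
    value, last_success = gauge.snapshot()
    if value is None:
        return ""                    # nothing computed yet, expose nothing
    return (
        f"{name} {value}\n"
        f"{name}_last_refresh_timestamp_seconds {last_success}\n"
    )
```

The refresher would typically be started once with threading.Thread(target=run_refresher, args=(gauge, 60, stop_event), daemon=True).start(), and the /metrics handler would only ever call render(). An alert comparing time() against the exported timestamp should then catch the case where the refresh process has silently broken.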


So in general, no, there isn't any concept of "flushing" or resetting metrics.

Would you be able to give a bit more detail about what you are trying to 
do, and why you think resetting metrics is important / needed?


--
Stuart Clark
