Hi Brian,

Yes, I agree with the implications above. But even with federation, the
global instance still has to evaluate all the alert rules and load the
time series into RAM.

The main issue with rules for networking devices is that we don't know in
advance when we will want to raise an alert across devices (Prometheus
shards). For this case, we thought remote read would just add some overhead
on top of scraping and RAM (not too much compared to actual ingestion). The
main problem we hit was Prometheus crashing without any prior notification,
such as an OOM with large label values, or any other indication.

If you have any case studies on handling Prometheus crashes or on
cross-cluster alerting, please let us know.


Regards
Rajesh

On Sun, May 17, 2020 at 3:35 PM Brian Brazil <
[email protected]> wrote:

> On Sun, 17 May 2020 at 10:53, Rajesh Reddy Nachireddi <
> [email protected]> wrote:
>
>> Hi,
>>
>> Basically, we have a large networking setup with 10k devices. We are
>> hitting 1M metrics every second from 20% of the devices alone, so we have
>> 5 Prometheus instances and one global Prometheus, which uses remote read
>> to handle alert rule evaluation, and Thanos Querier for visualisation in
>> Grafana.
>>
>> We have assigned devices to each Prometheus instance by specific device
>> IP ranges.
>>
>> So, we have one aggregator that uses remote read from all the
>> individual Prometheus instances.
>>
>> 1. Will remote read cause an issue w.r.t. loading the large time
>> series over the wire every 1 min?
>> 2. Is it CPU or memory intensive?
>>
>> What is the best design strategy to handle this scale and alerting across
>> devices or metrics?
>>
>
> Remote read is unlikely to be the best approach here; it's pulling tons of
> raw data over the network on every evaluation, which has to be buffered up
> in RAM.
>
> What you want to do here is do as much of the alerting and rules on the
> scraping Prometheus servers as is possible. For things that you can't do
> that way (e.g. 10% of devices are down globally), use federation to pass up
> e.g. total number of devices down in each Prometheus to the global and
> alert on that.
>
> --
> Brian Brazil
> www.robustperception.io
>
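
For reference, the federation approach suggested above could be sketched
roughly as follows. This is only an illustration under assumptions: the job
name network_devices, the recording rule names shard:devices_down:count and
shard:devices:count, the prom-shard-N hostnames, and the 5m "for" duration
are all hypothetical and not taken from this thread.

```yaml
# --- On each scraping Prometheus: rule file (hypothetical names) ---
groups:
  - name: device_availability
    rules:
      # Devices currently down on this shard; "or vector(0)" keeps the
      # series present when nothing is down.
      - record: shard:devices_down:count
        expr: count(up{job="network_devices"} == 0) or vector(0)
      # Total devices scraped by this shard.
      - record: shard:devices:count
        expr: count(up{job="network_devices"})

# --- On the global Prometheus: prometheus.yml (assumed shard hostnames) ---
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        # Pull only the small aggregated series, not the raw data.
        - '{__name__=~"shard:.*"}'
    static_configs:
      - targets:
          - prom-shard-1:9090
          - prom-shard-2:9090

# --- On the global Prometheus: rule file ---
groups:
  - name: global_device_alerts
    rules:
      # Fires when more than 10% of devices are down across all shards.
      - alert: TooManyDevicesDownGlobally
        expr: sum(shard:devices_down:count) / sum(shard:devices:count) > 0.10
        for: 5m
```

The three parts belong in separate files. The point of the split is that the
global instance only ingests two small series per shard, so per-device
alerts stay on the shards and only the cross-shard ratio is evaluated
globally.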
