Part of the Prometheus/Alertmanager design is to better survive WAN
split-brain.

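IMO, running a wide (multi-region) Alertmanager cluster is a good idea when
you have a wide network. The AM gossip protocol and deduplication are
designed to fail open in the event of a split brain: each side of the split
can still send notifications, so you may get duplicates rather than missed
alerts.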
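To make "fail open" concrete, here is a toy model (not Alertmanager's actual code). It assumes peers that can still gossip share a notification log, so only the first peer by cluster position in each connected partition sends a given notification. Peer names and the partition layout are hypothetical.

```python
"""Toy model of fail-open deduplication across a split brain.

Simplifying assumption: peers that can still gossip with each other share a
notification log, so only the first peer (by cluster position) in each
connected partition sends. Peers cut off from each other cannot see each
other's log, so each partition sends its own copy.
"""

from typing import List, Set


def notifications_sent(peers: List[str], partitions: List[Set[str]]) -> List[str]:
    """Return the peers that send a notification for one firing alert.

    `peers` is the cluster member order (position 0 sends first). Every peer
    is assumed to have received the alert, since Prometheus sends to all AMs.
    `partitions` groups the peers that can still gossip with each other.
    """
    senders = []
    for group in partitions:
        # First peer by position in this partition sends; the others see its
        # entry in the shared notification log and stay quiet.
        ordered = [p for p in peers if p in group]
        if ordered:
            senders.append(ordered[0])
    return senders


peers = ["am-amer", "am-emea", "am-apac"]

# Healthy cluster: one partition, exactly one notification.
print(notifications_sent(peers, [set(peers)]))
# WAN split: AMER is cut off, so both sides notify (a duplicate, not a miss).
print(notifications_sent(peers, [{"am-amer"}, {"am-emea", "am-apac"}]))
```

The first call prints `['am-amer']` and the second prints `['am-amer', 'am-emea']`. The number of notifications grows with the number of partitions but never drops to zero.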

The only thing you have to be aware of is that Prometheus-to-Alertmanager
communication is all-to-all: every Prometheus instance needs to send to
every Alertmanager.
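A small sketch of what all-to-all means in practice. It assumes three regions with one Prometheus and one Alertmanager each. The hostnames and port 9093 are placeholders. Every Prometheus gets the same full target list in the `alerting` block of its config, so the number of links is the product of the two counts. The output is JSON, which is also valid YAML.

```python
"""Sketch of the all-to-all Prometheus -> Alertmanager fan-out.

Hostnames and ports are hypothetical placeholders. Every Prometheus gets the
same full list of Alertmanagers, so the number of sender->receiver links is
len(prometheus) * len(alertmanagers).
"""

import json
from itertools import product

prometheus = ["prom-amer", "prom-emea", "prom-apac"]
alertmanagers = ["am-amer:9093", "am-emea:9093", "am-apac:9093"]

# Every Prometheus must be able to reach every Alertmanager.
links = list(product(prometheus, alertmanagers))
print(len(links))  # 9


def alerting_config(targets):
    """Build the `alerting` section of prometheus.yml as a plain dict."""
    return {
        "alerting": {
            "alertmanagers": [
                {"static_configs": [{"targets": list(targets)}]}
            ]
        }
    }


# The same block goes to every Prometheus; don't give a region only its local AM.
print(json.dumps(alerting_config(alertmanagers), indent=2))
```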

On Thu, Feb 27, 2025 at 5:38 PM 'Brian Candler' via Prometheus Users <
[email protected]> wrote:

> On Thursday, 27 February 2025 at 15:37:54 UTC hartfordfive wrote:
>
> With this approach, multiple AZs, which are typically each hosted within a
> single DC, still run the risk of being inaccessible should the link to the
> DC go down. So let's say you have datacenters in 3 regions (AMER, EMEA
> and APAC) and you've chosen to have a single AM cluster in EMEA. Should the
> link between AMER and EMEA and/or EMEA and APAC go down, then Prometheus
> instances located in AMER or APAC won't be able to send alert
> notifications. If you instead had 2 or 3 alertmanager instances in each of
> these regions, wouldn't that still allow alerts to be received and actioned
> within each of those regions?
>
>
> Only you know what the meaningful failure modes are for your environment.
> It seems to me that you expect key DC-to-DC connectivity to go down, but
> you are still able to send alerts (presumably via the Internet or some other
> out-of-band means).  You could get Prometheus to talk to alertmanager over
> the Internet too, using https, if you felt that was more reliable.
>
> Also, if DC-to-DC communication is unreliable, then personally I would not
> want to run any sort of distributed application across it (alertmanager or
> otherwise), due to problems with partitioning / split brain.
>
> However, you need to make your own call as to what works best for you, and
> what is the optimum tradeoff between cost, complexity, and reliability.  My
> gut feeling is towards simplicity and reliability, which for me means
> either a single global alertmanager cluster, or a separate AM cluster per
> region, but you can build whatever you're comfortable with.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/prometheus-users/ec7b1e1f-d1af-4e0c-ad59-1f238e661737n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/ec7b1e1f-d1af-4e0c-ad59-1f238e661737n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
