I don't think any flag changes are needed.

On Wed, Mar 12, 2025 at 4:36 PM hartfordfive <[email protected]> wrote:
> Great, thank you for the feedback. Should any of the flags be set to
> custom values when deploying across a wide WAN, or should the default
> values still suffice?
>
> On Monday, March 3, 2025 at 7:03:30 AM UTC-5 Ben Kochie wrote:
>
>> Part of the Prometheus/Alertmanager design is to better survive WAN
>> split-brain.
>>
>> IMO, running a wide Alertmanager cluster is a good idea when you have a
>> wide network. The AM gossip protocol and deduplication are designed to
>> fail open in the event of a split brain.
>>
>> The only thing you have to be aware of is that Prometheus-to-Alertmanager
>> is an all-to-all communication. All Prometheus instances need to send to
>> all Alertmanagers.
>>
>> On Thu, Feb 27, 2025 at 5:38 PM 'Brian Candler' via Prometheus Users <
>> [email protected]> wrote:
>>
>>> On Thursday, 27 February 2025 at 15:37:54 UTC hartfordfive wrote:
>>>
>>>> With this approach, multiple AZs, which are typically each hosted
>>>> within a single DC, still run the risk of being inaccessible should the
>>>> link to the DC go down. So let's say you have datacenters in 3 regions
>>>> (AMER, EMEA and APAC) and you've chosen to have a single AM cluster in
>>>> EMEA. Should the link between AMER and EMEA and/or EMEA and APAC go
>>>> down, then Prometheus instances located in AMER or APAC won't be able
>>>> to send alert notifications. If you instead had 2 or 3 Alertmanager
>>>> instances in each of these regions, wouldn't that still allow alerts to
>>>> be received and actioned within each of those regions?
>>>
>>> Only you know what the meaningful failure modes are for your
>>> environment. It seems to me that you expect key DC-to-DC connectivity to
>>> go down, but you are still able to send alerts (presumably via Internet
>>> or some other out-of-band means). You could get Prometheus to talk to
>>> Alertmanager over the Internet too, using https, if you felt that was
>>> more reliable.
>>>
>>> Also, if DC-to-DC communication is unreliable, then personally I would
>>> not want to run any sort of distributed application across it
>>> (Alertmanager or otherwise), due to problems with partitioning / split
>>> brain.
>>>
>>> However, you need to make your own call as to what works best for you,
>>> and what is the optimum tradeoff between cost, complexity, and
>>> reliability. My gut feeling is towards simplicity and reliability, which
>>> for me means either a single global Alertmanager cluster, or a separate
>>> AM cluster per region, but you can build whatever you're comfortable
>>> with.

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/prometheus-users/CABbyFmoxwwuhMdgxwZQiSkC1kR356Dq9%2BhsPDPyeSL3pGWkHZg%40mail.gmail.com.
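
For reference, the all-to-all point quoted above (every Prometheus instance sending to every Alertmanager) might look roughly like this in each Prometheus server's `prometheus.yml`. This is only a sketch under assumptions: the hostnames are made up for a hypothetical AMER/EMEA/APAC layout with one cluster member per region, and 9093 is Alertmanager's default port.

```yaml
# Sketch only: the same alerting block would go on every Prometheus
# instance, in every region, so each one sends to the whole cluster.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical hostnames; list every member of the AM cluster.
            - am1.amer.example.com:9093
            - am1.emea.example.com:9093
            - am1.apac.example.com:9093
```

With a list like this, a Prometheus instance cut off from the other regions can still reach its local Alertmanager, which is the fail-open behaviour described in the thread. The Alertmanager instances themselves would be joined into one cluster with `--cluster.peer`, with the other cluster flags left at their defaults as suggested in the reply.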

