For the record, there is whole section about this here (sorry for pasting the same link twice in my original mail)
https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/load_balancing/index.html#cross-datacenter-failover Interesting to see your perspective, I can see how doing something without application changes might seem appealing. I wonder what others think about this. Regards On Mon, Dec 1, 2025 at 11:39 PM Qc L <[email protected]> wrote: > Hi Stefan, > > Thanks for raising the question about using a custom LoadBalancingPolicy! > Remote quorum and LB-based failover aim to solve similar problems, > maintaining availability during datacenter degradation, so I did a > comparison study on the server-side remote quorum solution versus relying > on driver-side logic. > > > - For our use cases, we found that client-side failover is expensive > to operate and leads to fragmented behavior during incidents. We also > prefer failover to be triggered only when needed, not automatically. A > server-side mechanism gives us controlled, predictable behavior and makes > the system easier to operate. > - Remote quorum is implemented on the server, where the coordinator > can intentionally form a quorum using replicas in a backup region when the > local region cannot satisfy LOCAL_QUORUM or EACH_QUORUM. The database > determines the fallback region, how replicas are selected, and ensures > consistency guarantees still hold. > > By keeping this logic internal to the server, we provide a unified, > consistent behavior across all clients without requiring any application > changes. > > Happy to discuss further if helpful. > > On Mon, Dec 1, 2025 at 12:57 PM Štefan Miklošovič <[email protected]> > wrote: > >> Hello, >> >> before going deeper into your proposal ... I just have to ask, have you >> tried e.g. custom LoadBalancingPolicy in driver, if you use that? >> >> I can imagine that if the driver detects a node to be down then based on >> its "distance" / dc it belongs to you might start to create a different >> query plan which talks to remote DC and similar. >> >> >> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html >> >> >> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html >> >> On Mon, Dec 1, 2025 at 8:54 PM Qc L <[email protected]> wrote: >> >>> Hello team, >>> >>> I’d like to propose adding the remote quorum to Cassandra consistency >>> level for handling simultaneous hosts unavailability in the local data >>> center that cannot achieve the required quorum. >>> >>> Background >>> NetworkTopologyStrategy is the most commonly used strategy at Uber, and >>> we use Local_Quorum for read/write in many use cases. Our Cassandra >>> deployment in each data center currently relies on majority replicas being >>> healthy to consistently achieve local quorum. >>> >>> Current behavior >>> When a local data center in a Cassandra deployment experiences outages, >>> network isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM >>> consistency level will fail for both reads and writes if enough replicas in >>> that the wlocal data center are unavailable. In this configuration, >>> simultaneous hosts unavailability can temporarily prevent the cluster from >>> reaching the required quorum for reads and writes. For applications that >>> require high availability and a seamless user experience, this can lead to >>> service downtime and a noticeable drop in overall availability. see >>> attached figure1.[image: figure1.png] >>> >>> Proposed Solution >>> To prevent this issue and ensure a seamless user experience, we can use >>> the Remote Quorum consistency level as a fallback mechanism in scenarios >>> where local replicas are unavailable. Remote Quorum in Cassandra refers to >>> a read or write operation that achieves quorum (a majority of replicas) in >>> the remote data center, rather than relying solely on replicas within the >>> local data center. see attached Figure2. >>> >>> [image: figure2.png] >>> >>> We will add a feature to do read/write consistency level override on the >>> server side. When local replicas are not available, we will overwrite the >>> server side write consistency level from each quorum to remote quorum. Note >>> that, implementing this change in client side will require some protocol >>> changes in CQL, we only add this on server side which can only be used by >>> server internal. >>> >>> For example, giving the following Cassandra setup >>> >>> >>> >>> >>> >>> *CREATE KEYSPACE ks WITH REPLICATION = { 'class': >>> 'NetworkTopologyStrategy', 'cluster1': 3, 'cluster2': 3, 'cluster3': 3};* >>> >>> The selected approach for this design is to explicitly configure a >>> backup data center mapping for the local data center, where each data >>> center defines its preferred failover target. For example >>> *remote_quorum_target_data_center:* >>> >>> >>> * cluster1: cluster2 cluster2: cluster3 cluster3: cluster1* >>> >>> Implementations >>> We proposed the following feature to Cassandra to address data center >>> failure scenarios >>> >>> 1. Introduce a new Consistency level called remote quorum. >>> 2. Feature to do read/write consistency level override on server >>> side. (This can be controlled by a feature flag). Use Node tools command >>> to >>> turn on/off the server failback >>> >>> Why remote quorum is useful >>> As shown in the figure 2, we have the data center failure in one of the >>> data centers, Local quorum and each quorum will fail since two replicas are >>> unavailable and cannot meet the quorum requirement in the local data center. >>> >>> During the incident, we can use the nodetool to enable failover to >>> remote quorum. With the failover enabled, we can failover to the remote >>> data center for read and write, which avoids the available drop. >>> >>> Any thoughts or concerns? >>> Thanks in advance for your feedback, >>> >>> Qiaochu >>> >>>
