[jira] [Created] (PHOENIX-7555) Graceful Failover with Phoenix HA - Metrics

Ritesh (Jira) Wed, 19 Mar 2025 10:45:06 -0700

Ritesh created PHOENIX-7555:
-------------------------------

             Summary: Graceful Failover with Phoenix HA - Metrics
                 Key: PHOENIX-7555
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7555
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Ritesh
            Assignee: Ritesh



Phoenix HA (PHOENIX-6491) suggests a best-effort failover process for the
failover HA policy. The first step is to set both clusters’ roles to Standby
and then wait for replication to finish (best-effort). The final step is to
set the other cluster’s role to Active.

When the cluster role is set to Standby, the dual cluster Phoenix client does 
not allow read/write operations on a standby cluster. This helps drain 
replication data from the previously Active cluster to the previously Standby 
cluster. However, in practice a cluster may receive changes without using the 
Phoenix dual client. For example, data can be inserted through MapReduce jobs 
which do not use the Phoenix JDBC client. Another example is that the 
previously active cluster could be receiving replication data from a third 
cluster.

This means pausing writes at the Phoenix client is not sufficient for a
graceful failover operation. Here, graceful means a consistent failover between
two healthy clusters. A consistent failover can be achieved only when the
replication data has been completely sent to the soon-to-be Active cluster.

To ensure that all incoming data is paused before the failover event, we need
to stop writes to the cluster on the server side. To achieve this, a Phoenix
coprocessor can also track cluster role changes and, as the dual Phoenix
client already does, stop writes when an Active cluster becomes Standby. To
eliminate the ambiguity about which cluster was previously Active, a new
HA role called ActiveToStandby is introduced. Neither the Phoenix client nor
the server allows write operations on an ActiveToStandby cluster.
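
The server-side write check described above can be sketched roughly as
follows. This is only an illustration under assumptions: the names
HaWriteGateSketch, Role, onRoleChange and checkWriteAllowed are hypothetical
and are not Phoenix or HBase APIs, and how the coprocessor actually learns of
role changes is left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a write gate that a server-side hook could consult
// before accepting a mutation. Not an actual Phoenix class.
final class HaWriteGateSketch {

    // Roles named in this issue; only ACTIVE accepts writes.
    enum Role {
        ACTIVE(true),
        STANDBY(false),
        ACTIVE_TO_STANDBY(false);

        private final boolean writable;

        Role(boolean writable) {
            this.writable = writable;
        }

        boolean allowsWrites() {
            return writable;
        }
    }

    // Latest role seen by this server; updated when a role change is observed.
    private final AtomicReference<Role> role;

    HaWriteGateSketch(Role initial) {
        this.role = new AtomicReference<>(initial);
    }

    // Assumed to be called by whatever mechanism watches role changes.
    void onRoleChange(Role newRole) {
        role.set(newRole);
    }

    // Throws if the current role does not permit writes.
    void checkWriteAllowed() {
        Role current = role.get();
        if (!current.allowsWrites()) {
            throw new IllegalStateException(
                "Writes rejected: cluster role is " + current);
        }
    }
}
```

Because the check runs on the server, it would also cover writers that bypass
the Phoenix dual client, such as the MapReduce jobs mentioned earlier.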

With the above changes, graceful failover is achieved by the following steps:
 # Change the Active cluster’s role to ActiveToStandby.
 # Wait until the replication data is drained.
 # Change the Standby cluster’s role to Active, and the ActiveToStandby
cluster’s role to Standby.
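
The three steps can be sketched as a small routine. This is a minimal
illustration, not the proposed implementation: GracefulFailoverSketch, Cluster,
failover, the pendingReplication supplier and the maxPolls bound are all
hypothetical, and a real version would read replication backlog from the
cluster and wait between polls.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the three-step graceful failover sequence.
final class GracefulFailoverSketch {

    enum Role { ACTIVE, STANDBY, ACTIVE_TO_STANDBY }

    // Minimal stand-in for a cluster and its current HA role.
    static final class Cluster {
        final String name;
        Role role;

        Cluster(String name, Role role) {
            this.name = name;
            this.role = role;
        }
    }

    // Returns true if the roles were swapped, false if replication did not
    // drain within maxPolls checks (roles then stay ActiveToStandby/Standby).
    static boolean failover(Cluster active, Cluster standby,
                            LongSupplier pendingReplication, int maxPolls) {
        if (active.role != Role.ACTIVE || standby.role != Role.STANDBY) {
            throw new IllegalStateException("Unexpected starting roles");
        }

        // Step 1: block writes on the currently Active cluster.
        active.role = Role.ACTIVE_TO_STANDBY;

        // Step 2: poll until no replication data is pending.
        for (int i = 0; i < maxPolls; i++) {
            if (pendingReplication.getAsLong() == 0) {
                // Step 3: promote the Standby and demote the old Active.
                standby.role = Role.ACTIVE;
                active.role = Role.STANDBY;
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, a failed drain leaves the old Active cluster in
ActiveToStandby, so it remains clear which cluster held the Active role.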



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
