Ritesh created PHOENIX-7555:
-------------------------------
Summary: Graceful Failover with Phoenix HA - Metrics
Key: PHOENIX-7555
URL: https://issues.apache.org/jira/browse/PHOENIX-7555
Project: Phoenix
Issue Type: Improvement
Reporter: Ritesh
Assignee: Ritesh
Phoenix HA (PHOENIX-6491) suggests a best-effort failover process for the
failover HA policy. The first step is to make both clusters’ roles Standby,
and then wait for replication to finish (best-effort). The final step is to
make the other cluster role Active.
When the cluster role is set to Standby, the dual-cluster Phoenix client does
not allow read/write operations on that cluster. This helps drain
replication data from the previously Active cluster to the previously Standby
cluster. However, in practice a cluster may receive changes that bypass the
dual-cluster Phoenix client. For example, data can be inserted through
MapReduce jobs which do not use the Phoenix JDBC client. Another example is
that the previously Active cluster could be receiving replication data from a
third cluster.
This means pausing writes at the Phoenix client is not sufficient for a
graceful failover operation. Here graceful means consistent failover between
two healthy clusters. A consistent failover can be achieved only when the
replication data is completely sent to the soon-to-be Active cluster.
To ensure that all incoming data is paused before the failover event, we need
to stop writes to the cluster on the server side. To achieve this, a Phoenix
coprocessor can also maintain and watch cluster role changes and, as the dual
Phoenix client does, stop writes when an Active cluster becomes Standby. In
order to eliminate the ambiguity about which cluster was previously Active, a
new HA role called ActiveToStandby is introduced. Neither the Phoenix client
nor the server allows write operations on an ActiveToStandby cluster.
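The role check described above could look roughly like the following sketch.
It is only an illustration: the class RoleGateSketch, the enum Role and the
methods allowsWrites and checkWrite are hypothetical names, not existing
Phoenix APIs, and it assumes that only an Active cluster accepts writes.

```java
// Hypothetical sketch, not Phoenix code: a write gate keyed on the HA role.
public class RoleGateSketch {

    // Assumed role set; ACTIVE_TO_STANDBY stands for the proposed ActiveToStandby role.
    public enum Role { ACTIVE, STANDBY, ACTIVE_TO_STANDBY }

    // Assumption: only an Active cluster accepts writes.
    public static boolean allowsWrites(Role role) {
        return role == Role.ACTIVE;
    }

    // A client or a server-side hook could call this before applying a mutation.
    public static void checkWrite(Role role) {
        if (!allowsWrites(role)) {
            throw new IllegalStateException("Writes rejected: cluster role is " + role);
        }
    }

    public static void main(String[] args) {
        for (Role role : Role.values()) {
            System.out.println(role + " allows writes: " + allowsWrites(role));
        }
    }
}
```

The same check would need to run in both places (client and coprocessor) so
that writers which bypass the Phoenix JDBC client are also rejected.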
With the above changes, graceful failover is achieved by the following steps:
# Change the Active cluster’s role to ActiveToStandby.
# Wait for the replication data to be drained.
# Change the Standby cluster’s role to Active, and the ActiveToStandby
cluster’s role to Standby.
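The three steps can be sketched as a small orchestration routine. This is a
minimal illustration under stated assumptions, not the proposed
implementation: GracefulFailoverSketch, Cluster, Role and failover are
hypothetical names, roles are held in memory rather than in a shared store,
and the replicationDrained callback stands in for whatever signal reports
that the replication queue is empty. What to do when the drain does not
finish in time is not specified in this issue; the sketch simply leaves the
roles untouched after step 1 and returns false.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Phoenix code: the three-step graceful failover.
public class GracefulFailoverSketch {

    public enum Role { ACTIVE, STANDBY, ACTIVE_TO_STANDBY }

    // In-memory stand-in for a cluster and its HA role.
    public static final class Cluster {
        public final String name;
        public Role role;

        public Cluster(String name, Role role) {
            this.name = name;
            this.role = role;
        }
    }

    // Returns true if the failover completed, false if the drain timed out
    // or the thread was interrupted (roles are then left as set in step 1).
    public static boolean failover(Cluster active, Cluster standby,
                                   BooleanSupplier replicationDrained,
                                   int maxPolls, long pollMillis) {
        if (active.role != Role.ACTIVE || standby.role != Role.STANDBY) {
            throw new IllegalArgumentException("Expected an Active/Standby pair");
        }

        // Step 1: block writes on the current Active cluster.
        active.role = Role.ACTIVE_TO_STANDBY;

        // Step 2: poll until the replication data is drained.
        boolean drained = false;
        for (int i = 0; i < maxPolls && !drained; i++) {
            drained = replicationDrained.getAsBoolean();
            if (!drained) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        if (!drained) {
            return false; // assumption: leave the decision to the operator
        }

        // Step 3: promote the Standby cluster, then demote the old Active one.
        standby.role = Role.ACTIVE;
        active.role = Role.STANDBY;
        return true;
    }

    public static void main(String[] args) {
        Cluster a = new Cluster("A", Role.ACTIVE);
        Cluster b = new Cluster("B", Role.STANDBY);
        boolean ok = failover(a, b, () -> true, 10, 100L);
        System.out.println(ok + " " + a.role + " " + b.role);
    }
}
```

Because the old Active cluster stays in ActiveToStandby until step 3, there
is no window in which both clusters are plain Standby and the previous
Active cluster cannot be identified.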
--
This message was sent by Atlassian Jira
(v8.20.10#820010)