apurtell commented on code in PR #2461: URL: https://github.com/apache/phoenix/pull/2461#discussion_r3290062508
########## src/main/tla/ConsistentFailover/markdown/Admin.md: ########## @@ -0,0 +1,151 @@ +# Admin -- Operator-Initiated Actions + +**Source:** [`Admin.tla`](../Admin.tla) + +## Overview + +`Admin` models the human operator (Admin actor) who drives failover and abort via the `PhoenixHAAdminTool` CLI, which delegates to `HAGroupStoreManager` coprocessor endpoints. This module contains four actions: `AdminStartFailover` (initiate failover), `AdminAbortFailover` (abort an in-progress failover), `AdminGoOffline` (take a standby cluster offline), and `AdminForceRecover` (force-recover from OFFLINE). The last two are gated on the `UseOfflinePeerDetection` feature flag and model the proactive design for peer OFFLINE detection using `PhoenixHAAdminTool update --force`. + +These are the only actions in the specification that represent deliberate human decisions rather than automated system behavior. All four (`AdminStartFailover`, `AdminAbortFailover`, `AdminGoOffline`, `AdminForceRecover`) receive no fairness in the liveness specifications. The admin is genuinely non-deterministic. The admin might never initiate a failover, or might abort every failover attempt. Imposing fairness on admin actions would force unrealistic guarantees about human behavior. + +### Modeling Choice: Direct ZK Writes + +Unlike the peer-reactive transitions in [HAGroupStore.md](HAGroupStore.md), admin actions are direct ZK writes -- they are not watcher-dependent. The admin CLI writes directly to the local ZK znode via the coprocessor endpoint. No `zkPeerConnected` or `zkLocalConnected` guard is needed because the admin tool manages its own ZK connection independently of the `HAGroupStoreClient` watcher infrastructure. `AdminGoOffline` and `AdminForceRecover` also use the `--force` path which writes directly to ZK, so no `zkLocalConnected` guard is needed for those actions either. + +## Implementation Traceability + +| TLA+ Action | Java Source | +|---|---| +| `AdminStartFailover(c)` | `HAGroupStoreManager.initiateFailoverOnActiveCluster()` L375-400 | Review Comment: Agreed, I think file level citation is sufficient. The robots like to pin to line numbers but I can ask them to be removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
