Himanshu Gwalani created PHOENIX-7938:
-----------------------------------------

             Summary: Fix consistency point calculation for sync replication 
replay when files within a round are processed out of order
                 Key: PHOENIX-7938
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7938
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: Himanshu Gwalani
            Assignee: Himanshu Gwalani


The primary-side HA fix (Ritesh Garg) enables a direct ANISTS → AISTS 
transition. On the standby this shows up as a local HAGroupState transition 
DEGRADED_STANDBY → STANDBY_TO_ACTIVE that skips STANDBY. The replay side has 
no listener for this path today: replicationReplayState stays at DEGRADED, 
no rewind to lastRoundInSync happens, and shouldTriggerFailover() (line 493) 
blocks promotion indefinitely because it requires state == SYNC.
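
The stuck state described above can be illustrated with a minimal sketch. It is not the Phoenix code: the class name, the ReplayState enum, and the two-condition gate are simplifications assumed for illustration, and the real shouldTriggerFailover() may check more than this.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the replay-side gate; not the Phoenix class.
class StuckPromotionSketch {
    enum ReplayState { SYNC, DEGRADED, SYNCED_RECOVERY }

    final AtomicReference<ReplayState> replicationReplayState =
        new AtomicReference<>(ReplayState.SYNC);
    final AtomicBoolean failoverPending = new AtomicBoolean(false);

    // Assumed shape of the gate: a pending failover plus state == SYNC.
    boolean shouldTriggerFailover() {
        return failoverPending.get()
            && replicationReplayState.get() == ReplayState.SYNC;
    }
}
```

With the state left at DEGRADED and nothing moving it back towards SYNC, the gate in this sketch returns false on every call, even when a failover is pending.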

File: 
phoenix-core-server/src/main/java/org/apache/phoenix/replication/reader/ReplicationLogDiscoveryReplay.java

*Fix on three fronts (all must land together):*

1. triggerFailoverListener (lines 148-160): add 
replicationReplayState.compareAndSet(DEGRADED, SYNCED_RECOVERY) before 
failoverPending.set(true). The CAS is conditional so the happy STANDBY → 
STANDBY_TO_ACTIVE path doesn't pay for a redundant rewind; 
failoverPending.set(true) runs unconditionally so the signal is never lost.

2. initializeLastRoundProcessed() (lines 215-263): add a parallel branch for 
STANDBY_TO_ACTIVE. If the reader restarts in this state and 
lastSyncStateTimeInMs indicates a prior DEGRADED state, the branch initializes 
lastRoundInSync from lastSyncStateTimeInMs and sets the state to 
SYNCED_RECOVERY. Without this, a restart after the direct transition silently 
skips the files between the pre-crash sync point and the crash, and the 
standby is promoted with a hole in the replayed data.

3. Declare lastRoundProcessed and lastRoundInSync volatile to close a 
visibility gap between the ZK watcher thread and the scheduler thread. The 
new path makes this gap easier to hit.
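
The three changes above can be sketched together as follows. This is a simplified stand-in, not the actual ReplicationLogDiscoveryReplay code. Assumptions: rounds are modelled as plain long timestamps, the priorDegraded flag and the defaultStart parameter are hypothetical placeholders for however the real method derives those values, and rewindIfRecovering() is an assumed scheduler-side step added only to show what SYNCED_RECOVERY leads to.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the replay class; only the pieces the fix touches.
class ReplayFixSketch {
    enum ReplayState { SYNC, DEGRADED, SYNCED_RECOVERY }
    enum HAGroupState { STANDBY, DEGRADED_STANDBY, STANDBY_TO_ACTIVE }

    final AtomicReference<ReplayState> replicationReplayState =
        new AtomicReference<>(ReplayState.SYNC);
    final AtomicBoolean failoverPending = new AtomicBoolean(false);

    // Fix 3: volatile, so a write from the watcher thread is visible to the
    // scheduler thread. Rounds are modelled as timestamps (assumption).
    volatile long lastRoundProcessed;
    volatile long lastRoundInSync;

    // Fix 1: conditional CAS first, then the unconditional pending flag.
    void triggerFailoverListener() {
        // Succeeds only from DEGRADED; from SYNC it is a no-op (no rewind).
        replicationReplayState.compareAndSet(
            ReplayState.DEGRADED, ReplayState.SYNCED_RECOVERY);
        failoverPending.set(true);
    }

    // Fix 2: extra branch for a restart while in STANDBY_TO_ACTIVE.
    // priorDegraded and defaultStart are hypothetical inputs.
    void initializeLastRoundProcessed(HAGroupState haState, boolean priorDegraded,
                                      long lastSyncStateTimeInMs, long defaultStart) {
        lastRoundProcessed = defaultStart;
        lastRoundInSync = defaultStart;
        if (haState == HAGroupState.STANDBY_TO_ACTIVE && priorDegraded) {
            lastRoundInSync = lastSyncStateTimeInMs;
            replicationReplayState.set(ReplayState.SYNCED_RECOVERY);
        }
    }

    // Assumed scheduler-side step: rewind to the last in-sync round, then
    // return to SYNC so the failover gate can pass.
    void rewindIfRecovering() {
        if (replicationReplayState.get() == ReplayState.SYNCED_RECOVERY) {
            lastRoundProcessed = lastRoundInSync;
            replicationReplayState.compareAndSet(
                ReplayState.SYNCED_RECOVERY, ReplayState.SYNC);
        }
    }
}
```

In this sketch a listener call from SYNC leaves the state untouched and only sets failoverPending, which is the "no redundant rewind" property described in item 1.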

  

*Dependencies:* Coordinate landing with Ritesh's primary-side change widening 
HAGroupStoreRecord.HAGroupState.DEGRADED_STANDBY.allowedTransitions (currently 
{STANDBY}) and the writer signaling for ANISTS → AISTS. Neither side ships the 
new transition without the other.
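
As a rough illustration of the widening, the sketch below models only the DEGRADED_STANDBY entry as a map from state to allowed targets. The class, the map representation, and the isAllowed helper are hypothetical; the real HAGroupStoreRecord.HAGroupState may express allowedTransitions differently, and the other states' transitions are omitted because they are not given here.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the transition table; not the Phoenix enum.
class TransitionSketch {
    enum HAGroupState { STANDBY, DEGRADED_STANDBY, STANDBY_TO_ACTIVE }

    // Current rule as stated: DEGRADED_STANDBY may only move to STANDBY.
    static Map<HAGroupState, Set<HAGroupState>> current() {
        Map<HAGroupState, Set<HAGroupState>> m = new EnumMap<>(HAGroupState.class);
        m.put(HAGroupState.DEGRADED_STANDBY, EnumSet.of(HAGroupState.STANDBY));
        return m;
    }

    // Widened rule: the direct move to STANDBY_TO_ACTIVE is also allowed.
    static Map<HAGroupState, Set<HAGroupState>> widened() {
        Map<HAGroupState, Set<HAGroupState>> m = new EnumMap<>(HAGroupState.class);
        m.put(HAGroupState.DEGRADED_STANDBY,
              EnumSet.of(HAGroupState.STANDBY, HAGroupState.STANDBY_TO_ACTIVE));
        return m;
    }

    // States with no entry in the map are treated as having no transitions.
    static boolean isAllowed(Map<HAGroupState, Set<HAGroupState>> table,
                             HAGroupState from, HAGroupState to) {
        return table.getOrDefault(from, EnumSet.noneOf(HAGroupState.class))
                    .contains(to);
    }
}
```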

 

*Tests:* two listener unit cases (the CAS fires from DEGRADED and no-ops from 
SYNC), a full IT for the direct path with files in OUT, a restart IT for a 
crash mid-transition, an end-to-end cycle IT including the ABORT_TO_STANDBY 
retry, and an update to 
HAGroupStoreRecordTest.testHAGroupStateValidTransitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
