[jira] [Updated] (PHOENIX-7938) Fix consistency point calculation for sync replication replay when files within a round are processed out of order

Himanshu Gwalani (Jira) Thu, 25 Jun 2026 01:47:10 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Himanshu Gwalani updated PHOENIX-7938:
--------------------------------------
    Description: 
*Background*
In the Replication Replay design, when replication is in _sync_ mode, the 
consistency point is currently computed as the _minimum timestamp of all files 
in the IN-PROGRESS directory_ (when non-empty).

*Issue*

Files within a round are picked in random order and moved to IN-PROGRESS. A 
file with a later creation timestamp can therefore be moved to IN-PROGRESS 
while an older file from the same round is still sitting in the IN directory. 
The consistency point then advances past data that has not yet been replayed.

*Example*

Round N contains:
File A — timestamp T+5

File B — timestamp T+30

RS-1 picks file B first and renames it to IN-PROGRESS. File A is still in the 
IN directory. IN-PROGRESS has no files from previous rounds.

IN directory: file A (T+5) — not yet replayed

IN-PROGRESS: file B (T+30) — being replayed

Current logic yields consistency point = (T+30) − 1. The correct value should 
be (T+5) − 1.


*Proposed Fix*
Consistency points should align to round start times. After computing the 
minimum timestamp across IN-PROGRESS files, adjust it to the start time of the 
round that the minimum timestamp belongs to. This avoids listing all IN 
directories while still preventing the consistency point from advancing past 
unreplayed files.
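
A minimal sketch of the adjustment described above, not the actual Phoenix 
code. It assumes rounds are fixed-length windows aligned to multiples of the 
round duration, and it keeps the existing "minus one" convention after 
snapping to the round start. The class name, method names, and the 60-second 
round length are hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.List;

public class ConsistencyPointSketch {

    // Hypothetical round length; the real value would come from configuration.
    static final long ROUND_DURATION_MS = 60_000L;

    /** Behaviour described under Background: minimum IN-PROGRESS timestamp minus one. */
    static long currentConsistencyPoint(List<Long> inProgressTimestamps) {
        // Collections.min throws NoSuchElementException for an empty list,
        // matching the "when non-empty" precondition in the description.
        return Collections.min(inProgressTimestamps) - 1;
    }

    /**
     * Proposed behaviour (assumed form): snap the minimum IN-PROGRESS timestamp
     * down to the start of its round, then subtract one.
     */
    static long roundAlignedConsistencyPoint(List<Long> inProgressTimestamps,
                                             long roundDurationMs) {
        long min = Collections.min(inProgressTimestamps);
        long roundStart = Math.floorDiv(min, roundDurationMs) * roundDurationMs;
        return roundStart - 1;
    }

    public static void main(String[] args) {
        long t = 1_200_000L;      // assume T is the start of round N
        long fileB = t + 30;      // file B is already in IN-PROGRESS
        // File A (T+5) is still in the IN directory, so it is not in this list.
        List<Long> inProgress = List.of(fileB);

        System.out.println(currentConsistencyPoint(inProgress));                          // 1200029
        System.out.println(roundAlignedConsistencyPoint(inProgress, ROUND_DURATION_MS));  // 1199999
    }
}
```

With the values from the example, the round-aligned result is at or below 
(T+5) - 1, so it cannot move past file A even though only IN-PROGRESS is 
inspected.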

  was:
Primary-side HA fix is enabling a direct ANISTS → AISTS transition (Ritesh 
Garg). On the standby this manifests as a local HAGroupState transition 
DEGRADED_STANDBY → STANDBY_TO_ACTIVE directly, skipping STANDBY. Replay side 
has no listener for this path today: replicationReplayState stays at DEGRADED, 
no rewind to lastRoundInSync happens, and shouldTriggerFailover() (line 493) 
hard-blocks promotion forever because it requires state == SYNC.

File: 
phoenix-core-server/src/main/java/org/apache/phoenix/replication/reader/ReplicationLogDiscoveryReplay.java

*Fix on three fronts (all must land together):*
1. triggerFailoverListener (148-160): add 
replicationReplayState.compareAndSet(DEGRADED, SYNCED_RECOVERY) before 
failoverPending.set(true). Conditional CAS so the happy STANDBY → 
STANDBY_TO_ACTIVE path doesn't pay a redundant rewind; failoverPending set runs 
unconditionally so the signal is never lost.

2. initializeLastRoundProcessed() (215-263): add a parallel branch for 
STANDBY_TO_ACTIVE so a reader restart in this state — when 
lastSyncStateTimeInMs indicates prior DEGRADED — initializes lastRoundInSync 
from lastSyncStateTimeInMs and sets state to SYNCED_RECOVERY. Without this, 
restart after the direct transition silently skips files between the pre-crash 
sync point and the crash, promoting with a hole.

3. Declare lastRoundProcessed and lastRoundInSync volatile to close a 
visibility gap between the ZK watcher thread and the scheduler thread that the 
new path makes more reachable.
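
A minimal sketch of the ordering described in item 1, not the actual 
ReplicationLogDiscoveryReplay code. It assumes replicationReplayState is an 
AtomicReference over a state enum and failoverPending is an AtomicBoolean; the 
enum name ReplayState, the class name, and the method name are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class FailoverListenerSketch {

    // Hypothetical enum; only the constant names come from the description.
    enum ReplayState { SYNC, DEGRADED, SYNCED_RECOVERY }

    final AtomicReference<ReplayState> replicationReplayState =
            new AtomicReference<>(ReplayState.SYNC);
    final AtomicBoolean failoverPending = new AtomicBoolean(false);

    /** Stand-in for the listener body that runs on a transition to STANDBY_TO_ACTIVE. */
    void onStandbyToActive() {
        // Conditional CAS: only a DEGRADED reader is moved to SYNCED_RECOVERY,
        // so a reader already in SYNC does not trigger a redundant rewind.
        replicationReplayState.compareAndSet(ReplayState.DEGRADED,
                                             ReplayState.SYNCED_RECOVERY);
        // Set unconditionally so the failover signal is never lost.
        failoverPending.set(true);
    }
}
```

The two listener unit cases listed under Tests map directly onto this shape: 
one starting from DEGRADED, one starting from SYNC.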

  

*Dependencies:* Coordinate landing with Ritesh's primary-side change widening 
HAGroupStoreRecord.HAGroupState.DEGRADED_STANDBY.allowedTransitions (currently 
{STANDBY}) and the writer signaling for ANISTS → AISTS. Neither side ships the 
new transition without the other.

 

*Tests:* 2 listener unit cases (CAS fires from DEGRADED, no-ops from SYNC), 
full IT for the direct path with files in OUT, restart IT for crash 
mid-transition, end-to-end cycle IT including ABORT_TO_STANDBY retry, and 
update to HAGroupStoreRecordTest.testHAGroupStateValidTransitions.


> Fix consistency point calculation for sync replication replay when files 
> within a round are processed out of order
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7938
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7938
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major



