[
https://issues.apache.org/jira/browse/PHOENIX-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Himanshu Gwalani updated PHOENIX-7938:
--------------------------------------
Description:
*Background*
In Replication Replay Design, when replication is in _sync_ mode, the
consistency point is currently computed as the _minimum timestamp of all files
in the IN-PROGRESS directory_ (when non-empty).
*Issue*
Files within a round are picked randomly and moved to IN-PROGRESS. A file
with a later creation timestamp can be moved to IN-PROGRESS before an older
file from the same round still sitting in the IN directory. This causes the
consistency point to advance past data that has not yet been replayed.
*Example*
Round N contains:
File A — timestamp T+5
File B — timestamp T+30
RS-1 picks file B first and renames it to IN-PROGRESS. File A is still in the
IN directory. IN-PROGRESS has no files from previous rounds.
IN directory: file A (T+5) — not yet replayed
IN-PROGRESS: file B (T+30) — being replayed
Current logic yields consistency point = (T+30) − 1. The correct value should
be (T+5) − 1.
*Proposed Fix*
Consistency points should align to round start times. After computing the
minimum timestamp across IN-PROGRESS files, adjust it to the start time of the
round that the minimum timestamp belongs to. This avoids listing all IN
directories while still preventing the consistency point from advancing past
unreplayed files.
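A minimal sketch of the two calculations, using the example above. It assumes rounds are fixed-width windows aligned to multiples of a round duration; the class name, method names, the 60-second ROUND_DURATION_MS, and the concrete value chosen for T are all hypothetical and not taken from the Phoenix code. The "minus 1" is carried over from the example.

```java
import java.util.List;

public class ConsistencyPointSketch {
    // Hypothetical round length; the real value would come from replay configuration.
    static final long ROUND_DURATION_MS = 60_000L;

    // Minimum timestamp across the files currently in IN-PROGRESS (must be non-empty).
    static long minTimestamp(List<Long> inProgressTimestamps) {
        return inProgressTimestamps.stream().mapToLong(Long::longValue).min().getAsLong();
    }

    // Current logic as described: minimum IN-PROGRESS timestamp minus one.
    static long currentConsistencyPoint(List<Long> inProgressTimestamps) {
        return minTimestamp(inProgressTimestamps) - 1;
    }

    // Start of the round containing ts, assuming rounds align to multiples of the duration.
    static long roundStart(long ts) {
        return ts - Math.floorMod(ts, ROUND_DURATION_MS);
    }

    // Proposed logic: snap the minimum to its round start before subtracting one.
    static long proposedConsistencyPoint(List<Long> inProgressTimestamps) {
        return roundStart(minTimestamp(inProgressTimestamps)) - 1;
    }

    public static void main(String[] args) {
        long t = 600_000L;                        // hypothetical round start T
        List<Long> inProgress = List.of(t + 30);  // file B only; file A (T+5) is still in IN
        System.out.println(currentConsistencyPoint(inProgress));   // 600029, past file A
        System.out.println(proposedConsistencyPoint(inProgress));  // 599999, before file A
    }
}
```

Under these assumptions the proposed value is never later than (T+5) − 1, so it is conservative relative to the exact answer but needs no listing of the IN directories.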
was:
Primary-side HA fix is enabling a direct ANISTS → AISTS transition (Ritesh
Garg). On the standby this manifests as a local HAGroupState transition
DEGRADED_STANDBY → STANDBY_TO_ACTIVE directly, skipping STANDBY. Replay side
has no listener for this path today: replicationReplayState stays at DEGRADED,
no rewind to lastRoundInSync happens, and shouldTriggerFailover() (line 493)
hard-blocks promotion forever because it requires state == SYNC.
File:
phoenix-core-server/src/main/java/org/apache/phoenix/replication/reader/ReplicationLogDiscoveryReplay.java
*Fix on three fronts (all must land together):*
1. triggerFailoverListener (148-160): add
replicationReplayState.compareAndSet(DEGRADED, SYNCED_RECOVERY) before
failoverPending.set(true). Conditional CAS so the happy STANDBY →
STANDBY_TO_ACTIVE path doesn't pay a redundant rewind; failoverPending set runs
unconditionally so the signal is never lost.
2. initializeLastRoundProcessed() (215-263): add a parallel branch for
STANDBY_TO_ACTIVE so a reader restart in this state — when
lastSyncStateTimeInMs indicates prior DEGRADED — initializes lastRoundInSync
from lastSyncStateTimeInMs and sets state to SYNCED_RECOVERY. Without this,
restart after the direct transition silently skips files between the pre-crash
sync point and the crash, promoting with a hole.
3. Declare lastRoundProcessed and lastRoundInSync volatile to close a
visibility gap between the ZK watcher thread and the scheduler thread that the
new path makes more reachable.
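Items 1 and 3 above can be sketched as follows. This is an illustration only: the class, the ReplayState enum, and the onStandbyToActive method are hypothetical stand-ins, and the field types are assumptions inferred from the compareAndSet and set calls named in the description rather than from ReplicationLogDiscoveryReplay itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class FailoverListenerSketch {
    // Hypothetical stand-in for the replay state values named in the description.
    enum ReplayState { SYNC, DEGRADED, SYNCED_RECOVERY }

    final AtomicReference<ReplayState> replicationReplayState =
            new AtomicReference<>(ReplayState.SYNC);
    final AtomicBoolean failoverPending = new AtomicBoolean(false);

    // Item 3: volatile so writes from the watcher thread are visible to the scheduler thread.
    volatile long lastRoundProcessed;
    volatile long lastRoundInSync;

    // Item 1: invoked when the local HA state moves to STANDBY_TO_ACTIVE.
    void onStandbyToActive() {
        // Conditional CAS: only a DEGRADED reader is moved to SYNCED_RECOVERY,
        // so the SYNC path does not trigger a redundant rewind.
        replicationReplayState.compareAndSet(ReplayState.DEGRADED, ReplayState.SYNCED_RECOVERY);
        // Set unconditionally so the failover signal is never lost.
        failoverPending.set(true);
    }
}
```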
*Dependencies:* Coordinate landing with Ritesh's primary-side change widening
HAGroupStoreRecord.HAGroupState.DEGRADED_STANDBY.allowedTransitions (currently
{STANDBY}) and the writer signaling for ANISTS → AISTS. Neither side ships the
new transition without the other.
*Tests:* 2 listener unit cases (CAS fires from DEGRADED, no-ops from SYNC),
full IT for the direct path with files in OUT, restart IT for crash
mid-transition, end-to-end cycle IT including ABORT_TO_STANDBY retry, and
update to HAGroupStoreRecordTest.testHAGroupStateValidTransitions.
> Fix consistency point calculation for sync replication replay when files
> within a round are processed out of order
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7938
> URL: https://issues.apache.org/jira/browse/PHOENIX-7938
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Himanshu Gwalani
> Assignee: Himanshu Gwalani
> Priority: Major