[
https://issues.apache.org/jira/browse/PHOENIX-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tanuj Khurana updated PHOENIX-7846:
-----------------------------------
Description:
Problem:
ReplicationLog maintains a currentBatch which accumulates every successful
append and clears only on an explicit sync() call. On writer rotation
mid-batch, replayCurrentBatch() re-appends every record in the batch onto the
new writer. For workloads with many appends between explicit syncs, the replay
cost scales linearly with batch size.
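The pre-change behavior described above can be sketched as follows. Class and method names (currentBatch, replayCurrentBatch, sync) mirror the description; all internals are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the current rotation path: the batch is cleared
// only on explicit sync(), so rotation replays every batched record.
class RotationReplaySketch {
    final List<byte[]> currentBatch = new ArrayList<>(); // cleared only on explicit sync()
    int replayed = 0;

    void append(byte[] record) {
        currentBatch.add(record); // grows until the next explicit sync()
    }

    void sync() {
        currentBatch.clear();     // the only place the batch is cleared today
    }

    // On writer rotation, every batched record is re-appended onto the new
    // writer, so replay cost is O(batch size).
    void replayCurrentBatch() {
        for (byte[] record : currentBatch) {
            replayed++;           // stand-in for newWriter.append(record)
        }
    }
}
```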
There is a pre-existing implicit durability point: LogFileFormatWriter.append()
checks the in-memory block size after each append and, when the block hits
maxBlockSize (default 1 MB), triggers an internal sync() that flushes the block
to HDFS. Records up to that point are durable. However, this information does
not propagate back to ReplicationLog.append(), so currentBatch keeps growing
past these durability points.
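The implicit durability point can be sketched as a block-size check after each append. The 1 MB default is from the description above; everything else is an assumption, not the actual LogFileFormatWriter code.

```java
// Minimal sketch of the block-full internal sync: once the in-memory block
// reaches maxBlockSize, it is flushed and records so far become durable.
class BlockFullSyncSketch {
    static final long MAX_BLOCK_SIZE = 1L << 20; // default 1 MB
    long blockBytes = 0;
    int internalSyncs = 0;

    void append(long recordBytes) {
        blockBytes += recordBytes;
        if (blockBytes >= MAX_BLOCK_SIZE) {
            // Internal sync: flush the block to HDFS; records up to here are durable.
            internalSyncs++;
            blockBytes = 0;
            // Note: today this durability point is NOT reported to the caller,
            // so ReplicationLog's currentBatch keeps growing past it.
        }
    }
}
```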
For example, with a 10k-record batch (1 KB records, 1 MB block size): blocks
fill every ~1000 records, but currentBatch grows to 10,000. Rotation at record
9,500 replays all 9,500 records — even though records 1–9,000 are already
durable in completed blocks on the old writer's file.
Solution:
Change LogFile.Writer.append() to return a boolean indicating whether a
block-full sync occurred. Propagate this signal through LogFileFormatWriter →
LogFileWriter → ReplicationLog.append(). When the signal is true, clear
currentBatch — all records up to this point are durable and do not need replay.
After this change, replay on rotation is proportional to the last partial block
(bounded by maxBlockSize), not the full inter-sync window. Using the same
example: rotation at record 9,500 replays only ~500 records instead of 9,500.
No change to durability semantics — this only leverages an existing durability
point that was previously not propagated.
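The proposed propagation can be sketched end to end. Names (currentBatch, the 1 MB default) come from the description; the writer internals and record representation are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: append() returns whether a block-full sync occurred,
// and ReplicationLog clears currentBatch at that durability point.
class ReplicationLogSketch {
    static final long MAX_BLOCK_SIZE = 1L << 20; // default 1 MB, as cited above

    // Stand-in for the LogFileFormatWriter -> LogFileWriter chain.
    static class WriterSketch {
        long blockBytes = 0;
        int syncs = 0;

        boolean append(long recordBytes) {
            blockBytes += recordBytes;
            if (blockBytes >= MAX_BLOCK_SIZE) {
                sync();      // internal block-full sync: records so far are durable
                return true; // NEW: surface the durability point to the caller
            }
            return false;
        }

        void sync() {
            blockBytes = 0; // flushed block starts fresh
            syncs++;
        }
    }

    final WriterSketch writer = new WriterSketch();
    final List<Long> currentBatch = new ArrayList<>();

    void append(long recordBytes) {
        currentBatch.add(recordBytes);
        boolean blockSynced = writer.append(recordBytes);
        if (blockSynced) {
            // Everything batched so far is durable; nothing to replay on rotation.
            currentBatch.clear();
        }
    }
}
```

With 1 KB records and the 1 MB block, the batch is cleared every 1024 appends, so replay on rotation is bounded by one partial block rather than the full inter-sync window.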
> Bound rotation replay cost for large commit batches
> ---------------------------------------------------
>
> Key: PHOENIX-7846
> URL: https://issues.apache.org/jira/browse/PHOENIX-7846
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Tanuj Khurana
> Assignee: Tanuj Khurana
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)