Copilot commented on code in PR #2485:
URL: https://github.com/apache/phoenix/pull/2485#discussion_r3277765049
##########
phoenix-core-server/src/main/java/org/apache/phoenix/replication/ReplicationLog.java:
##########
@@ -439,6 +447,9 @@ public void run() {
numFailures, maxRotationRetries, t);
}
} finally {
+ // Clear the flag last so requestRotation()'s CAS rejects duplicate
on-demand submissions
+ // while this task is still creating/staging a writer.
+ rotationRequested.set(false);
CountDownLatch latch = rotationStagedLatch;
Review Comment:
`LogRotationTask` now unconditionally does `rotationRequested.set(false)` in
`finally`. Because scheduled rotations run the same task without ever setting
`rotationRequested` to true, this can race with a concurrent on-demand
`requestRotation()` (which sets the flag to true) and then get cleared by the
scheduled task, allowing duplicate on-demand rotations to be submitted.
Consider distinguishing scheduled vs on-demand tasks (e.g., constructor flag),
and only clear `rotationRequested` for the on-demand path (or have scheduled
rotations also participate in the flag consistently).
##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
assertTrue("Replay should be bounded by last partial block, not all
records. Got: "
+ newWriterAppendCount, newWriterAppendCount < id);
}
+
+ /**
+ * Tests that the size check is not invoked from inside apply(). Pre-fix,
every successful action
+ * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in
a new writer mid-batch
+ * and replay made it exceed the threshold, the size check after every
append would request
+ * another rotation, drain it on the next apply(), replay again, and loop
indefinitely.
+ * <p>
+ * The new writer is mocked so its getLength() always reports above the
rotation threshold. On the
+ * buggy code this triggers a rotation request after every successful
append, which cascades into
+ * one writer per append. With the fix, requestRotationIfNeeded() runs only
after currentBatch is
+ * cleared, so the chain is bounded.
+ */
+ @Test
+ public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+ final String tableName = "TBLSRDLOR";
+ long commitId = 1L;
+ // Long enough that only the first scheduled tick can fire within the test
window.
Review Comment:
The comment says `roundDurationSeconds` is "Long enough that only the first
scheduled tick can fire", but the value is `1` second and the test runtime can
exceed 2 seconds (sleep for tick + append loop sleeps), so multiple scheduled
ticks can still fire. Please adjust the comment (or the duration/logic) so it
matches the test behavior.
##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
assertTrue("Replay should be bounded by last partial block, not all
records. Got: "
+ newWriterAppendCount, newWriterAppendCount < id);
}
+
+ /**
+ * Tests that the size check is not invoked from inside apply(). Pre-fix,
every successful action
+ * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in
a new writer mid-batch
+ * and replay made it exceed the threshold, the size check after every
append would request
+ * another rotation, drain it on the next apply(), replay again, and loop
indefinitely.
+ * <p>
+ * The new writer is mocked so its getLength() always reports above the
rotation threshold. On the
+ * buggy code this triggers a rotation request after every successful
append, which cascades into
+ * one writer per append. With the fix, requestRotationIfNeeded() runs only
after currentBatch is
+ * cleared, so the chain is bounded.
+ */
+ @Test
+ public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+ final String tableName = "TBLSRDLOR";
+ long commitId = 1L;
+ // Long enough that only the first scheduled tick can fire within the test
window.
+ final int roundDurationSeconds = 1;
+ final int numAppendsAfterTick = 20;
+
+ conf.setInt(PHOENIX_REPLICATION_ROUND_DURATION_SECONDS_KEY,
roundDurationSeconds);
+ recreateLogGroup();
+
+ ReplicationLog activeLog = logGroup.getActiveLog();
+ LogFileWriter initialWriter = activeLog.getWriter();
+ assertNotNull("Initial writer should not be null", initialWriter);
+
+ // Count writers created so we can detect a rotation chain. Each writer
created after the
+ // initial one is configured to report a length above the rotation
threshold so the size
+ // check (wherever it runs) sees an over-threshold writer.
+ AtomicInteger writersCreated = new AtomicInteger(1); // initial writer
already created
+ final long oversizedLength = TEST_ROTATION_SIZE_BYTES + 1;
+ doAnswer(invocation -> {
+ LogFileWriter w = (LogFileWriter) invocation.callRealMethod();
+ doReturn(oversizedLength).when(w).getLength();
+ writersCreated.incrementAndGet();
+ return w;
+ }).when(activeLog).createNewWriter();
+
+ // Append unsynced records so currentBatch is non-empty when the rotation
tick fires.
+ final Mutation put = LogFileTestUtil.newPut("row", 1, 1);
+ for (int i = 0; i < 5; i++) {
+ logGroup.append(tableName, commitId++, put);
+ }
+
+ // Wait for the time-based rotation tick to stage a fresh pendingWriter.
+ waitForRotationTick(roundDurationSeconds);
+
+ // Continue appending. On the buggy code, the first such append drains the
staged writer,
+ // replays the batch, then requestRotationIfNeeded() inside apply() fires
(length is over
+ // threshold) → requests another rotation. Subsequent appends drain that
one too, replay,
+ // request another, and so on — one new writer per append.
+ for (int i = 0; i < numAppendsAfterTick; i++) {
+ logGroup.append(tableName, commitId++, put);
+ // Small sleep to give the on-demand rotation task time to actually
create the writer
+ // before the next append drains it. This makes the chain observable in
the test.
+ Thread.sleep(20);
+ }
Review Comment:
This test relies on a hard-coded `Thread.sleep(20)` to let the background
rotation task create/stage a writer. This timing dependency can be flaky on
slower CI or loaded machines; consider using a synchronization primitive (e.g.,
latch in the `createNewWriter()` Answer) to deterministically wait for the
writer to be staged before proceeding.
##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
assertTrue("Replay should be bounded by last partial block, not all
records. Got: "
+ newWriterAppendCount, newWriterAppendCount < id);
}
+
+ /**
+ * Tests that the size check is not invoked from inside apply(). Pre-fix,
every successful action
+ * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in
a new writer mid-batch
+ * and replay made it exceed the threshold, the size check after every
append would request
+ * another rotation, drain it on the next apply(), replay again, and loop
indefinitely.
+ * <p>
+ * The new writer is mocked so its getLength() always reports above the
rotation threshold. On the
+ * buggy code this triggers a rotation request after every successful
append, which cascades into
+ * one writer per append. With the fix, requestRotationIfNeeded() runs only
after currentBatch is
+ * cleared, so the chain is bounded.
+ */
+ @Test
+ public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+ final String tableName = "TBLSRDLOR";
+ long commitId = 1L;
+ // Long enough that only the first scheduled tick can fire within the test
window.
+ final int roundDurationSeconds = 1;
+ final int numAppendsAfterTick = 20;
+
+ conf.setInt(PHOENIX_REPLICATION_ROUND_DURATION_SECONDS_KEY,
roundDurationSeconds);
+ recreateLogGroup();
+
+ ReplicationLog activeLog = logGroup.getActiveLog();
+ LogFileWriter initialWriter = activeLog.getWriter();
+ assertNotNull("Initial writer should not be null", initialWriter);
+
+ // Count writers created so we can detect a rotation chain. Each writer
created after the
+ // initial one is configured to report a length above the rotation
threshold so the size
+ // check (wherever it runs) sees an over-threshold writer.
+ AtomicInteger writersCreated = new AtomicInteger(1); // initial writer
already created
+ final long oversizedLength = TEST_ROTATION_SIZE_BYTES + 1;
+ doAnswer(invocation -> {
+ LogFileWriter w = (LogFileWriter) invocation.callRealMethod();
+ doReturn(oversizedLength).when(w).getLength();
+ writersCreated.incrementAndGet();
+ return w;
+ }).when(activeLog).createNewWriter();
+
+ // Append unsynced records so currentBatch is non-empty when the rotation
tick fires.
+ final Mutation put = LogFileTestUtil.newPut("row", 1, 1);
+ for (int i = 0; i < 5; i++) {
+ logGroup.append(tableName, commitId++, put);
+ }
+
+ // Wait for the time-based rotation tick to stage a fresh pendingWriter.
+ waitForRotationTick(roundDurationSeconds);
+
+ // Continue appending. On the buggy code, the first such append drains the
staged writer,
+ // replays the batch, then requestRotationIfNeeded() inside apply() fires
(length is over
+ // threshold) → requests another rotation. Subsequent appends drain that
one too, replay,
+ // request another, and so on — one new writer per append.
+ for (int i = 0; i < numAppendsAfterTick; i++) {
+ logGroup.append(tableName, commitId++, put);
+ // Small sleep to give the on-demand rotation task time to actually
create the writer
+ // before the next append drains it. This makes the chain observable in
the test.
+ Thread.sleep(20);
+ }
+ logGroup.sync();
+
+ // Without the fix the count grows roughly linearly with
numAppendsAfterTick (one new
+ // writer per append). With the fix, it is bounded regardless of how many
appends are
+ // performed: initial + scheduled ticks (test runtime ~2s with 1s round =
1-2 ticks) +
+ // at most one size-triggered rotation per sync point. Use a generous
bound that is well
+ // below the buggy behavior (which produced ~24) but tolerates extra
scheduled ticks.
+ int writers = writersCreated.get();
+ assertTrue("Size-based rotation must not loop. Writers created: " +
writers,
+ writers <= numAppendsAfterTick / 2);
Review Comment:
The assertion reads `writersCreated` immediately after `logGroup.sync()`,
but writer creation happens on the separate `rotationExecutor` thread and may
still be in-flight/queued at that point. That can make the test miss the
"writer-per-append" regression if the executor hasn’t caught up yet. Consider
waiting until rotations have quiesced (e.g., wait for `rotationRequested` to
clear and/or for `pendingWriter` to be staged/drained) before asserting on the
final writer count.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]