Copilot commented on code in PR #2485:
URL: https://github.com/apache/phoenix/pull/2485#discussion_r3277765049


##########
phoenix-core-server/src/main/java/org/apache/phoenix/replication/ReplicationLog.java:
##########
@@ -439,6 +447,9 @@ public void run() {
             numFailures, maxRotationRetries, t);
         }
       } finally {
+        // Clear the flag last so requestRotation()'s CAS rejects duplicate 
on-demand submissions
+        // while this task is still creating/staging a writer.
+        rotationRequested.set(false);
         CountDownLatch latch = rotationStagedLatch;

Review Comment:
   `LogRotationTask` now unconditionally does `rotationRequested.set(false)` in 
`finally`. Because scheduled rotations run the same task without ever setting 
`rotationRequested` to true, this can race with a concurrent on-demand 
`requestRotation()` (which sets the flag to true) and then get cleared by the 
scheduled task, allowing duplicate on-demand rotations to be submitted. 
Consider distinguishing scheduled vs on-demand tasks (e.g., constructor flag), 
and only clear `rotationRequested` for the on-demand path (or have scheduled 
rotations also participate in the flag consistently).



##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void 
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
     assertTrue("Replay should be bounded by last partial block, not all 
records. Got: "
       + newWriterAppendCount, newWriterAppendCount < id);
   }
+
+  /**
+   * Tests that the size check is not invoked from inside apply(). Pre-fix, 
every successful action
+   * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in 
a new writer mid-batch
+   * and replay made it exceed the threshold, the size check after every 
append would request
+   * another rotation, drain it on the next apply(), replay again, and loop 
indefinitely.
+   * <p>
+   * The new writer is mocked so its getLength() always reports above the 
rotation threshold. On the
+   * buggy code this triggers a rotation request after every successful 
append, which cascades into
+   * one writer per append. With the fix, requestRotationIfNeeded() runs only 
after currentBatch is
+   * cleared, so the chain is bounded.
+   */
+  @Test
+  public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+    final String tableName = "TBLSRDLOR";
+    long commitId = 1L;
+    // Long enough that only the first scheduled tick can fire within the test 
window.

Review Comment:
   The comment says `roundDurationSeconds` is "Long enough that only the first 
scheduled tick can fire", but the value is `1` second and the test runtime can 
exceed 2 seconds (sleep for tick + append loop sleeps), so multiple scheduled 
ticks can still fire. Please adjust the comment (or the duration/logic) so it 
matches the test behavior.
   



##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void 
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
     assertTrue("Replay should be bounded by last partial block, not all 
records. Got: "
       + newWriterAppendCount, newWriterAppendCount < id);
   }
+
+  /**
+   * Tests that the size check is not invoked from inside apply(). Pre-fix, 
every successful action
+   * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in 
a new writer mid-batch
+   * and replay made it exceed the threshold, the size check after every 
append would request
+   * another rotation, drain it on the next apply(), replay again, and loop 
indefinitely.
+   * <p>
+   * The new writer is mocked so its getLength() always reports above the 
rotation threshold. On the
+   * buggy code this triggers a rotation request after every successful 
append, which cascades into
+   * one writer per append. With the fix, requestRotationIfNeeded() runs only 
after currentBatch is
+   * cleared, so the chain is bounded.
+   */
+  @Test
+  public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+    final String tableName = "TBLSRDLOR";
+    long commitId = 1L;
+    // Long enough that only the first scheduled tick can fire within the test 
window.
+    final int roundDurationSeconds = 1;
+    final int numAppendsAfterTick = 20;
+
+    conf.setInt(PHOENIX_REPLICATION_ROUND_DURATION_SECONDS_KEY, 
roundDurationSeconds);
+    recreateLogGroup();
+
+    ReplicationLog activeLog = logGroup.getActiveLog();
+    LogFileWriter initialWriter = activeLog.getWriter();
+    assertNotNull("Initial writer should not be null", initialWriter);
+
+    // Count writers created so we can detect a rotation chain. Each writer 
created after the
+    // initial one is configured to report a length above the rotation 
threshold so the size
+    // check (wherever it runs) sees an over-threshold writer.
+    AtomicInteger writersCreated = new AtomicInteger(1); // initial writer 
already created
+    final long oversizedLength = TEST_ROTATION_SIZE_BYTES + 1;
+    doAnswer(invocation -> {
+      LogFileWriter w = (LogFileWriter) invocation.callRealMethod();
+      doReturn(oversizedLength).when(w).getLength();
+      writersCreated.incrementAndGet();
+      return w;
+    }).when(activeLog).createNewWriter();
+
+    // Append unsynced records so currentBatch is non-empty when the rotation 
tick fires.
+    final Mutation put = LogFileTestUtil.newPut("row", 1, 1);
+    for (int i = 0; i < 5; i++) {
+      logGroup.append(tableName, commitId++, put);
+    }
+
+    // Wait for the time-based rotation tick to stage a fresh pendingWriter.
+    waitForRotationTick(roundDurationSeconds);
+
+    // Continue appending. On the buggy code, the first such append drains the 
staged writer,
+    // replays the batch, then requestRotationIfNeeded() inside apply() fires 
(length is over
+    // threshold) → requests another rotation. Subsequent appends drain that 
one too, replay,
+    // request another, and so on — one new writer per append.
+    for (int i = 0; i < numAppendsAfterTick; i++) {
+      logGroup.append(tableName, commitId++, put);
+      // Small sleep to give the on-demand rotation task time to actually 
create the writer
+      // before the next append drains it. This makes the chain observable in 
the test.
+      Thread.sleep(20);
+    }

Review Comment:
   This test relies on a hard-coded `Thread.sleep(20)` to let the background 
rotation task create/stage a writer. This timing dependency can be flaky on 
slower CI or loaded machines; consider using a synchronization primitive (e.g., 
latch in the `createNewWriter()` Answer) to deterministically wait for the 
writer to be staged before proceeding.



##########
phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogGroupTest.java:
##########
@@ -1907,4 +1908,73 @@ public void 
testBlockFullSyncOnAppendReducesReplayOnRotation() throws Exception
     assertTrue("Replay should be bounded by last partial block, not all 
records. Got: "
       + newWriterAppendCount, newWriterAppendCount < id);
   }
+
+  /**
+   * Tests that the size check is not invoked from inside apply(). Pre-fix, 
every successful action
+   * inside apply() ran requestRotationIfNeeded(); when a rotation swapped in 
a new writer mid-batch
+   * and replay made it exceed the threshold, the size check after every 
append would request
+   * another rotation, drain it on the next apply(), replay again, and loop 
indefinitely.
+   * <p>
+   * The new writer is mocked so its getLength() always reports above the 
rotation threshold. On the
+   * buggy code this triggers a rotation request after every successful 
append, which cascades into
+   * one writer per append. With the fix, requestRotationIfNeeded() runs only 
after currentBatch is
+   * cleared, so the chain is bounded.
+   */
+  @Test
+  public void testSizeRotationDoesNotLoopOnReplay() throws Exception {
+    final String tableName = "TBLSRDLOR";
+    long commitId = 1L;
+    // Long enough that only the first scheduled tick can fire within the test 
window.
+    final int roundDurationSeconds = 1;
+    final int numAppendsAfterTick = 20;
+
+    conf.setInt(PHOENIX_REPLICATION_ROUND_DURATION_SECONDS_KEY, 
roundDurationSeconds);
+    recreateLogGroup();
+
+    ReplicationLog activeLog = logGroup.getActiveLog();
+    LogFileWriter initialWriter = activeLog.getWriter();
+    assertNotNull("Initial writer should not be null", initialWriter);
+
+    // Count writers created so we can detect a rotation chain. Each writer 
created after the
+    // initial one is configured to report a length above the rotation 
threshold so the size
+    // check (wherever it runs) sees an over-threshold writer.
+    AtomicInteger writersCreated = new AtomicInteger(1); // initial writer 
already created
+    final long oversizedLength = TEST_ROTATION_SIZE_BYTES + 1;
+    doAnswer(invocation -> {
+      LogFileWriter w = (LogFileWriter) invocation.callRealMethod();
+      doReturn(oversizedLength).when(w).getLength();
+      writersCreated.incrementAndGet();
+      return w;
+    }).when(activeLog).createNewWriter();
+
+    // Append unsynced records so currentBatch is non-empty when the rotation 
tick fires.
+    final Mutation put = LogFileTestUtil.newPut("row", 1, 1);
+    for (int i = 0; i < 5; i++) {
+      logGroup.append(tableName, commitId++, put);
+    }
+
+    // Wait for the time-based rotation tick to stage a fresh pendingWriter.
+    waitForRotationTick(roundDurationSeconds);
+
+    // Continue appending. On the buggy code, the first such append drains the 
staged writer,
+    // replays the batch, then requestRotationIfNeeded() inside apply() fires 
(length is over
+    // threshold) → requests another rotation. Subsequent appends drain that 
one too, replay,
+    // request another, and so on — one new writer per append.
+    for (int i = 0; i < numAppendsAfterTick; i++) {
+      logGroup.append(tableName, commitId++, put);
+      // Small sleep to give the on-demand rotation task time to actually 
create the writer
+      // before the next append drains it. This makes the chain observable in 
the test.
+      Thread.sleep(20);
+    }
+    logGroup.sync();
+
+    // Without the fix the count grows roughly linearly with 
numAppendsAfterTick (one new
+    // writer per append). With the fix, it is bounded regardless of how many 
appends are
+    // performed: initial + scheduled ticks (test runtime ~2s with 1s round = 
1-2 ticks) +
+    // at most one size-triggered rotation per sync point. Use a generous 
bound that is well
+    // below the buggy behavior (which produced ~24) but tolerates extra 
scheduled ticks.
+    int writers = writersCreated.get();
+    assertTrue("Size-based rotation must not loop. Writers created: " + 
writers,
+      writers <= numAppendsAfterTick / 2);

Review Comment:
   The assertion reads `writersCreated` immediately after `logGroup.sync()`, 
but writer creation happens on the separate `rotationExecutor` thread and may 
still be in-flight/queued at that point. That can make the test miss the 
"writer-per-append" regression if the executor hasn’t caught up yet. Consider 
waiting until rotations have quiesced (e.g., wait for `rotationRequested` to 
clear and/or for `pendingWriter` to be staged/drained) before asserting on the 
final writer count.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to