tkhurana opened a new pull request, #2433: URL: https://github.com/apache/phoenix/pull/2433
## Summary Rewrites the `ReplicationLog` writer rotation mechanism to be lock-free and asynchronous, improving throughput on the disruptor consumer hot path. - **Lock-free writer swap**: Replaced `ReentrantLock`-based synchronous `rotateLog()` with an `AtomicReference<LogFileWriter> pendingWriter` staging pattern. `LogRotationTask` creates new writers on a background thread and stages them in `pendingWriter`. The disruptor consumer thread atomically drains the staged writer inside `apply()` before each action, replays any unsynced appends, and continues. - **Async close**: Old writers are submitted to a dedicated `closeExecutor` for background close, keeping the hot path unblocked. - **On-demand size rotation**: Instead of checking size under a lock in `getWriter()`, `requestRotationIfNeeded()` is called post-action and submits a `LogRotationTask` to the executor if the size threshold is exceeded. An `AtomicBoolean rotationRequested` prevents duplicate submissions. - **Round-boundary aligned scheduling**: The rotation executor's `initialDelay` is computed via `computeInitialDelay()` to align with replication round boundaries. The rotation interval is now derived from `replicationRoundDurationSeconds` rather than a separate config key. - **Simplified retry logic in `apply()`**: First failure retries on the same writer (transient). Second failure triggers `requestRotation()` to create a new writer on the background thread. The retry sleep gives the background thread time to stage it, and the next attempt drains it. - **Removed `RotationReason` enum and per-reason metrics**: Eliminated `TIME`/`SIZE`/`ERROR` rotation reason tracking. Only total rotation count and failure count remain. - **`closed` field upgraded to `AtomicBoolean`** for safe concurrent access without locks. ### Test changes - Added `recreateLogGroup()`, `overrideConf()`, and `waitForRotationTick()` helpers to `ReplicationLogBaseTest` - `TestableLog` overrides `startRotationExecutor()` with a full-round initial delay to prevent flaky boundary-related failures - 8 new tests covering: mid-batch replay, undrained pending writer replacement, staged-writer pickup on retry, idle + lease recovery, replay failure retry, error-recovery rotation, on-demand rotation not suppressing scheduled ticks, round-boundary alignment - New `ReplicationLogTest` with unit tests for `computeInitialDelay` - Updated existing tests to use `forceRotation()` instead of removed `rotateLog()` ## Test plan - [ ] `mvn test -pl phoenix-core -Dtest=ReplicationLogTest` - [ ] `mvn test -pl phoenix-core -Dtest=ReplicationLogGroupTest` - [ ] `mvn test -pl phoenix-core -Dtest=ReplicationLogDiscoveryForwarderTest` - [ ] Verify no regressions in other replication tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
