aglinxinyuan opened a new pull request, #5554: URL: https://github.com/apache/texera/pull/5554
### What changes were proposed in this PR? Pin behavior of two previously-uncovered modules in `engine/architecture/logreplay`. No production-code changes. | Spec | Source class | Tests | | --- | --- | --- | | `EmptyReplayLoggerSpec` | `EmptyReplayLogger` | 9 | | `ReplayLogGeneratorSpec` | `ReplayLogGenerator` | 10 | Both spec files follow the `<srcClassName>Spec.scala` one-to-one convention. **Behavior pinned — `EmptyReplayLogger`** | Surface | Contract | | --- | --- | | `drainCurrentLogRecords(step)` | returns an empty `Array[ReplayLogRecord]` regardless of `step` (positive, zero, `Long.MaxValue`, negative); returns a non-null array; element type is `ReplayLogRecord` (compile-time enforced) | | `markAsReplayDestination` | no-op for any `EmbeddedControlMessageIdentity`; does not accumulate state | | `logCurrentStepWithMessage` | no-op for any `(step, channelId, msg)` triple, including `msg = None` and 100 successive calls | | Trait conformance | usable through the `ReplayLogger` interface | **Behavior pinned — `ReplayLogGenerator.generate`** | Surface | Contract | | --- | --- | | `getStorage(None)` (EmptyRecordStorage) | returns `(empty queue, empty queue)` | | empty `VFSRecordStorage` file | returns `(empty queue, empty queue)` | | only `ProcessingStep` records | enqueues all into the steps queue, preserving insertion order | | only `MessageContent` records | enqueues all into the messages queue, preserving insertion order | | interleaved steps + messages | partitions correctly by type, preserving per-type order | | `ReplayDestination(id)` matching `replayTo` | short-circuits — records after the cut are NOT enqueued | | `ReplayDestination(id)` NOT matching `replayTo` | is silently skipped, iteration continues | | multiple matching `ReplayDestination` | stops at the FIRST one | | unknown record type (e.g. `TerminateSignal`) | throws `RuntimeException` with a diagnostic message | **Notes** While writing `ReplayLogGeneratorSpec` I discovered a **production stream-leak** in `ReplayLogGenerator.generate`: when it hits the matching `ReplayDestination` it short-circuits via a non-local `return`, leaving the `SequentialRecordReader`'s underlying `Input` stream open. On Windows the leaked file handle blocks subsequent temp-dir deletion. The spec's `cleanup` helper tolerates the resulting `FileSystemException` so the case still passes; the real fix is at the source and is out of scope for a test-coverage PR (will file a follow-up issue). `ReplayLogGeneratorSpec` uses a temp-dir-backed `VFSRecordStorage[ReplayLogRecord]` and the production `AmberRuntime.serde` (suite-local `ActorSystem` injected via reflection, torn down in `afterAll`) — same harness pattern as `CheckpointSubsystemSpec` / `ClientEventSpec` / `VFSRecordStorageSpec`. ### Any related issues, documentation, discussions? Closes #5550. ### How was this PR tested? Pure unit-test additions; verified locally with: - `sbt "WorkflowExecutionService/testOnly org.apache.texera.amber.engine.architecture.logreplay.EmptyReplayLoggerSpec org.apache.texera.amber.engine.architecture.logreplay.ReplayLogGeneratorSpec"` — 19 tests, all green - `sbt scalafmtCheckAll` — clean - CI to confirm ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Sonnet 4.5) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
