aglinxinyuan opened a new issue, #5446:
URL: https://github.com/apache/texera/issues/5446
## Background
Three classes in `engine/common/storage` currently lack a dedicated unit
spec:
| Source class | Package | Purpose |
| --- | --- | --- |
| `SequentialRecordStorage` | `org.apache.texera.amber.engine.common.storage` | Abstract sequential-record reader/writer + `getStorage` factory |
| `VFSRecordStorage` | (same) | Apache Commons VFS concrete implementation |
| `EmptyRecordStorage` | (same) | Null-object implementation (no-op writer / EOF reader / always-`false` `containsFolder`) |
All three are reachable from production code
(`SequentialRecordStorage.getStorage` is the factory used by checkpoint
logging), but none has characterization tests. A regression in any of them
would surface only as a downstream serde / replay failure.
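To make the null-object description of `EmptyRecordStorage` concrete, here is a minimal sketch of the pattern using only the JDK. It is not the Texera implementation: the object name `NullStorageSketch` and all method signatures are assumptions chosen for illustration, and `OutputStream.nullOutputStream()` (Java 11+) stands in for the `NullOutputStream` mentioned in this issue.

```scala
import java.io.{ByteArrayInputStream, InputStream, OutputStream}

// Hypothetical null-object storage. The method names echo the issue text,
// but the signatures are assumptions made for this sketch only.
object NullStorageSketch {
  // Writer side: a sink that discards every byte written to it (Java 11+).
  def getWriter: OutputStream = OutputStream.nullOutputStream()

  // Reader side: a stream that is already at end-of-file.
  def getReader: InputStream = new ByteArrayInputStream(Array.emptyByteArray)

  // There is nothing to delete, so this does nothing.
  def deleteStorage(): Unit = ()

  // An empty storage never contains any folder.
  def containsFolder(name: String): Boolean = false
}
```

A characterization test for such an object only needs to check that writes do not throw, that the first read reports end-of-file, and that `containsFolder` is `false` for any name.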
## What we want pinned
Behavior we want to lock in:
| Area | Contract |
| --- | --- |
| `SequentialRecordStorage.getStorage(None)` | returns an `EmptyRecordStorage` |
| `SequentialRecordStorage.getStorage(Some(file://…))` | returns a `VFSRecordStorage` |
| `SequentialRecordStorage.getStorage(Some(hdfs://…))` | dispatches to `HDFSRecordStorage` (path covered without actually opening an HDFS connection by asserting the constructor blows up on a non-resolvable host rather than silently returning `VFSRecordStorage`) |
| `SequentialRecordWriter` / `SequentialRecordReader` | round-trip a sequence of records through `AmberRuntime.serde` (size-prefixed framing) |
| `SequentialRecordStorage.fetchAllRecords` | iterates all records returned by the underlying reader |
| `VFSRecordStorage` constructor | auto-creates the target folder when it does not exist |
| `VFSRecordStorage.getWriter` / `getReader` | round-trip a record through a local `file://` URI |
| `VFSRecordStorage.deleteStorage` | removes the on-disk folder created by the constructor |
| `VFSRecordStorage.containsFolder` | distinguishes existing folder vs. existing file vs. missing entry |
| `EmptyRecordStorage.getWriter` | returns a writer backed by `NullOutputStream` (writes are silently discarded) |
| `EmptyRecordStorage.getReader` | returns a reader that yields zero records |
| `EmptyRecordStorage.deleteStorage` / `containsFolder` | are no-op and always-`false` respectively |
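For readers unfamiliar with size-prefixed framing, the writer/reader round-trip contract in the table can be illustrated with a small standalone sketch. This is an assumption-laden stand-in, not the production code: it frames UTF-8 strings with a 4-byte length prefix via `DataOutputStream`, whereas the real classes serialize through `AmberRuntime.serde`, and the exact prefix width used there is not stated in this issue. The name `FramingSketch` is hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a sequential record writer/reader pair.
// Each record is stored as a 4-byte big-endian length followed by the payload.
object FramingSketch {

  // Write every record as [length][payload] into a single byte array.
  def writeAll(records: Seq[String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    records.foreach { record =>
      val payload = record.getBytes(UTF_8)
      out.writeInt(payload.length)
      out.write(payload)
    }
    out.flush()
    bytes.toByteArray
  }

  // Read frames until no bytes remain. A truncated frame makes
  // readInt/readFully throw java.io.EOFException.
  def readAll(data: Array[Byte]): Seq[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val records = ArrayBuffer.empty[String]
    while (in.available() > 0) {
      val length = in.readInt()
      val payload = new Array[Byte](length)
      in.readFully(payload)
      records += new String(payload, UTF_8)
    }
    records.toSeq
  }
}
```

A round-trip assertion in the spirit of the contract above would compare `FramingSketch.readAll(FramingSketch.writeAll(records))` with `records`, including an empty record to confirm that a zero-length frame survives.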
## Scope
- New spec files (one per source class per the spec-filename convention):
  - `SequentialRecordStorageSpec.scala`
  - `VFSRecordStorageSpec.scala`
  - `EmptyRecordStorageSpec.scala`
- No production-code changes.
- Tests use the production wire path (`AmberRuntime.serde`) the same way
`CheckpointSubsystemSpec` / `ClientEventSpec` do (a suite-local `ActorSystem`
injected into `AmberRuntime` via reflection, torn down in `afterAll`).