aglinxinyuan opened a new issue, #5446:
URL: https://github.com/apache/texera/issues/5446

   ## Background
   
   Three classes in `engine/common/storage` currently lack a dedicated unit spec:
   
   | Source class | Package | Purpose |
   | --- | --- | --- |
   | `SequentialRecordStorage` | `org.apache.texera.amber.engine.common.storage` | Abstract sequential-record reader/writer + `getStorage` factory |
   | `VFSRecordStorage` | (same) | Apache Commons VFS concrete implementation |
   | `EmptyRecordStorage` | (same) | Null-object implementation (no-op writer / EOF reader / always-`false` `containsFolder`) |
   
   All three are reachable from production code (`SequentialRecordStorage.getStorage` is the factory used by checkpoint logging), but none has characterization tests. A regression in any of them would surface only as a downstream serde / replay failure.
   
   ## What we want pinned
   
   Behavior we want to lock in:
   
   | Area | Contract |
   | --- | --- |
   | `SequentialRecordStorage.getStorage(None)` | returns an `EmptyRecordStorage` |
   | `SequentialRecordStorage.getStorage(Some(file://…))` | returns a `VFSRecordStorage` |
   | `SequentialRecordStorage.getStorage(Some(hdfs://…))` | dispatches to `HDFSRecordStorage`; covered without opening an HDFS connection by asserting that the constructor fails on a non-resolvable host instead of silently returning `VFSRecordStorage` |
   | `SequentialRecordWriter` / `SequentialRecordReader` | round-trip a sequence of records through `AmberRuntime.serde` (size-prefixed framing) |
   | `SequentialRecordStorage.fetchAllRecords` | iterates over all records returned by the underlying reader |
   | `VFSRecordStorage` constructor | auto-creates the target folder when it does not exist |
   | `VFSRecordStorage.getWriter` / `getReader` | round-trip a record through a local `file://` URI |
   | `VFSRecordStorage.deleteStorage` | removes the on-disk folder created by the constructor |
   | `VFSRecordStorage.containsFolder` | distinguishes an existing folder from an existing file and from a missing entry |
   | `EmptyRecordStorage.getWriter` | returns a writer backed by `NullOutputStream` (writes are silently discarded) |
   | `EmptyRecordStorage.getReader` | returns a reader that yields zero records |
   | `EmptyRecordStorage.deleteStorage` / `containsFolder` | are a no-op and always-`false`, respectively |
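
For readers unfamiliar with size-prefixed framing, the round-trip contract in the table can be illustrated with a minimal, self-contained sketch. This is not the Texera implementation: `FramingSketch`, `writeAll`, and `readAll` are hypothetical names, UTF-8 strings stand in for whatever `AmberRuntime.serde` produces, and the 4-byte big-endian length prefix is an assumption about the general technique rather than the actual wire format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical illustration of size-prefixed framing; not the Texera classes.
object FramingSketch {

  // Writes each record as a 4-byte length followed by its payload bytes.
  def writeAll(records: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    records.foreach { record =>
      val payload = record.getBytes(UTF_8) // stand-in for the real serializer
      out.writeInt(payload.length)
      out.write(payload)
    }
    out.flush()
    buffer.toByteArray
  }

  // Reads frames until the in-memory stream is exhausted.
  def readAll(bytes: Array[Byte]): Seq[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val records = Seq.newBuilder[String]
    // available() reports the remaining byte count for a ByteArrayInputStream.
    while (in.available() > 0) {
      val size = in.readInt()
      val payload = new Array[Byte](size)
      in.readFully(payload) // throws EOFException on a truncated frame
      records += new String(payload, UTF_8)
    }
    records.result()
  }
}
```

A characterization test of this shape would assert that `readAll(writeAll(records))` equals `records`, including for empty records and an empty sequence.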
   
   ## Scope
   
   - New spec files (one per source class per the spec-filename convention):
     - `SequentialRecordStorageSpec.scala`
     - `VFSRecordStorageSpec.scala`
     - `EmptyRecordStorageSpec.scala`
   - No production-code changes.
   - Tests use the production wire path (`AmberRuntime.serde`) the same way `CheckpointSubsystemSpec` / `ClientEventSpec` do (a suite-local `ActorSystem` injected into `AmberRuntime` via reflection, torn down in `afterAll`).
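
The inject-then-restore pattern mentioned in the last bullet might look roughly like the sketch below. Everything here is hypothetical: `FakeRuntime` stands in for a runtime singleton, its `system` field is a plain `String` rather than an `ActorSystem`, and the field name is an assumption. The existing specs may do this differently.

```scala
// Hypothetical stand-in for a runtime singleton with a private dependency.
object FakeRuntime {
  private var system: String = null
  def currentSystem: Option[String] = Option(system)
}

object ReflectionInjection {
  // Sets the private `system` field via Java reflection and returns the
  // previous value so the caller can restore it during teardown.
  def inject(value: String): String = {
    val field = FakeRuntime.getClass.getDeclaredField("system")
    field.setAccessible(true)
    val previous = field.get(FakeRuntime).asInstanceOf[String]
    field.set(FakeRuntime, value)
    previous
  }
}
```

In a suite, `inject` would be called once before the tests run, and the returned value would be passed back to `inject` in `afterAll` so that later suites see the original state.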

