[
https://issues.apache.org/jira/browse/HADOOP-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083113#comment-18083113
]
ASF GitHub Bot commented on HADOOP-19902:
-----------------------------------------
sunchao opened a new pull request, #8513:
URL: https://github.com/apache/hadoop/pull/8513
## Why are the changes needed?
With `fs.azure.write.enableappendwithflush=true`, ABFS's small-write
optimized `hflush()` submits and consumes the active `DataBlock` but leaves it
recorded as the active writable block. A later `close()` retries that already
consumed block and fails with `IllegalStateException: Expected stream state
Writing -but actual state is Closed`.
JIRA: [HADOOP-19902](https://issues.apache.org/jira/browse/HADOOP-19902)
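The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch: `RetainedBlockModel`, `Block`, and `BuggyStream` are hypothetical stand-ins invented for illustration, not the real `AbfsOutputStream` or `DataBlocks` classes, and the upload is assumed to complete immediately.

```java
/** Toy model of the retained-block bug; none of these types are Hadoop classes. */
final class RetainedBlockModel {

    /** Hypothetical block states, named after the states in the error message. */
    enum State { Writing, Upload, Closed }

    /** Hypothetical stand-in for a data block that may be consumed only once. */
    static final class Block {
        private State state = State.Writing;

        /** Consumes the block; legal only while the block is still writable. */
        void startUpload() {
            if (state != State.Writing) {
                throw new IllegalStateException("Expected stream state "
                    + State.Writing + " -but actual state is " + state);
            }
            // Simplification: the upload finishes at once and closes the block.
            state = State.Closed;
        }
    }

    /** Hypothetical stream whose flush consumes the block but keeps it active. */
    static final class BuggyStream {
        private Block active;

        void write() {
            if (active == null) {
                active = new Block();
            }
        }

        void hflush() {
            if (active != null) {
                active.startUpload();
                // Bug being modelled: 'active' is not cleared here.
            }
        }

        void close() {
            if (active != null) {
                active.startUpload(); // retries the already consumed block
                active = null;
            }
        }
    }

    /** Runs write, hflush, close; returns the failure message, or null on success. */
    static String reproduce() {
        BuggyStream out = new BuggyStream();
        out.write();
        out.hflush();
        try {
            out.close();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

In this model, `close()` fails because the block it still considers active was already consumed by `hflush()`.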
## What changes were proposed in this PR?
Clear the active block after the optimized append-with-flush path submits it
for upload, matching the lifecycle already used by `uploadCurrentBlock()`.
Add a regression test with small-write optimization enabled that executes
`write()`, `hflush()`, and `close()`, then asserts the payload is appended
exactly once using `FLUSH_MODE`.
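The lifecycle change can be sketched in the same spirit. The model below is an assumption-laden illustration, not the patch itself: `ClearedBlockModel`, `Block`, `Stream`, and `submitActiveBlock()` are hypothetical names, and clearing in a `finally` block is modelled on how the regular upload path is described in the issue rather than copied from the actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed lifecycle; none of these types are Hadoop classes. */
final class ClearedBlockModel {

    /** Hypothetical stand-in for a data block that may be consumed only once. */
    static final class Block {
        private final byte[] data;
        private boolean consumed;

        Block(byte[] data) {
            this.data = data;
        }

        byte[] startUpload() {
            if (consumed) {
                throw new IllegalStateException("block already consumed");
            }
            consumed = true;
            return data;
        }
    }

    /** Hypothetical stream that forgets the active block once it is submitted. */
    static final class Stream {
        private Block active;
        private final List<byte[]> appends = new ArrayList<>();

        void write(byte[] payload) {
            active = new Block(payload);
        }

        void hflush() {
            submitActiveBlock();
        }

        void close() {
            submitActiveBlock();
        }

        private void submitActiveBlock() {
            if (active == null) {
                return; // nothing pending, so no second append is attempted
            }
            try {
                appends.add(active.startUpload());
            } finally {
                // Clear the block whether or not submission succeeded.
                active = null;
            }
        }

        int appendCount() {
            return appends.size();
        }
    }

    /** Runs write, hflush, close and returns how many appends were recorded. */
    static int run() {
        Stream out = new Stream();
        out.write(new byte[1000]);
        out.hflush();
        out.close();
        return out.appendCount();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the active block is cleared as soon as it is submitted, the later `close()` finds nothing to upload and the payload is recorded once.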
Contains content generated by Codex.
## How was this PR tested?
- Unit test: `./mvnw -pl hadoop-tools/hadoop-azure -am -DskipITs
-Dtest=org.apache.hadoop.fs.azurebfs.services.TestAbfsOutputStream#verifySmallWriteOptimizedHFlushFollowedByClose
-DfailIfNoTests=false test`
### AI Tooling
- [x] The PR includes the phrase `Contains content generated by Codex`.
- [x] My use of AI tooling follows ASF legal policy.
> [ABFS] Small write optimization fails hflush followed by close by retaining
> consumed block
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-19902
> URL: https://issues.apache.org/jira/browse/HADOOP-19902
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Chao Sun
> Priority: Major
>
> When `fs.azure.write.enableappendwithflush` is enabled, `AbfsOutputStream`
> fails for a short write followed by `hflush()` and `close()`.
> h3. Reproducer
> {code:java}
> try (FSDataOutputStream out = fs.create(path)) {
> out.write(new byte[1000]);
> out.hflush();
> }
> {code}
> Run with `fs.azure.write.enableappendwithflush=true` and a write buffer
> larger than the payload. The issue is present on current trunk and branch-3.4.
> h3. Actual behavior
> The `hflush()` call sends an append-with-flush request and consumes the
> underlying data block. The subsequent `close()` still sees the same block as
> active and attempts to upload it again, failing before a second append can be
> sent:
> {code}
> java.lang.IllegalStateException: Expected stream state Writing -but actual state is Closed in ByteBufferBlock{...}
> at org.apache.hadoop.fs.store.DataBlocks$DataBlock.verifyState(...)
> at org.apache.hadoop.fs.store.DataBlocks$ByteBufferBlock.startUpload(...)
> at org.apache.hadoop.fs.azurebfs.services.AbfsBlock.startUpload(...)
> at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.uploadBlockAsync(...)
> at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.smallWriteOptimizedflushInternal(...)
> at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.close(...)
> {code}
> h3. Expected behavior
> After an optimized `hflush()`, `close()` should complete successfully without
> attempting to re-upload the data already submitted by the flush-mode append.
> h3. Root cause
> `smallWriteOptimizedflushInternal()` calls `uploadBlockAsync()`, which
> invokes `startUpload()` and consumes the active block, but the optimized path
> does not clear that block from the block manager. The regular
> `uploadCurrentBlock()` path already clears the active block in a `finally`
> block after submission.
> h3. Proposed fix
> Clear the active block after submitting the optimized append-with-flush,
> matching the lifecycle used by regular uploads, and add a regression test for
> `write() -> hflush() -> close()` that verifies the payload is appended
> exactly once.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)