danny0405 opened a new pull request, #18776:
URL: https://github.com/apache/hudi/pull/18776
```markdown
### Describe the issue this Pull Request addresses
Several Hudi file-writing paths can allocate output streams or file writers
before all initialization/write steps complete. If a later constructor step,
write operation, flush, or close-time metadata operation throws before the
normal close path is reached, the underlying writer or stream may remain open.
This PR tightens cleanup for Parquet, HFile, binary-copy, Spark/Flink row
writers, bootstrap index writers, and write handles so writer-owned resources
are closed when failures occur. There are no storage format, public API, or
config changes.
### Summary and Changelog
This change makes file-writer cleanup more exception-safe across production
write paths and adds focused coverage for the binary-copy and create-handle
failure cases.
#### Working tree: close file writers on failure paths
- Close raw output streams if Parquet writer construction fails in
`HoodieParquetStreamWriter`, `HoodieSparkParquetStreamWriter`, and
`HoodieRowDataParquetOutputStreamWriter`.
- Close the output stream in `HoodieAvroHFileWriter` only when writer
construction has not taken ownership of it.
- Harden `HFileBootstrapIndexWriter.begin()` and `close()` so partially
initialized HFile writers are closed and close failures are preserved.
- Ensure `ParquetUtils.serializeRecordsToLogBlock` closes the
`HoodieFileWriter` with try-with-resources.
- Ensure `SparkHelpers` closes `HoodieAvroParquetWriter` in a `finally`
block.
- Harden `HoodieParquetBinaryCopyBase` by closing the Parquet writer when
`start()` fails, clearing writer state after close, and closing column writers
in `maskColumn` / `addNullColumn` when flush/write fails.
- Add compatibility handling for `ParquetFileWriter.close()`: look the method
up reflectively and invoke it only when the runtime class exposes `close()`.
#### Working tree: close write-handle owned writers on failure paths
- `BaseCreateHandle` now closes and clears `fileWriter` when record writing
fails and write failures are not ignored.
- `HoodieAppendHandle` now closes the log writer when record write,
compaction write, or close-time append/flush fails.
- `HoodieWriteMergeHandle` now closes and clears `fileWriter` when
`writeIncomingRecords()` or close-time operations fail.
- `HoodieSortedMergeHandle` now routes pending insert writes through
`writeIncomingRecords()` so the base merge-handle close protection applies.
- `HoodieBinaryCopyHandle` now closes the binary copier if `binaryCopy()`
fails before the normal close path.
#### Working tree: tests
- Added `TestHoodieParquetBinaryCopyBaseSchemaEvolution` coverage for:
  - missing `ParquetFileWriter.close()` compatibility behavior,
  - invoking `close()` when present,
  - clearing the writer when `end()` fails.
- Added `TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails` to
verify `fileWriter` is cleared after write failure.
### Impact
No public API, config, or storage format changes. The impact is limited to
safer resource cleanup in failure paths for file-writing and write-handle code.
Successful write behavior should remain unchanged.
Affected paths include Hudi create/append/merge/binary-copy handles, Parquet
stream writers, HFile writers, bootstrap index writers, and Spark/Flink writer
integrations.
### Risk Level
medium
This touches core write-path cleanup logic across multiple modules,
including append, merge, create, binary-copy, Parquet, and HFile paths. The
behavioral intent is narrow: close resources on exception paths and preserve
original failures by adding close failures as suppressed where applicable.
Validation performed:
- `git diff --check` passes.
- `mvn -pl hudi-hadoop-common -DskipITs -Dcheckstyle.skip -Dspotbugs.skip
-Dtest=TestHoodieParquetBinaryCopyBaseSchemaEvolution test` passed with 11
tests.
- `mvn -pl hudi-hadoop-common,hudi-common -DskipTests -DskipITs
-Dcheckstyle.skip -Dspotbugs.skip compile` passed.
- `mvn -pl hudi-client/hudi-client-common -DskipITs -Dcheckstyle.skip
-Dspotbugs.skip
-Dtest=TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails test` could
not run: compilation failed on an unrelated, pre-existing error at
`StreamingOffsetValidator.java:170` (missing
`ValidationContext#getTotalWriteErrors()`).
### Documentation Update
none
This PR does not add or change user-facing configs, APIs, file formats, or
documented behavior. It only improves cleanup behavior on internal failure
paths.
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
```
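For readers skimming the description, the cleanup patterns it names (closing a raw stream when writer construction fails, keeping the original failure and attaching close failures as suppressed, and invoking `close()` reflectively only when it exists) can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `WriterCleanupSketch`, `SketchWriter`, `TrackingStream`, `open`, `closeSuppressed`, and `closeIfPresent` are hypothetical names, and returning `false` when `close()` is missing is an assumption about one reasonable fallback.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.reflect.Method;

// Hypothetical sketch; none of these types are Hudi or Parquet classes.
final class WriterCleanupSketch {

  /** Stand-in for a file writer that wraps, and then owns, an output stream. */
  static final class SketchWriter implements Closeable {
    private final OutputStream out;

    SketchWriter(OutputStream out, boolean failInConstructor) throws IOException {
      this.out = out;
      if (failInConstructor) {
        // Simulates a constructor step that throws after the stream exists.
        throw new IOException("writer construction failed");
      }
    }

    @Override
    public void close() throws IOException {
      out.close();
    }
  }

  /** Output stream that records whether close() was called. */
  static final class TrackingStream extends ByteArrayOutputStream {
    boolean closed;

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /** Opens a writer over the stream; closes the stream if construction throws. */
  static SketchWriter open(OutputStream out, boolean failInConstructor) throws IOException {
    try {
      return new SketchWriter(out, failInConstructor);
    } catch (IOException | RuntimeException e) {
      // The writer never took ownership, so the caller must close the stream.
      closeSuppressed(out, e);
      throw e;
    }
  }

  /** Closes the resource; a close failure is attached to the original as suppressed. */
  static void closeSuppressed(Closeable resource, Throwable original) {
    try {
      resource.close();
    } catch (Exception closeFailure) {
      if (closeFailure != original) {
        original.addSuppressed(closeFailure);
      }
    }
  }

  /** Invokes a public no-arg close() if the runtime class exposes one; returns whether it did. */
  static boolean closeIfPresent(Object target) {
    Method close;
    try {
      close = target.getClass().getMethod("close");
    } catch (NoSuchMethodException e) {
      return false; // Assumed fallback: nothing to call on this runtime class.
    }
    try {
      close.invoke(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective close() failed", e);
    }
    return true;
  }

  public static void main(String[] args) {
    TrackingStream stream = new TrackingStream();
    try {
      open(stream, true);
    } catch (IOException expected) {
      System.out.println("stream closed after failed construction: " + stream.closed);
    }
    System.out.println("close() found on plain Object: " + closeIfPresent(new Object()));
  }
}
```

The original exception is always the one rethrown; a failure during cleanup only shows up through `getSuppressed()`, which matches the stated intent of preserving original failures.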