voonhous opened a new pull request, #18532:
URL: https://github.com/apache/hudi/pull/18532
### Describe the issue this Pull Request addresses
`HoodieTableMetadataUtil.convertMetadataToFilesPartitionRecords` aggregated
per-partition write stats via `writeStats.stream().reduce(new HashMap<>(...),
accumulator, CollectionUtils::combine)`.
The "identity" is a mutable `HashMap` that the accumulator mutates and
returns. This misuses `Stream.reduce`, whose contract requires the identity
to be a true identity for the combiner (combining it with any value yields
that value unchanged), which stops holding once the map has been mutated.
The only reason this works is that the stream is sequential and the method
runs on the driver (the caller, `HoodieMetadataWriteUtils`, then wraps the
returned list via `context.parallelize(..., 1)`).
A plain for-loop expresses the same aggregation directly and is the
idiomatic shape for sequential code that accumulates into a mutable map.
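
To illustrate the contract problem described above, here is a minimal,
self-contained sketch on plain JDK types. It is not the Hudi code:
`ReduceIdentitySketch`, `viaReduce`, `viaLoop`, and the local `combine`
helper (a stand-in for `CollectionUtils::combine`) are hypothetical names
used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReduceIdentitySketch {

  // Hypothetical stand-in for a map combiner: merges two maps into a new one.
  static Map<String, Long> combine(Map<String, Long> a, Map<String, Long> b) {
    Map<String, Long> out = new HashMap<>(a);
    out.putAll(b);
    return out;
  }

  // The anti-pattern: the accumulator mutates the "identity" map and returns it.
  static Map<String, Long> viaReduce(List<String> files, Map<String, Long> identity) {
    return files.stream().reduce(
        identity,
        (map, file) -> {
          map.merge(file, 1L, Long::sum); // side effect on the shared identity
          return map;
        },
        ReduceIdentitySketch::combine);
  }

  // The same aggregation as a plain loop over a locally owned map.
  static Map<String, Long> viaLoop(List<String> files) {
    Map<String, Long> counts = new HashMap<>();
    for (String file : files) {
      counts.merge(file, 1L, Long::sum);
    }
    return counts;
  }
}
```

On a sequential stream both methods produce equal maps, but after
`viaReduce` the map passed as the identity is no longer empty, so
`combine(identity, other)` no longer returns a map equal to `other`.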
### Summary and Changelog
Internal readability/idiom cleanup in the metadata-table write path. No
behavior change.
- Replaced the `writeStats.stream().reduce(...)` call with an imperative
`for (HoodieWriteStat stat : writeStats)` loop that builds the
`updatedFilesToSizesMapping` HashMap directly.
- Same merge semantics (`Math::max` on per-file size; CDC path/size entries
overlaid).
- No change to the surrounding
`partitionToWriteStats.entrySet().stream().map(...)` pipeline.
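
The merge semantics listed above can be sketched roughly as follows. This
is an approximation under stated assumptions, not the code from the PR:
`WriteStat` is a hypothetical, simplified stand-in for `HoodieWriteStat`,
file names are assumed to be already resolved, and stats without a file
name are assumed to be skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FileSizeAggregationSketch {

  // Hypothetical, simplified stand-in for HoodieWriteStat.
  static final class WriteStat {
    final String fileName;            // assumed already stripped of the partition path
    final long fileSizeInBytes;
    final Map<String, Long> cdcStats; // CDC file name -> size; may be null

    WriteStat(String fileName, long fileSizeInBytes, Map<String, Long> cdcStats) {
      this.fileName = fileName;
      this.fileSizeInBytes = fileSizeInBytes;
      this.cdcStats = cdcStats;
    }
  }

  static Map<String, Long> aggregate(List<WriteStat> writeStats) {
    Map<String, Long> updatedFilesToSizesMapping = new HashMap<>(writeStats.size());
    for (WriteStat stat : writeStats) {
      if (stat.fileName == null) {
        continue; // assumption: a stat without a file name has nothing to record
      }
      // Keep the largest size reported for a file across its write stats.
      updatedFilesToSizesMapping.merge(stat.fileName, stat.fileSizeInBytes, Math::max);
      // Overlay CDC file entries, replacing any existing entry with the same key.
      if (stat.cdcStats != null) {
        updatedFilesToSizesMapping.putAll(stat.cdcStats);
      }
    }
    return updatedFilesToSizesMapping;
  }
}
```

Because the loop owns the map it mutates, there is no identity or combiner
contract to satisfy, which is the readability point this change makes.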
### Impact
None. Readability and idiom cleanup only; behavior and allocation shape are
materially unchanged.
### Risk Level
low
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable