aviralgarg05 opened a new pull request, #15927:
URL: https://github.com/apache/iceberg/pull/15927
Fixes #15924
## Summary
This change fixes `RewriteTablePathUtil.rewriteDVFile` so deletion vector (DV)
Puffin files are rewritten in a streaming fashion instead of buffering every
rewritten blob in memory first.
The previous implementation collected all rewritten `Blob` instances into a
list and wrote them only after the read loop finished, so peak memory usage
grew with the combined size of every blob in the file. For large deletion
vector files this was unnecessary. The new implementation rewrites each blob
and writes it directly to the destination `PuffinWriter` as it is read.
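The difference between the two loop shapes can be modeled in plain Java. This is a simplified sketch, not the Iceberg code: `BlobRecord`, `PeakTracker`, `rewriteBuffered`, `rewriteStreaming`, and `measurePeak` are hypothetical names, an `Iterator` stands in for `PuffinReader`, and a `Consumer` stands in for `PuffinWriter`. The tracker only counts payload bytes the loop explicitly holds, so it approximates heap usage rather than measuring it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

public class StreamingRewriteSketch {
  // Hypothetical stand-in for a Puffin blob: one metadata field plus payload bytes.
  record BlobRecord(String referencedDataFile, byte[] payload) {}

  // Counts payload bytes the rewrite loop holds at once (a model, not real heap usage).
  static final class PeakTracker {
    long current;
    long peak;

    void acquire(long bytes) {
      current += bytes;
      peak = Math.max(peak, current);
    }

    void release(long bytes) {
      current -= bytes;
    }
  }

  // Old shape: rewrite every blob into a list, then write the whole list.
  static void rewriteBuffered(
      Iterator<BlobRecord> reader,
      UnaryOperator<BlobRecord> rewrite,
      Consumer<BlobRecord> writer,
      PeakTracker tracker) {
    List<BlobRecord> rewritten = new ArrayList<>();
    while (reader.hasNext()) {
      BlobRecord blob = rewrite.apply(reader.next());
      tracker.acquire(blob.payload().length);
      rewritten.add(blob);
    }
    for (BlobRecord blob : rewritten) {
      writer.accept(blob);
      tracker.release(blob.payload().length);
    }
  }

  // New shape: rewrite each blob and hand it to the writer immediately.
  static void rewriteStreaming(
      Iterator<BlobRecord> reader,
      UnaryOperator<BlobRecord> rewrite,
      Consumer<BlobRecord> writer,
      PeakTracker tracker) {
    while (reader.hasNext()) {
      BlobRecord blob = rewrite.apply(reader.next());
      tracker.acquire(blob.payload().length);
      writer.accept(blob);
      tracker.release(blob.payload().length);
    }
  }

  static List<BlobRecord> sampleInput() {
    return List.of(
        new BlobRecord("data/a.parquet", new byte[100]),
        new BlobRecord("data/b.parquet", new byte[200]),
        new BlobRecord("data/c.parquet", new byte[300]));
  }

  // Runs one variant and returns its peak tracked bytes; rewritten blobs go to sink.
  static long measurePeak(List<BlobRecord> input, boolean streaming, List<BlobRecord> sink) {
    // Placeholder metadata rewrite: the payload is passed through unchanged.
    UnaryOperator<BlobRecord> rewrite =
        b -> new BlobRecord("new/" + b.referencedDataFile(), b.payload());
    PeakTracker tracker = new PeakTracker();
    if (streaming) {
      rewriteStreaming(input.iterator(), rewrite, sink::add, tracker);
    } else {
      rewriteBuffered(input.iterator(), rewrite, sink::add, tracker);
    }
    return tracker.peak;
  }

  public static void main(String[] args) {
    List<BlobRecord> input = sampleInput();
    System.out.println("buffered peak:  " + measurePeak(input, false, new ArrayList<>()));
    System.out.println("streaming peak: " + measurePeak(input, true, new ArrayList<>()));
  }
}
```

With three blobs of 100, 200, and 300 bytes, the buffered loop peaks at 600 tracked bytes (the sum of all payloads) while the streaming loop peaks at 300 (the largest single payload), and both emit the same sequence of rewritten blobs.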
## What changed
- Reworked `rewriteDVFile` to open the `PuffinWriter` alongside the
`PuffinReader`.
- Removed the intermediate `List<Blob>` accumulation.
- Preserved the existing `referenced-data-file` path rewrite behavior for DV
blobs.
- Added a regression test that:
  - creates a real Puffin DV file with multiple blobs,
  - rewrites it through `RewriteTablePathUtil`,
  - verifies the rewritten blob metadata,
  - verifies the blob payloads are preserved.
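The preserved `referenced-data-file` rewrite amounts to swapping a source location prefix for a target prefix in one blob property while leaving the other properties untouched. The sketch below illustrates that idea under stated assumptions: `replacePrefix` and `rewriteProperties` are hypothetical helpers rather than Iceberg APIs, the `cardinality` entry is only an example of an unrelated property, and throwing on a non-matching prefix is an assumed policy, not confirmed behavior of `RewriteTablePathUtil`.

```java
import java.util.HashMap;
import java.util.Map;

public class ReferencedDataFileRewriteSketch {
  // Property key named in the PR description.
  static final String REFERENCED_DATA_FILE = "referenced-data-file";

  // Hypothetical helper: replace sourcePrefix with targetPrefix at the start of path.
  // Assumption: a path outside the source prefix is treated as an error.
  static String replacePrefix(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException(
          "Path " + path + " does not start with " + sourcePrefix);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  // Returns a copy of the blob properties with only referenced-data-file rewritten.
  static Map<String, String> rewriteProperties(
      Map<String, String> properties, String sourcePrefix, String targetPrefix) {
    Map<String, String> rewritten = new HashMap<>(properties);
    String referenced = properties.get(REFERENCED_DATA_FILE);
    if (referenced != null) {
      rewritten.put(REFERENCED_DATA_FILE, replacePrefix(referenced, sourcePrefix, targetPrefix));
    }
    return rewritten;
  }

  public static void main(String[] args) {
    Map<String, String> props =
        Map.of(
            REFERENCED_DATA_FILE, "s3://old-bucket/db/tbl/data/a.parquet",
            "cardinality", "42");
    Map<String, String> out =
        rewriteProperties(props, "s3://old-bucket/db/tbl", "s3://new-bucket/db/tbl");
    System.out.println(out.get(REFERENCED_DATA_FILE)); // s3://new-bucket/db/tbl/data/a.parquet
    System.out.println(out.get("cardinality")); // 42
  }
}
```

Copying the map before editing it keeps the input properties unchanged, which mirrors the regression test's goal of checking that only the path metadata differs after the rewrite.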
## Why this fixes the issue
The DV rewrite path only needs to update blob metadata; it should not
materialize the entire file in memory. Writing each blob as soon as it is read
bounds memory usage by the largest single blob rather than by the full
contents of the DV file.
## Verification
Ran the following checks successfully:
- `./gradlew :iceberg-core:test --tests org.apache.iceberg.TestRewriteTablePathUtil`
- `./gradlew :iceberg-core:spotlessCheck :iceberg-core:test --tests org.apache.iceberg.TestRewriteTablePathUtil`
- `git diff --check`
The targeted core test suite was executed three times during validation.