wirybeaver commented on PR #2203: URL: https://github.com/apache/iceberg-rust/pull/2203#issuecomment-4356747375
Thank you @mbutrovich — this is the most thorough review the PR has received and it caught real bugs. Here is a point-by-point response, followed by what was changed. --- ## Blocking Issues ### 1 & 2: Manifest rewriting + sequence numbers Both are now fixed together, as you predicted. `existing_manifest()` now rewrites any manifest that contains a deleted file, matching Java's `ManifestFilterManager.filterManifestWithDeletedFiles`: - Removed files → `DELETED` entry (snapshot_id updated to current commit, sequence numbers copied verbatim from the loaded entry) - Surviving files → `EXISTING` entry (all original fields, including sequence numbers, preserved) - Manifests with no deleted files → carried forward unchanged Since the sequence numbers are read directly off the loaded `ManifestEntry` (which has them set via inheritance during `load_manifest`), the TODO for sequence numbers is gone — fixing the manifest rewrite gave us correct sequence numbers for free, exactly as you described. To enable this, `SnapshotProduceOperation::existing_manifest` was changed to take `&mut SnapshotProducer<'_>` so implementations can call `snapshot_produce.new_manifest_writer()` to produce rewritten manifests. ### 3: Snapshot summary missing deleted-file metrics Fixed. Added `removed_data_files()` as a default method on `SnapshotProduceOperation` (returns `&[]` by default, overridden in `RowDeltaOperation`). `SnapshotProducer::summary()` now calls `summary_collector.remove_file()` for each removed file, populating `deleted-data-files`, `deleted-records`, and `removed-files-size`. --- ## Significant Issues ### 4: Tests don't exercise critical code paths Added `test_row_delta_cow_manifest_rewrite`: uses `FastAppendAction` to write file-A + file-B into a real snapshot S1 (backed by in-memory FileIO), then `RowDeltaAction` to remove file-A and add file-C. The test loads S2's manifest list and asserts: - file-A → `ManifestStatus::Deleted` with non-null sequence numbers - file-B → `ManifestStatus::Existing` with non-null sequence numbers - file-C → `ManifestStatus::Added` - `deleted-data-files = 1` in the snapshot summary ### 5: `test_row_delta_remove_only` removes a phantom file You are correct that the concurrency concern (files removed by a concurrent commit) is separate from partition validation. That validation (`validateDataFilesExist`) is out of scope for this PR and belongs alongside the broader `validate_from_snapshot` / ancestry-check work. I've updated the reply to jdockerty's original comment to clarify this distinction. --- ## Design Concerns ### 6: `add_delete_files` silently drops files Fixed. `commit()` now returns `ErrorKind::FeatureUnsupported` immediately if `added_delete_files` is non-empty. The API is preserved so MoR support can wire it up without a breaking change. ### 7: `write_delete_manifest` naming confusion Renamed to `write_manifest_with_deleted_entries` with a comment distinguishing it from Iceberg delete manifests (content=Deletes). --- ## Nits ### 8: Verbose doc comments Trimmed. `RowDeltaAction` now has a 2-line struct comment; method comments match the single-sentence style used in the rest of the transaction module. ### 9 & 10: Noted as follow-ups - The "fast append" string in the shared validation error message would require a change in `snapshot.rs` that affects all callers — deferred. - The O(manifests × entries) scan in `existing_manifest` has a TODO comment; optimizing with manifest-level statistics (partition bounds, file counts) is tracked separately. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
