Re: [PR] feat(transaction): add RowDelta transaction action for row-level modi… [iceberg-rust]

via GitHub Fri, 17 Apr 2026 12:28:09 -0700


mbutrovich commented on PR #2203:
URL: https://github.com/apache/iceberg-rust/pull/2203#issuecomment-4270715471


   Thanks for the contribution @wirybeaver! The overall structure looks solid.
   
   I had Claude make sense of my notes during reviewing, and most of my 
feedback is around how manifests need to be handled when deleting files. The 
good news is that the manifest carry-forward and sequence number issues are 
closely related, so fixing manifest rewriting solves both at once:
   
   ## Blocking Issues
   
   ### 1. `existing_manifest()` drops entire manifests containing deleted files 
(data loss)
   
   `row_delta.rs:213-247`: When any file in a manifest is being deleted, the 
entire manifest is excluded. Non-deleted files in that manifest silently vanish 
from the table.
   
   **Spec (Snapshots section):**
   > "Manifest files are reused across snapshots to avoid rewriting metadata 
that is slow-changing."
   
   The spec expects manifests to be carried forward or rewritten, not dropped. 
Per the Scan Planning section:
   > "all file paths marked with 'ADDED' or 'EXISTING' may appear at most once 
across all manifest files in the snapshot"
   
   so you can't just re-add the surviving files to a different manifest without 
removing their entries from the original. The manifest must be rewritten.
   
   **Java (`ManifestFilterManager.filterManifestWithDeletedFiles`, lines 
498-575):** When a manifest contains files to delete, Java rewrites the 
manifest. Entries being deleted get `writer.delete(entry)` (DELETED status), 
surviving entries get `writer.existing(entry)` (EXISTING status). The rewritten 
manifest replaces the original in the manifest list. This is the core of 
`filterManager.filterManifests()` called in `MergingSnapshotProducer.apply()` 
at lines 965-968.
   
   I think the PR needs to implement this rewrite logic. The current 
`SnapshotProduceOperation` trait's `existing_manifest()` returns 
`Vec<ManifestFile>` which only supports carry-forward or drop. One approach 
would be to extend the trait; another would be to do the rewrite within 
`existing_manifest()` itself (write new manifest files, return those). Take a 
look at how `filterManifestWithDeletedFiles` works in Java, it's the clearest 
reference for this.
   
   ---
   
   ### 2. Sequence numbers on DELETED entries are zero (spec violation)
   
   `row_delta.rs:183-200`: Uses `sequence_number(0)` and 
`file_sequence_number(0)` for DELETED entries.
   
   **Spec (Sequence Number Inheritance section):**
   > "When writing an existing file to a new manifest or marking an existing 
file as deleted, the data and file sequence numbers must be non-null and set to 
the original values that were either inherited or provided at the commit time."
   
   And from the Manifest Entry Fields:
   > "The `sequence_number` field represents the data sequence number and must 
never change after a file is added to the dataset."
   
   Zero is not the original value. To fix this, you'd need to load the original 
manifest entry for each removed file and copy its sequence numbers. The TODO at 
line 185 shows you're aware of this. I'd suggest not deferring it since it 
produces metadata that other Iceberg implementations may reject.
   
   **Java (`ManifestWriter.delete`, lines 176-194):** Java preserves original 
sequence numbers explicitly:
   ```java
   public void delete(F deletedFile, long dataSequenceNumber, Long 
fileSequenceNumber) {
       addEntry(reused.wrapDelete(snapshotId, dataSequenceNumber, 
fileSequenceNumber, deletedFile));
   }
   ```
   The `dataSequenceNumber` and `fileSequenceNumber` are copied from the 
original manifest entry, never set to zero.
   
   Note: this is coupled to the manifest carry-forward issue above. If you 
implement manifest rewriting (as Java does), you'll already have the original 
manifest entries in hand and can copy the sequence numbers directly. So fixing 
the manifest rewrite gives you this almost for free.
   
   ---
   
   ### 3. Snapshot summary omits deleted-file metrics
   
   `snapshot.rs:356-404`: `SnapshotProducer::summary()` only iterates 
`self.added_data_files`. Removed files are never counted.
   
   **Spec (Appendix F: Optional Snapshot Summary Fields):** The spec defines 
`deleted-data-files`, `deleted-records`, `removed-files-size`, and 
`total-data-files` as standard summary metrics. While technically optional, 
these are expected by table maintenance operations (compaction, snapshot 
expiration) and other Iceberg implementations.
   
   **Java (`SnapshotSummary.removedFile`, lines 319-345):** Java tracks 
`removedFiles`, `deletedRecords`, and `removedSize` whenever a file is deleted. 
These are merged into the summary via `filterManager.buildSummary(filtered)` in 
`MergingSnapshotProducer.apply()` at lines 1002-1007.
   
   This produces snapshots where `deleted-data-files` is absent (or zero) even 
when files are being removed. Downstream consumers (including iceberg-java 
based readers) may rely on these for planning, so it's worth wiring up.
   
   ---
   
   ## Significant Issues
   
   ### 4. Tests don't exercise the critical code paths
   
   All tests use `make_v2_minimal_table()` which has no existing snapshots. 
This means `existing_manifest()` always returns `[]`, so the manifest 
carry-forward bug is never triggered. The broken sequence number path runs but 
produces entries with `snapshot_id = None` (no current snapshot), so the `0` 
values are never validated against real data.
   
   Compare with `test_fast_append` in `append.rs:263-335`, which loads the 
resulting manifest list, verifies entry counts, checks sequence numbers, and 
compares data files. I'd suggest the RowDelta tests should at minimum:
   - Set up a table with existing data files (via `FastAppendAction`)
   - Remove one file via `RowDeltaAction`
   - Verify surviving files remain accessible in the new snapshot's manifests
   - Verify DELETED entries have correct sequence numbers
   
   ### 5. `test_row_delta_remove_only` removes a phantom file
   
   The test deletes a file that was never part of the table. This doesn't 
really test the CoW flow; it just proves you can create a snapshot with DELETED 
entries for nonexistent files. @jdockerty's question about `removed_data_files` 
validation (line 165) is relevant here. Your response ("already validated when 
originally committed") is correct for partition checking, but misses the 
concurrency concern.
   
   **Java (`BaseRowDelta.validate`, lines 132-174):** Java's validation 
includes `validateDataFilesExist()` which checks that files being removed 
haven't already been removed by a concurrent commit. This is a concurrency 
safety check, not a partition validation. The starting snapshot ancestry check 
(`SnapshotUtil.isAncestorOf`) is also more nuanced than the PR's strict 
equality check at `row_delta.rs:131-142`.
   
   ---
   
   ## Design Concerns
   
   ### 6. `add_delete_files` stub silently drops files
   
   `row_delta.rs:98-103`: The public method accepts delete files and stores 
them, but `commit()` never writes them to any manifest. If called, the delete 
files vanish silently.
   
   **Suggestion:** Either remove the method entirely (add it when MoR is 
implemented), or have `commit()` return an error if `added_delete_files` is 
non-empty. Failing loudly is better than failing silently. This was partially 
discussed by @jdockerty (line 195) and you acknowledged it's deferred, but the 
current code is a trap for callers.
   
   ### 7. `write_delete_manifest` naming is confusing
   
   `snapshot.rs:322-338`: In Iceberg, a "delete manifest" is a manifest with 
`content=Deletes` containing delete files (MoR). This method writes a data 
manifest (`ManifestContentType::Data`) with entries having 
`ManifestStatus::Deleted`. These are different concepts per the spec:
   
   > **Manifest Lists section:** A manifest list includes `content` field: `0: 
data`, `1: deletes`
   
   The method name conflates "manifest containing DELETED-status data file 
entries" with "manifest containing delete files." Something like 
`write_manifest_with_deleted_entries` would be clearer.
   
   ---
   
   ## Nits
   
   ### 8. Comments are verbose relative to codebase style
   
   The struct-level doc comment on `RowDeltaAction` is ~20 lines explaining 
CoW/MoR strategy with step-by-step numbered lists. Compare with 
`FastAppendAction` in `append.rs:33` which has a single line: `FastAppendAction 
is a transaction action for fast append data files to the table.` The codebase 
convention is brief doc comments; spec-level explanations belong in the spec, 
not inline. A one-liner with a link to the relevant spec section would be more 
consistent.
   
   Similarly, method comments like `add_data_files` have multi-line 
explanations (`Used for: New rows from INSERT operations, Rewritten data files 
in COW mode...`) where the existing pattern (e.g., `append.rs:59`: `Add data 
files to the snapshot.`) is a single sentence.
   
   ### 9. Error message references "fast append"
   
   `row_delta.rs:165` calls `snapshot_producer.validate_added_data_files()` 
which uses the error message `"Only data content type is allowed for fast 
append"` (`snapshot.rs:148`). This is misleading when triggered from a RowDelta 
commit. The message is in shared code, so fixing it is outside this PR's scope, 
but worth noting.
   
   ### 10. `existing_manifest()` loads every manifest, O(manifests × entries)
   
   `row_delta.rs:226-244`: Every manifest is loaded and scanned to check if it 
contains a deleted file. Java uses manifest-level metadata (partition bounds, 
file counts in the manifest summary) to skip manifests that can't possibly 
contain the deleted files. Acceptable for a first pass, but worth a TODO for 
large tables where this becomes expensive.
   
   ---
   
   ## Already Addressed (from prior reviews)
   
   These were raised by @jdockerty/@DAlperin and resolved by the author:
   - COW acronym spelled out
   - Error assertion uses `ErrorKind::DataInvalid` instead of string matching
   - `Operation::Delete` explicitly documented as deferred for MoR
   - UUID v7 choice explained (time-ordered for debuggability)
   - Error message typo fixed
   
   ---
   
   ## Summary
   
   The core structural issue is that the PR doesn't implement manifest 
rewriting, which is fundamental to how Iceberg handles file deletion. The 
manifest carry-forward and sequence number issues are coupled: Java solves both 
in `ManifestFilterManager.filterManifestWithDeletedFiles()` which rewrites the 
manifest with correct statuses and preserved sequence numbers. This is the main 
piece of work needed before this can merge.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(transaction): add RowDelta transaction action for row-level modi… [iceberg-rust]

Reply via email to