mbutrovich commented on PR #2203:
URL: https://github.com/apache/iceberg-rust/pull/2203#issuecomment-4270715471
Thanks for the contribution @wirybeaver! The overall structure looks solid.
I had Claude make sense of my notes during reviewing, and most of my
feedback is around how manifests need to be handled when deleting files. The
good news is that the manifest carry-forward and sequence number issues are
closely related, so fixing manifest rewriting solves both at once:
## Blocking Issues
### 1. `existing_manifest()` drops entire manifests containing deleted files
(data loss)
`row_delta.rs:213-247`: When any file in a manifest is being deleted, the
entire manifest is excluded. Non-deleted files in that manifest silently vanish
from the table.
**Spec (Snapshots section):**
> "Manifest files are reused across snapshots to avoid rewriting metadata
that is slow-changing."
The spec expects manifests to be carried forward or rewritten, not dropped.
Per the Scan Planning section:
> "all file paths marked with 'ADDED' or 'EXISTING' may appear at most once
across all manifest files in the snapshot"
so you can't just re-add the surviving files to a different manifest without
removing their entries from the original. The manifest must be rewritten.
**Java (`ManifestFilterManager.filterManifestWithDeletedFiles`, lines
498-575):** When a manifest contains files to delete, Java rewrites the
manifest. Entries being deleted get `writer.delete(entry)` (DELETED status),
surviving entries get `writer.existing(entry)` (EXISTING status). The rewritten
manifest replaces the original in the manifest list. This is the core of
`filterManager.filterManifests()` called in `MergingSnapshotProducer.apply()`
at lines 965-968.
I think the PR needs to implement this rewrite logic. The current
`SnapshotProduceOperation` trait's `existing_manifest()` returns
`Vec<ManifestFile>` which only supports carry-forward or drop. One approach
would be to extend the trait; another would be to do the rewrite within
`existing_manifest()` itself (write new manifest files, return those). Take a
look at how `filterManifestWithDeletedFiles` works in Java, it's the clearest
reference for this.
---
### 2. Sequence numbers on DELETED entries are zero (spec violation)
`row_delta.rs:183-200`: Uses `sequence_number(0)` and
`file_sequence_number(0)` for DELETED entries.
**Spec (Sequence Number Inheritance section):**
> "When writing an existing file to a new manifest or marking an existing
file as deleted, the data and file sequence numbers must be non-null and set to
the original values that were either inherited or provided at the commit time."
And from the Manifest Entry Fields:
> "The `sequence_number` field represents the data sequence number and must
never change after a file is added to the dataset."
Zero is not the original value. To fix this, you'd need to load the original
manifest entry for each removed file and copy its sequence numbers. The TODO at
line 185 shows you're aware of this. I'd suggest not deferring it since it
produces metadata that other Iceberg implementations may reject.
**Java (`ManifestWriter.delete`, lines 176-194):** Java preserves original
sequence numbers explicitly:
```java
public void delete(F deletedFile, long dataSequenceNumber, Long
fileSequenceNumber) {
addEntry(reused.wrapDelete(snapshotId, dataSequenceNumber,
fileSequenceNumber, deletedFile));
}
```
The `dataSequenceNumber` and `fileSequenceNumber` are copied from the
original manifest entry, never set to zero.
Note: this is coupled to the manifest carry-forward issue above. If you
implement manifest rewriting (as Java does), you'll already have the original
manifest entries in hand and can copy the sequence numbers directly. So fixing
the manifest rewrite gives you this almost for free.
---
### 3. Snapshot summary omits deleted-file metrics
`snapshot.rs:356-404`: `SnapshotProducer::summary()` only iterates
`self.added_data_files`. Removed files are never counted.
**Spec (Appendix F: Optional Snapshot Summary Fields):** The spec defines
`deleted-data-files`, `deleted-records`, `removed-files-size`, and
`total-data-files` as standard summary metrics. While technically optional,
these are expected by table maintenance operations (compaction, snapshot
expiration) and other Iceberg implementations.
**Java (`SnapshotSummary.removedFile`, lines 319-345):** Java tracks
`removedFiles`, `deletedRecords`, and `removedSize` whenever a file is deleted.
These are merged into the summary via `filterManager.buildSummary(filtered)` in
`MergingSnapshotProducer.apply()` at lines 1002-1007.
This produces snapshots where `deleted-data-files` is absent (or zero) even
when files are being removed. Downstream consumers (including iceberg-java
based readers) may rely on these for planning, so it's worth wiring up.
---
## Significant Issues
### 4. Tests don't exercise the critical code paths
All tests use `make_v2_minimal_table()` which has no existing snapshots.
This means `existing_manifest()` always returns `[]`, so the manifest
carry-forward bug is never triggered. The broken sequence number path runs but
produces entries with `snapshot_id = None` (no current snapshot), so the `0`
values are never validated against real data.
Compare with `test_fast_append` in `append.rs:263-335`, which loads the
resulting manifest list, verifies entry counts, checks sequence numbers, and
compares data files. I'd suggest the RowDelta tests should at minimum:
- Set up a table with existing data files (via `FastAppendAction`)
- Remove one file via `RowDeltaAction`
- Verify surviving files remain accessible in the new snapshot's manifests
- Verify DELETED entries have correct sequence numbers
### 5. `test_row_delta_remove_only` removes a phantom file
The test deletes a file that was never part of the table. This doesn't
really test the CoW flow; it just proves you can create a snapshot with DELETED
entries for nonexistent files. @jdockerty's question about `removed_data_files`
validation (line 165) is relevant here. Your response ("already validated when
originally committed") is correct for partition checking, but misses the
concurrency concern.
**Java (`BaseRowDelta.validate`, lines 132-174):** Java's validation
includes `validateDataFilesExist()` which checks that files being removed
haven't already been removed by a concurrent commit. This is a concurrency
safety check, not a partition validation. The starting snapshot ancestry check
(`SnapshotUtil.isAncestorOf`) is also more nuanced than the PR's strict
equality check at `row_delta.rs:131-142`.
---
## Design Concerns
### 6. `add_delete_files` stub silently drops files
`row_delta.rs:98-103`: The public method accepts delete files and stores
them, but `commit()` never writes them to any manifest. If called, the delete
files vanish silently.
**Suggestion:** Either remove the method entirely (add it when MoR is
implemented), or have `commit()` return an error if `added_delete_files` is
non-empty. Failing loudly is better than failing silently. This was partially
discussed by @jdockerty (line 195) and you acknowledged it's deferred, but the
current code is a trap for callers.
### 7. `write_delete_manifest` naming is confusing
`snapshot.rs:322-338`: In Iceberg, a "delete manifest" is a manifest with
`content=Deletes` containing delete files (MoR). This method writes a data
manifest (`ManifestContentType::Data`) with entries having
`ManifestStatus::Deleted`. These are different concepts per the spec:
> **Manifest Lists section:** A manifest list includes `content` field: `0:
data`, `1: deletes`
The method name conflates "manifest containing DELETED-status data file
entries" with "manifest containing delete files." Something like
`write_manifest_with_deleted_entries` would be clearer.
---
## Nits
### 8. Comments are verbose relative to codebase style
The struct-level doc comment on `RowDeltaAction` is ~20 lines explaining
CoW/MoR strategy with step-by-step numbered lists. Compare with
`FastAppendAction` in `append.rs:33` which has a single line: `FastAppendAction
is a transaction action for fast append data files to the table.` The codebase
convention is brief doc comments; spec-level explanations belong in the spec,
not inline. A one-liner with a link to the relevant spec section would be more
consistent.
Similarly, method comments like `add_data_files` have multi-line
explanations (`Used for: New rows from INSERT operations, Rewritten data files
in COW mode...`) where the existing pattern (e.g., `append.rs:59`: `Add data
files to the snapshot.`) is a single sentence.
### 9. Error message references "fast append"
`row_delta.rs:165` calls `snapshot_producer.validate_added_data_files()`
which uses the error message `"Only data content type is allowed for fast
append"` (`snapshot.rs:148`). This is misleading when triggered from a RowDelta
commit. The message is in shared code, so fixing it is outside this PR's scope,
but worth noting.
### 10. `existing_manifest()` loads every manifest, O(manifests × entries)
`row_delta.rs:226-244`: Every manifest is loaded and scanned to check if it
contains a deleted file. Java uses manifest-level metadata (partition bounds,
file counts in the manifest summary) to skip manifests that can't possibly
contain the deleted files. Acceptable for a first pass, but worth a TODO for
large tables where this becomes expensive.
---
## Already Addressed (from prior reviews)
These were raised by @jdockerty/@DAlperin and resolved by the author:
- COW acronym spelled out
- Error assertion uses `ErrorKind::DataInvalid` instead of string matching
- `Operation::Delete` explicitly documented as deferred for MoR
- UUID v7 choice explained (time-ordered for debuggability)
- Error message typo fixed
---
## Summary
The core structural issue is that the PR doesn't implement manifest
rewriting, which is fundamental to how Iceberg handles file deletion. The
manifest carry-forward and sequence number issues are coupled: Java solves both
in `ManifestFilterManager.filterManifestWithDeletedFiles()` which rewrites the
manifest with correct statuses and preserved sequence numbers. This is the main
piece of work needed before this can merge.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]