This is an automated email from the ASF dual-hosted git repository.
liurenjie1024 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-rust.git
The following commit(s) were added to refs/heads/main by this push:
new ff408ff3 refactor: Remove redundant parameters from SnapshotProducer
validation methods (#1853)
ff408ff3 is described below
commit ff408ff3e0dfff24a6e6d39090dbd145afecf2f6
Author: Li0k <[email protected]>
AuthorDate: Thu Nov 13 18:36:57 2025 +0800
refactor: Remove redundant parameters from SnapshotProducer validation
methods (#1853)
## Which issue does this PR close?
- Closes #.
## What changes are included in this PR?
## Summary
Refactor `SnapshotProducer` validation methods to use internal state
instead of requiring redundant parameters.
## Problem
I've noticed that while the current **SnapshotProducer** API design
already equips SnapshotProducer with all necessary state, the current
invocations still redundantly pass parameters externally. I believe this
could lead to some issues.
1. **Data mismatch risk**: Callers could pass different data than what's
stored in `SnapshotProducer`, leading to validating one set of files but
committing another
2. **API complexity**: As more validations are added (e.g., delete
files, file existence checks), each method would require additional
parameters, making the API harder to use
3. **Redundant passing**: The same data that was already provided during
construction has to be passed again
## Changes
- Modified `validate_added_data_files()` and
`validate_duplicate_files()` to operate on `self.added_data_files`
directly
- Updated `FastAppendAction::commit()` to call validation methods
without passing `added_data_files` parameter
## Motivation
Previously, `added_data_files` was passed as a parameter to validation
methods even though it was already stored in `SnapshotProducer`:
```rust
// Before
snapshot_producer.validate_added_data_files(&self.added_data_files)?;
// After
snapshot_producer.validate_added_data_files()?;
```
## Benefits
1. Better encapsulation - validation operates on object's own state
2. Safer API - eliminates possibility of data mismatch
3. Simpler interface - no redundant parameters needed
## Discussion
Since **SnapshotProducer** already holds all necessary state, can we
further refine validation by performing it during the **new** function's
execution to improve data consistency and encapsulation?
## Are these changes tested?
---
crates/iceberg/src/transaction/append.rs | 6 ++----
crates/iceberg/src/transaction/snapshot.rs | 12 +++++-------
2 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/crates/iceberg/src/transaction/append.rs
b/crates/iceberg/src/transaction/append.rs
index f248543d..08d40324 100644
--- a/crates/iceberg/src/transaction/append.rs
+++ b/crates/iceberg/src/transaction/append.rs
@@ -93,13 +93,11 @@ impl TransactionAction for FastAppendAction {
);
// validate added files
- snapshot_producer.validate_added_data_files(&self.added_data_files)?;
+ snapshot_producer.validate_added_data_files()?;
// Checks duplicate files
if self.check_duplicate {
- snapshot_producer
- .validate_duplicate_files(&self.added_data_files)
- .await?;
+ snapshot_producer.validate_duplicate_files().await?;
}
snapshot_producer
diff --git a/crates/iceberg/src/transaction/snapshot.rs
b/crates/iceberg/src/transaction/snapshot.rs
index 4f85962f..6b3d0e4f 100644
--- a/crates/iceberg/src/transaction/snapshot.rs
+++ b/crates/iceberg/src/transaction/snapshot.rs
@@ -99,8 +99,8 @@ impl<'a> SnapshotProducer<'a> {
}
}
- pub(crate) fn validate_added_data_files(&self, added_data_files:
&[DataFile]) -> Result<()> {
- for data_file in added_data_files {
+ pub(crate) fn validate_added_data_files(&self) -> Result<()> {
+ for data_file in &self.added_data_files {
if data_file.content_type() != crate::spec::DataContentType::Data {
return Err(Error::new(
ErrorKind::DataInvalid,
@@ -123,11 +123,9 @@ impl<'a> SnapshotProducer<'a> {
Ok(())
}
- pub(crate) async fn validate_duplicate_files(
- &self,
- added_data_files: &[DataFile],
- ) -> Result<()> {
- let new_files: HashSet<&str> = added_data_files
+ pub(crate) async fn validate_duplicate_files(&self) -> Result<()> {
+ let new_files: HashSet<&str> = self
+ .added_data_files
.iter()
.map(|df| df.file_path.as_str())
.collect();