This is an automated email from the ASF dual-hosted git repository.

liurenjie1024 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-rust.git


The following commit(s) were added to refs/heads/main by this push:
     new ff408ff3 refactor: Remove redundant parameters from SnapshotProducer 
validation methods (#1853)
ff408ff3 is described below

commit ff408ff3e0dfff24a6e6d39090dbd145afecf2f6
Author: Li0k <[email protected]>
AuthorDate: Thu Nov 13 18:36:57 2025 +0800

    refactor: Remove redundant parameters from SnapshotProducer validation 
methods (#1853)
    
    ## Which issue does this PR close?
    
    
    - Closes #.
    
    ## What changes are included in this PR?
    
    
    
    ## Summary
    Refactor `SnapshotProducer` validation methods to use internal state
    instead of requiring redundant parameters.
    
    ## Problem
    I've noticed that while the current **SnapshotProducer** API design
    already equips SnapshotProducer with all necessary state, the current
    invocations still redundantly pass parameters externally. I believe this
    could lead to some issues.
    
    1. **Data mismatch risk**: Callers could pass different data than what's
    stored in `SnapshotProducer`, leading to validating one set of files but
    committing another
    2. **API complexity**: As more validations are added (e.g., delete
    files, file existence checks), each method would require additional
    parameters, making the API harder to use
    3. **Redundant passing**: The same data that was already provided during
    construction has to be passed again
    
    ## Changes
    - Modified `validate_added_data_files()` and
    `validate_duplicate_files()` to operate on `self.added_data_files`
    directly
    - Updated `FastAppendAction::commit()` to call validation methods
    without passing `added_data_files` parameter
    
    ## Motivation
    Previously, `added_data_files` was passed as a parameter to validation
    methods even though it was already stored in `SnapshotProducer`:
    
    ```rust
    // Before
    snapshot_producer.validate_added_data_files(&self.added_data_files)?;
    
    // After
    snapshot_producer.validate_added_data_files()?;
    ```
    
    ## Benefits
    1. Better encapsulation - validation operates on object's own state
    2. Safer API - eliminates possibility of data mismatch
    3. Simpler interface - no redundant parameters needed
    
    ## Discussion
    
    Since **SnapshotProducer** already holds all necessary state, can we
    further refine validation by performing it during the **new** function's
    execution to improve data consistency and encapsulation?
    
    
    ## Are these changes tested?
---
 crates/iceberg/src/transaction/append.rs   |  6 ++----
 crates/iceberg/src/transaction/snapshot.rs | 12 +++++-------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/crates/iceberg/src/transaction/append.rs 
b/crates/iceberg/src/transaction/append.rs
index f248543d..08d40324 100644
--- a/crates/iceberg/src/transaction/append.rs
+++ b/crates/iceberg/src/transaction/append.rs
@@ -93,13 +93,11 @@ impl TransactionAction for FastAppendAction {
         );
 
         // validate added files
-        snapshot_producer.validate_added_data_files(&self.added_data_files)?;
+        snapshot_producer.validate_added_data_files()?;
 
         // Checks duplicate files
         if self.check_duplicate {
-            snapshot_producer
-                .validate_duplicate_files(&self.added_data_files)
-                .await?;
+            snapshot_producer.validate_duplicate_files().await?;
         }
 
         snapshot_producer
diff --git a/crates/iceberg/src/transaction/snapshot.rs 
b/crates/iceberg/src/transaction/snapshot.rs
index 4f85962f..6b3d0e4f 100644
--- a/crates/iceberg/src/transaction/snapshot.rs
+++ b/crates/iceberg/src/transaction/snapshot.rs
@@ -99,8 +99,8 @@ impl<'a> SnapshotProducer<'a> {
         }
     }
 
-    pub(crate) fn validate_added_data_files(&self, added_data_files: 
&[DataFile]) -> Result<()> {
-        for data_file in added_data_files {
+    pub(crate) fn validate_added_data_files(&self) -> Result<()> {
+        for data_file in &self.added_data_files {
             if data_file.content_type() != crate::spec::DataContentType::Data {
                 return Err(Error::new(
                     ErrorKind::DataInvalid,
@@ -123,11 +123,9 @@ impl<'a> SnapshotProducer<'a> {
         Ok(())
     }
 
-    pub(crate) async fn validate_duplicate_files(
-        &self,
-        added_data_files: &[DataFile],
-    ) -> Result<()> {
-        let new_files: HashSet<&str> = added_data_files
+    pub(crate) async fn validate_duplicate_files(&self) -> Result<()> {
+        let new_files: HashSet<&str> = self
+            .added_data_files
             .iter()
             .map(|df| df.file_path.as_str())
             .collect();

Reply via email to