Re: [PR] Spark (3.4,3.5,4.0) : Include snapshotId and branch in SparkTable equals and hashCode [iceberg]

via GitHub Thu, 02 Apr 2026 20:19:38 -0700


anuragmantri commented on code in PR #15840:
URL: https://github.com/apache/iceberg/pull/15840#discussion_r3031202717



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -466,15 +467,15 @@ public boolean equals(Object other) {
       return false;
     }
 
-    // use only name in order to correctly invalidate Spark cache
     SparkTable that = (SparkTable) other;
-    return icebergTable.name().equals(that.icebergTable.name());
+    return icebergTable.name().equals(that.icebergTable.name())
+        && Objects.equals(snapshotId, that.snapshotId)
+        && Objects.equals(branch, that.branch);
   }
 
   @Override
   public int hashCode() {
-    // use only name in order to correctly invalidate Spark cache
-    return icebergTable.name().hashCode();
+    return Objects.hash(icebergTable.name(), snapshotId, branch);

Review Comment:
   Same here, should we add table.uuid()? 
   
https://github.com/apache/iceberg/blob/a114e955e4f16d9b5636e675d9c59be92f5ab978/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L309



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -466,15 +467,15 @@ public boolean equals(Object other) {
       return false;
     }
 
-    // use only name in order to correctly invalidate Spark cache
     SparkTable that = (SparkTable) other;
-    return icebergTable.name().equals(that.icebergTable.name());
+    return icebergTable.name().equals(that.icebergTable.name())

Review Comment:
   Should we also add table.uuid() like in Spark 4.1 
?https://github.com/apache/iceberg/blob/a114e955e4f16d9b5636e675d9c59be92f5ab978/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L299



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########


Review Comment:
   Now that `branch` is part of `equals()` and not `final` here, are we allowed 
to mutate it in 
[L422](https://github.com/apache/iceberg/pull/15840/changes#diff-fdac223ccf200e4cdb931b23690929130bbb0ddde7ce9bd86e2995e6af3f38ebL442)
 ? 
   
   I see that in 4.1 we made `branch` final, but it looks like the WAP logic 
moved somewhere else.



##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkTable.java:
##########
@@ -56,4 +58,60 @@ public void testTableEquality() throws NoSuchTableException {
     assertThat(table1).as("References must be different").isNotSameAs(table2);
     assertThat(table1).as("Tables must be equivalent").isEqualTo(table2);
   }
+
+  @TestTemplate
+  public void testTableInequalityWithDifferentSnapshots() throws 
NoSuchTableException {
+    sql("INSERT INTO %s VALUES (1, 'a')", tableName);
+    sql("INSERT INTO %s VALUES (2, 'b')", tableName);
+
+    CatalogManager catalogManager = spark.sessionState().catalogManager();
+    TableCatalog catalog = (TableCatalog) catalogManager.catalog(catalogName);
+    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), 
tableIdent.name());
+    SparkTable table = (SparkTable) catalog.loadTable(identifier);
+
+    if (catalog instanceof SparkCatalog) {
+      Table icebergTable = ((SparkCatalog) 
catalog).icebergCatalog().loadTable(tableIdent);
+      long[] snapshotIds = icebergTable.history().stream().mapToLong(h -> 
h.snapshotId()).toArray();

Review Comment:
   Nit: `.mapToLong(h -> h.snapshotId())` --> 
`.mapToLong(HistoryEntry::snapshotId)`
   
   



##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkTable.java:
##########
@@ -56,4 +58,60 @@ public void testTableEquality() throws NoSuchTableException {
     assertThat(table1).as("References must be different").isNotSameAs(table2);
     assertThat(table1).as("Tables must be equivalent").isEqualTo(table2);
   }
+
+  @TestTemplate
+  public void testTableInequalityWithDifferentSnapshots() throws 
NoSuchTableException {
+    sql("INSERT INTO %s VALUES (1, 'a')", tableName);
+    sql("INSERT INTO %s VALUES (2, 'b')", tableName);
+
+    CatalogManager catalogManager = spark.sessionState().catalogManager();
+    TableCatalog catalog = (TableCatalog) catalogManager.catalog(catalogName);
+    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), 
tableIdent.name());
+    SparkTable table = (SparkTable) catalog.loadTable(identifier);
+
+    if (catalog instanceof SparkCatalog) {

Review Comment:
   You can use the validationCatalog directly. 
   ```
   Table icebergTable = validationCatalog.loadTable(tableIdent);                
                                                                       
         long[] snapshotIds =                                                   
                                                                             
             
icebergTable.history().stream().mapToLong(HistoryEntry::snapshotId).toArray();  
                                                                
                                                                                
                                                                             
         SparkTable tableAtSnapshot1 = 
table.copyWithSnapshotId(snapshotIds[0]);                                       
                                      
         SparkTable tableAtSnapshot2 = 
table.copyWithSnapshotId(snapshotIds[1]);       
   ```



##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkTable.java:
##########
@@ -56,4 +58,60 @@ public void testTableEquality() throws NoSuchTableException {
     assertThat(table1).as("References must be different").isNotSameAs(table2);
     assertThat(table1).as("Tables must be equivalent").isEqualTo(table2);
   }
+
+  @TestTemplate
+  public void testTableInequalityWithDifferentSnapshots() throws 
NoSuchTableException {
+    sql("INSERT INTO %s VALUES (1, 'a')", tableName);
+    sql("INSERT INTO %s VALUES (2, 'b')", tableName);
+
+    CatalogManager catalogManager = spark.sessionState().catalogManager();
+    TableCatalog catalog = (TableCatalog) catalogManager.catalog(catalogName);
+    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), 
tableIdent.name());
+    SparkTable table = (SparkTable) catalog.loadTable(identifier);
+
+    if (catalog instanceof SparkCatalog) {

Review Comment:
   Curious: Why do we need `if (catalog instanceof SparkCatalog)` here? Does 
the change here not apply to `SparkSessionCatalog`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spark (3.4,3.5,4.0) : Include snapshotId and branch in SparkTable equals and hashCode [iceberg]

Reply via email to