andyguwc opened a new issue, #15610:
URL: https://github.com/apache/iceberg/issues/15610

   ### Apache Iceberg version
   
   1.10.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   # SPJ Broken After Partition Evolution (Iceberg 1.10.0)
   
   ## Summary
   
   Storage-Partitioned Join (SPJ) stops working for MERGE INTO operations after 
a table undergoes partition evolution, even when all data has been rewritten 
into the new partition layout and old snapshots/specs have been fully cleaned 
up. The workaround is to recreate the table from scratch with the desired 
partition spec.
   
   ## Environment
   
   - Iceberg Spark Runtime: `iceberg-spark-runtime-3.5_2.12-1.10.0`
   - Spark: 3.5 (AWS Glue 5.0)
   - Catalog: AWS Glue Data Catalog
   
   ## SPJ Configuration
   
   The following Spark settings are configured and **work correctly for tables 
that were created with their partition spec from the start** (no evolution 
history):
   
   ```
   spark.sql.sources.v2.bucketing.enabled = true
   spark.sql.sources.v2.bucketing.pushPartValues.enabled = true
   spark.sql.sources.v2.bucketing.pushPartKeys.enabled = true
   spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true
   spark.sql.requireAllClusterKeysForCoPartition = false
   spark.sql.iceberg.planning.preserve-data-grouping = true
   ```
   
   ## Steps Performed
   
   1. Table originally created with `bucket(4, _ID)`
   2. Evolved partition via `ALTER TABLE DROP PARTITION FIELD` / `ALTER TABLE 
ADD PARTITION FIELD bucket(64, _ID)`
   3. Ran `rewrite_data_files` with `rewrite-all: true` — all data files 
rewritten under new partition layout
   4. Ran `expire_snapshots` with `retain_last: 1` and `older_than_minutes: 1` 
— old snapshots fully removed
   5. Ran `remove_orphan_files` — old data files cleaned up from S3
   6. Verified via `DESCRIBE TABLE EXTENDED`: only `Part 0: bucket(64, _ID)` 
remains (single clean spec)
   7. Staging table created with matching `PARTITIONED BY (bucket(64, _ID))`
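   The steps above can be condensed into a Spark SQL sketch. Catalog, database, and the hypothetical `payload` column are placeholders; table and procedure names follow this report:
   
   ```sql
   -- 1) Original table, bucketed by _ID into 4 buckets
   --    (payload is a hypothetical non-key column)
   CREATE TABLE glue_catalog.db.target (_ID BIGINT, payload STRING)
   USING iceberg
   PARTITIONED BY (bucket(4, _ID));
   
   -- 2) Evolve the partition spec
   ALTER TABLE glue_catalog.db.target DROP PARTITION FIELD bucket(4, _ID);
   ALTER TABLE glue_catalog.db.target ADD PARTITION FIELD bucket(64, _ID);
   
   -- 3) Rewrite every data file under the new spec
   CALL glue_catalog.system.rewrite_data_files(
     table => 'db.target', options => map('rewrite-all', 'true'));
   
   -- 4) Expire old snapshots, then 5) clean up orphaned files
   CALL glue_catalog.system.expire_snapshots(
     table => 'db.target', retain_last => 1);
   CALL glue_catalog.system.remove_orphan_files(table => 'db.target');
   
   -- 6) Confirm only the new spec is visible
   DESCRIBE TABLE EXTENDED glue_catalog.db.target;
   
   -- 7) Staging table with the matching layout
   CREATE TABLE glue_catalog.db.staging (_ID BIGINT, payload STRING)
   USING iceberg
   PARTITIONED BY (bucket(64, _ID));
   ```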
   
   ## Expected
   
   MERGE INTO should use SPJ (no Exchange/shuffle between staging and target 
table scans), since both tables share identical `bucket(64, _ID)` partitioning 
with a single spec.
   
   ## Actual
   
   Physical plan shows `Exchange hashpartitioning(_ID, 32)` and 
`ShuffledHashJoin` — full shuffle. Target table `BatchScan` shows `groupedBy=` 
(empty), while staging table correctly shows `groupedBy=_ID_bucket`. SPJ works 
for tables that were created with this partition spec originally (no evolution 
history).
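   A minimal way to reproduce the observation is to inspect the plan of the MERGE directly (a sketch using the placeholder names above; with SPJ working there should be no `Exchange` between the two `BatchScan`s and both should report a non-empty `groupedBy`):
   
   ```sql
   EXPLAIN FORMATTED
   MERGE INTO glue_catalog.db.target t
   USING glue_catalog.db.staging s
   ON t._ID = s._ID
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *;
   ```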
   
   ## Root Cause
   
   Partition evolution leaves residual state in the table metadata that prevents Spark from recognizing the table's bucket grouping during MERGE planning, even after all data and snapshots have been cleaned up. `expire_snapshots` and `remove_orphan_files` prune snapshots and files, not the list of historical partition specs, so those specs appear to persist in the table metadata and are presumably what blocks data-grouping. Related upstream PR: 
[apache/iceberg#13534](https://github.com/apache/iceberg/pull/13534) (closed 
without merge, Sep 2025).
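   One way to check that the leftover state lives in table metadata rather than in the data files is to query the `files` metadata table (placeholder names again). After `rewrite-all`, every live data file should carry the new spec id; if this returns a single `spec_id` and SPJ still fails, the residue is in the metadata, not the files:
   
   ```sql
   SELECT DISTINCT spec_id FROM glue_catalog.db.target.files;
   ```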
   
   ## Workaround
   
   Recreate the table fresh via `CREATE TABLE ... AS SELECT` with the desired 
partition spec, then swap via `ALTER TABLE RENAME`. The new table has no 
evolution history and SPJ works immediately.
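   A sketch of the workaround, assuming the catalog supports `ALTER TABLE ... RENAME` within the same database (placeholder names as above):
   
   ```sql
   -- Recreate with the desired spec; a CTAS table starts with a single, fresh spec.
   CREATE TABLE glue_catalog.db.target_v2
   USING iceberg
   PARTITIONED BY (bucket(64, _ID))
   AS SELECT * FROM glue_catalog.db.target;
   
   -- Swap names; the old table can be dropped once the new one is validated.
   ALTER TABLE glue_catalog.db.target RENAME TO glue_catalog.db.target_old;
   ALTER TABLE glue_catalog.db.target_v2 RENAME TO glue_catalog.db.target;
   ```
   
   Note the swap loses the original table's history, which is acceptable here since the old snapshots were already expired.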
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
