andyguwc opened a new issue, #15610: URL: https://github.com/apache/iceberg/issues/15610
### Apache Iceberg version

1.10.0

### Query engine

Spark

### Please describe the bug 🐞

# SPJ Broken After Partition Evolution (Iceberg 1.10.0)

## Summary

Storage-Partitioned Join (SPJ) stops working for `MERGE INTO` operations after a table undergoes partition evolution, even when all data has been rewritten into the new partition layout and old snapshots/specs have been fully cleaned up. The workaround is to recreate the table from scratch with the desired partition spec.

## Environment

- Iceberg Spark Runtime: `iceberg-spark-runtime-3.5_2.12-1.10.0`
- Spark: 3.5 (AWS Glue 5.0)
- Catalog: AWS Glue Data Catalog

## SPJ Configuration

The following Spark settings are configured and **work correctly for tables that were created with their partition spec from the start** (no evolution history):

```
spark.sql.sources.v2.bucketing.enabled = true
spark.sql.sources.v2.bucketing.pushPartValues.enabled = true
spark.sql.sources.v2.bucketing.pushPartKeys.enabled = true
spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true
spark.sql.requireAllClusterKeysForCoPartition = false
spark.sql.iceberg.planning.preserve-data-grouping = true
```

## Steps Performed

1. Table originally created with `bucket(4, _ID)`
2. Evolved the partition spec via `ALTER TABLE DROP PARTITION FIELD` / `ALTER TABLE ADD PARTITION FIELD bucket(64, _ID)`
3. Ran `rewrite_data_files` with `rewrite-all: true`; all data files were rewritten under the new partition layout
4. Ran `expire_snapshots` with `retain_last: 1` and `older_than_minutes: 1`; old snapshots fully removed
5. Ran `remove_orphan_files`; old data files cleaned up from S3
6. Verified via `DESCRIBE TABLE EXTENDED`: only `Part 0: bucket(64, _ID)` remains (a single clean spec)
7. Created a staging table with matching `PARTITIONED BY (bucket(64, _ID))`

## Expected

`MERGE INTO` should use SPJ (no Exchange/shuffle between the staging and target table scans), since both tables share identical `bucket(64, _ID)` partitioning with a single spec.
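The steps above can be sketched in Spark SQL roughly as follows. The catalog (`glue_catalog`) and table names (`db.target`, `db.staging`) are hypothetical placeholders, and the maintenance calls use the standard Iceberg procedure signatures, which may differ from the exact parameter names (e.g. `older_than_minutes`) exposed by the tooling used in this report:

```sql
-- Hypothetical names; the original spec was bucket(4, _ID)
ALTER TABLE glue_catalog.db.target DROP PARTITION FIELD bucket(4, _ID);
ALTER TABLE glue_catalog.db.target ADD PARTITION FIELD bucket(64, _ID);

-- Rewrite every data file into the new partition layout
CALL glue_catalog.system.rewrite_data_files(
  table => 'db.target',
  options => map('rewrite-all', 'true')
);

-- Drop all but the latest snapshot, then clean up unreferenced files
CALL glue_catalog.system.expire_snapshots(
  table => 'db.target',
  retain_last => 1
);
CALL glue_catalog.system.remove_orphan_files(table => 'db.target');

-- Both sides now share bucket(64, _ID); this MERGE is expected
-- to plan without an Exchange between the two scans
MERGE INTO glue_catalog.db.target t
USING glue_catalog.db.staging s
ON t._ID = s._ID
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```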
## Actual

The physical plan shows `Exchange hashpartitioning(_ID, 32)` and `ShuffledHashJoin`, i.e. a full shuffle. The target table's `BatchScan` shows an empty `groupedBy=`, while the staging table correctly shows `groupedBy=_ID_bucket`. SPJ works for tables that were created with this partition spec originally (no evolution history).

## Root Cause

Partition evolution metadata, even after cleanup, leaves residual state that prevents Spark from recognizing the table's bucket grouping during MERGE planning. Related upstream PR: [apache/iceberg#13534](https://github.com/apache/iceberg/pull/13534) (closed without merge, Sep 2025).

## Workaround

Recreate the table from scratch via `CREATE TABLE ... AS SELECT` with the desired partition spec, then swap it in via `ALTER TABLE RENAME`. The new table has no evolution history and SPJ works immediately.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
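The recreate-and-swap workaround described in the report can be sketched as follows; all catalog and table names are hypothetical placeholders:

```sql
-- Build a fresh table with the desired spec and no evolution history
CREATE TABLE glue_catalog.db.target_new
USING iceberg
PARTITIONED BY (bucket(64, _ID))
AS SELECT * FROM glue_catalog.db.target;

-- Swap the new table into place
ALTER TABLE glue_catalog.db.target RENAME TO db.target_old;
ALTER TABLE glue_catalog.db.target_new RENAME TO db.target;
```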
