zheliu2 opened a new pull request, #15567:
URL: https://github.com/apache/iceberg/pull/15567

   ## Summary
   
   When migrating a Hive table to Iceberg using `CALL spark_catalog.system.migrate()`, 
partition values of `__HIVE_DEFAULT_PARTITION__` (Hive's sentinel for NULL) were not 
converted back to null in the Iceberg partition metadata. As a result, 
`WHERE column IS NULL` queries on the migrated table returned 0 rows, even though 
the total row count remained correct.
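
To illustrate the symptom, here is a minimal sketch. It is not Iceberg code, and it assumes the `IS NULL` predicate is evaluated against per-file partition metadata while a plain count reads every file. The `FileEntry` class and both helper methods are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class NullPartitionPruningSketch {

  // Hypothetical stand-in for one data file and its recorded partition value.
  static final class FileEntry {
    final String partitionValue; // null means "partition column is NULL"
    final long rowCount;

    FileEntry(String partitionValue, long rowCount) {
      this.partitionValue = partitionValue;
      this.rowCount = rowCount;
    }
  }

  // Assumed behavior: only files whose partition value is null can match IS NULL.
  static long rowsMatchingIsNull(List<FileEntry> files) {
    return files.stream()
        .filter(f -> f.partitionValue == null)
        .mapToLong(f -> f.rowCount)
        .sum();
  }

  // An unfiltered count reads every file, whatever its partition value is.
  static long totalRows(List<FileEntry> files) {
    return files.stream().mapToLong(f -> f.rowCount).sum();
  }

  public static void main(String[] args) {
    // Before the fix: the sentinel string is stored as an ordinary, non-null value.
    List<FileEntry> before = Arrays.asList(
        new FileEntry("__HIVE_DEFAULT_PARTITION__", 3), new FileEntry("a", 5));
    // After the fix: the sentinel has been converted to null.
    List<FileEntry> after = Arrays.asList(
        new FileEntry(null, 3), new FileEntry("a", 5));

    System.out.println("before: isNull=" + rowsMatchingIsNull(before)
        + " total=" + totalRows(before));
    System.out.println("after:  isNull=" + rowsMatchingIsNull(after)
        + " total=" + totalRows(after));
  }
}
```

Under that assumption, the unfiltered count is the same in both cases, while the `IS NULL` count is nonzero only once the sentinel has been replaced by null.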
   
   The fix adds explicit `__HIVE_DEFAULT_PARTITION__` → null conversion in 
`TableMigrationUtil.listPartition()` when extracting partition values from the 
Hive partition spec map.
   
   While `Conversions.fromPartitionString()` already handles this sentinel 
value, the conversion was not consistently applied in the migration path where 
partition values are collected as raw strings from the Hive metastore partition 
spec. Adding the conversion at the point of extraction ensures null partition 
semantics are preserved regardless of how downstream code processes the values.
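
The conversion at the point of extraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual change to `TableMigrationUtil.listPartition()`: the class name, the `partitionValues` helper, and the column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiveNullPartitionSketch {

  // Hive's sentinel directory name for a NULL partition value.
  static final String HIVE_NULL = "__HIVE_DEFAULT_PARTITION__";

  // Hypothetical helper: collect values in partition-column order,
  // mapping the Hive sentinel to a real null.
  static List<String> partitionValues(
      List<String> partitionColumns, Map<String, String> partitionSpec) {
    List<String> values = new ArrayList<>(partitionColumns.size());
    for (String column : partitionColumns) {
      String raw = partitionSpec.get(column);
      values.add(HIVE_NULL.equals(raw) ? null : raw);
    }
    return values;
  }

  public static void main(String[] args) {
    // Column names and values here are made up for the example.
    Map<String, String> spec = new LinkedHashMap<>();
    spec.put("region", HIVE_NULL);
    spec.put("year", "2024");

    List<String> values = partitionValues(Arrays.asList("region", "year"), spec);
    System.out.println(values); // prints [null, 2024]
  }
}
```

Doing the mapping while the values are collected means later code receives a null directly and does not need to recognize the sentinel itself.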
   
   Fixes #15332
   
   ## Changes
   
   - **`data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java`**: Added `__HIVE_DEFAULT_PARTITION__` → null conversion when extracting partition values from the Hive partition spec map in `listPartition()`.
   - **`data/src/test/java/org/apache/iceberg/data/TestTableMigrationUtil.java`**: Added unit test verifying that `__HIVE_DEFAULT_PARTITION__` partition values are converted to null.
   - **`spark/v{3.4,3.5,4.0,4.1}/spark-extensions/.../TestMigrateTableProcedure.java`**: Added integration test across all Spark versions that creates a partitioned Hive table with a null partition value, migrates it to Iceberg, and verifies `IS NULL` / `IS NOT NULL` queries return correct results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

