[PR] Spark, Hive: Fix snapshot procedure for tables with Variant columns [iceberg]

via GitHub Mon, 13 Apr 2026 09:44:19 -0700


nssalian opened a new pull request, #15964:
URL: https://github.com/apache/iceberg/pull/15964


   Fixes: https://github.com/apache/iceberg/issues/14123
   
   ## Summary
   
   The `snapshot` procedure fails on Spark tables with Variant columns because 
the Hive catalog stores `LazySimpleSerDe` instead of `ParquetHiveSerDe` for 
these tables. The SerDe-based format detection doesn't recognize it and throws.
   After fixing that, running the test provided in the issue hits a second 
failure in `HiveSchemaUtil.convertToTypeString`, which has no case for VARIANT.
   
   ## Changes
   This adds a `resolveFileFormat` helper that falls back to `table.provider()` 
when the SerDe doesn't match a known format, and maps VARIANT to "unknown" in 
the Hive schema conversion, following the conversation here: 
https://github.com/apache/iceberg/pull/15228
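
   The fallback described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: the class name, the `formatFromSerde` helper, the set of known formats, and the plain `String`/`Optional<String>` parameters are all assumptions made for the example. The real helper works against Spark's catalog table metadata.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical, standalone sketch of a SerDe-first lookup with a provider fallback.
class FileFormatFallbackSketch {
  // Assumed set of formats for this sketch only.
  private static final Set<String> KNOWN_FORMATS = Set.of("parquet", "orc", "avro");

  // Stand-in for SerDe-based detection: matches on the SerDe class name.
  static Optional<String> formatFromSerde(String serde) {
    if (serde == null) {
      return Optional.empty();
    }
    String lower = serde.toLowerCase(Locale.ROOT);
    if (lower.contains("parquet")) {
      return Optional.of("parquet");
    } else if (lower.contains("orc")) {
      return Optional.of("orc");
    } else if (lower.contains("avro")) {
      return Optional.of("avro");
    }
    // e.g. LazySimpleSerDe ends up here
    return Optional.empty();
  }

  // Try the SerDe first; if it is not recognized, fall back to the table provider.
  static String resolveFileFormat(String serde, Optional<String> provider) {
    return formatFromSerde(serde)
        .or(() -> provider
            .map(p -> p.toLowerCase(Locale.ROOT))
            .filter(KNOWN_FORMATS::contains))
        .orElseThrow(() -> new UnsupportedOperationException(
            "Cannot determine file format: serde=" + serde + ", provider=" + provider));
  }
}
```

   With this shape, a table whose SerDe is `LazySimpleSerDe` but whose provider is `parquet` resolves to `parquet`, while a table with neither a recognized SerDe nor a recognized provider still fails loudly.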
   
   The changes are applied to both Spark v4.0 and v4.1.
   
   ## Test plan
   - Expanded the test from the issue to cover both partitioned and 
unpartitioned tables in Spark 4.0 and 4.1. The test is skipped for the hive 
and spark catalogs until Hive 4 lands.
   - Added a unit test for the `HiveSchemaUtil` conversion check
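
   As a rough picture of what the conversion check exercises, the sketch below maps a type ID to a Hive type string and returns "unknown" for VARIANT instead of throwing. It is a hypothetical, reduced stand-in: the `TypeId` enum and the class name are invented for this example, and only the VARIANT-to-"unknown" mapping is taken from the description above.

```java
// Hypothetical, reduced sketch of a type-to-Hive-type-string conversion.
class HiveTypeStringSketch {
  // Assumed subset of type IDs for illustration; the real enum has many more.
  enum TypeId { BOOLEAN, INTEGER, LONG, STRING, VARIANT }

  static String convertToTypeString(TypeId id) {
    switch (id) {
      case BOOLEAN:
        return "boolean";
      case INTEGER:
        return "int";
      case LONG:
        return "bigint";
      case STRING:
        return "string";
      case VARIANT:
        // Without this case, the switch would fall through to the default and throw.
        return "unknown";
      default:
        throw new UnsupportedOperationException("Unsupported type: " + id);
    }
  }
}
```

   A unit test along these lines would simply assert that `convertToTypeString(TypeId.VARIANT)` returns `"unknown"` rather than throwing.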

