nssalian opened a new pull request, #15964: URL: https://github.com/apache/iceberg/pull/15964
Fixes: https://github.com/apache/iceberg/issues/14123

## Summary

The `snapshot` procedure fails on Spark tables with Variant columns because the Hive catalog stores `LazySimpleSerDe` instead of `ParquetHiveSerDe` for these tables. The SerDe-based format detection doesn't recognize it and throws. After fixing that, the test provided in the issue hits a second failure: `HiveSchemaUtil.convertToTypeString` has no case for VARIANT.

## Changes

This adds a `resolveFileFormat` helper that falls back to `table.provider()` when the SerDe doesn't match a known format, and maps VARIANT to "unknown" in the Hive schema conversion, following the conversation here: https://github.com/apache/iceberg/pull/15228

The changes are made in Spark v4.0 and v4.1.

## Test plan

- Expanded the test in the issue to cover both partitioned and unpartitioned tables in both Spark 4.0 and 4.1
  - skipped for the hive and spark catalogs until Hive 4 lands
- Added a unit test for the `HiveSchemaUtil` conversion check
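
For readers skimming the thread, the fallback described under Changes can be pictured roughly as follows. This is a minimal standalone sketch, not the code in this PR: the class `FileFormatFallbackSketch`, the helper `matchKnownFormat`, the substring matching, the set of known formats, and the two-string signature of `resolveFileFormat` are all assumptions made for illustration. The real helper works with the Spark catalog table and may differ in every detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; names and signatures are not taken from the PR.
final class FileFormatFallbackSketch {
  // Assumed set of formats that the detection recognizes.
  private static final List<String> KNOWN_FORMATS = Arrays.asList("parquet", "avro", "orc");

  private FileFormatFallbackSketch() {}

  // Returns the first known format whose name occurs in the candidate string,
  // or null when the candidate is null or matches nothing.
  static String matchKnownFormat(String candidate) {
    if (candidate == null) {
      return null;
    }
    String lower = candidate.toLowerCase(Locale.ROOT);
    for (String format : KNOWN_FORMATS) {
      if (lower.contains(format)) {
        return format;
      }
    }
    return null;
  }

  // Tries the SerDe class name first; if it does not identify a known format
  // (as with LazySimpleSerDe), falls back to the table provider string.
  static String resolveFileFormat(String serde, String provider) {
    String fromSerde = matchKnownFormat(serde);
    if (fromSerde != null) {
      return fromSerde;
    }
    String fromProvider = matchKnownFormat(provider);
    if (fromProvider != null) {
      return fromProvider;
    }
    throw new UnsupportedOperationException(
        "Cannot determine file format from serde=" + serde + ", provider=" + provider);
  }

  public static void main(String[] args) {
    // A Variant table as described in the summary: unhelpful SerDe, Parquet provider.
    String format =
        resolveFileFormat("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parquet");
    System.out.println(format);
  }
}
```

In this sketch the SerDe still takes precedence whenever it identifies a format, so tables that were already detected correctly should be unaffected. The provider is consulted only on a miss, and the method still throws when neither source yields a known format.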
