[I] How to provide Iceberg partitions metadata to upper-layer data platform services? [iceberg]

via GitHub Wed, 08 Jan 2025 03:56:44 -0800


KnightChess opened a new issue, #11926:
URL: https://github.com/apache/iceberg/issues/11926


   ### Query engine
   
   - Spark
   - Hive Metastore
   
   ### Question
   
   Dear Iceberg Team,
   When `hive` is used as the catalog type, Iceberg does not sync partition 
metadata to the Hive Metastore; not even the partition schema (the partition 
columns) is synchronized. However, our upper-layer data platform services rely 
on the partition metadata and partition schema stored in the Hive Metastore. 
What are the best practices for handling this scenario?
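To make the gap concrete, consider how each system identifies a partition. A Hive Metastore partition is identified by the table's ordered partition keys plus a name of the form `key1=value1/key2=value2`. An Iceberg partition is a tuple of transformed values defined by the table's partition spec; for example, `days(ts)` is typically stored under the field name `ts_day`. The Python sketch below shows the kind of mapping a sync job would have to perform. It is a simplified, hypothetical illustration: `PartitionField`, `hive_partition_keys`, and `hive_partition_name` are made-up names, not Iceberg or Hive APIs. Value escaping is also reduced to percent-encoding, which does not match Hive's exact escaping rules.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hive's default partition name for null partition values.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for one field of an Iceberg partition spec."""
    source_column: str  # column in the table schema, e.g. "ts"
    transform: str      # e.g. "day", "bucket[16]", "identity"
    name: str           # partition field name, e.g. "ts_day"


def hive_partition_keys(spec):
    """Partition key names a Hive Metastore table would need to declare."""
    return [field.name for field in spec]


def hive_partition_name(spec, values):
    """Build a Hive-style name such as 'ts_day=2025-01-08/id_bucket=3'.

    `values` holds already-transformed partition values, in spec order.
    Escaping is simplified to percent-encoding (NOT Hive's exact scheme).
    """
    if len(spec) != len(values):
        raise ValueError("partition tuple does not match the spec")
    parts = []
    for field, value in zip(spec, values):
        text = HIVE_DEFAULT_PARTITION if value is None else quote(str(value), safe="")
        parts.append(f"{field.name}={text}")
    return "/".join(parts)


spec = [
    PartitionField("ts", "day", "ts_day"),
    PartitionField("id", "bucket[16]", "id_bucket"),
]
print(hive_partition_keys(spec))                     # ['ts_day', 'id_bucket']
print(hive_partition_name(spec, ["2025-01-08", 3]))  # ts_day=2025-01-08/id_bucket=3
```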
   There are a few possible approaches: 
   - The first is to add support for syncing newly written partitions to the 
Hive Metastore after each write completes. This might also require changes to 
the Hive Metastore, because Iceberg supports partition evolution while a Hive 
table declares a single set of partition keys for all of its partitions. 
   - The second is to build a service similar to the Spark Thrift Server that 
serves partition metadata by querying Iceberg's built-in metadata tables, such 
as `xxx.snapshots` or `xxx.partitions`. 
   - The third is to develop a dedicated metadata management service that 
exposes Iceberg metadata (partitions, snapshots, schemas, and so on) to other 
services. 
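The first approach above could look roughly like the sketch below. After each commit, the sync job would:

- collect the partition tuples of the data files added by the new snapshot;
- register only the partitions the Hive Metastore does not already know;
- refuse to sync when a file was written under a partition spec whose fields differ from the declared Hive partition keys. This is the partition-evolution case that would require Hive Metastore changes.

Everything here is hypothetical and modeled in plain Python. `AddedFile`, `partitions_to_register`, and `PartitionEvolutionError` are illustrative names. A real implementation would read the added files through the Iceberg API and call a metastore client instead of returning a list.

```python
from dataclasses import dataclass


class PartitionEvolutionError(Exception):
    """Raised when a file's partition fields differ from the Hive partition keys."""


@dataclass(frozen=True)
class AddedFile:
    """Hypothetical summary of one data file added by the latest commit."""
    spec_id: int
    partition: tuple  # (field_name, value) pairs in spec order


def partitions_to_register(hive_keys, registered, added_files):
    """Return Hive-style partition names that are new after a commit.

    hive_keys:   partition keys declared on the Hive table, in order
    registered:  set of partition names already present in the metastore
    added_files: AddedFile records from the new snapshot
    """
    new_names = []
    seen = set(registered)
    for f in added_files:
        field_names = [name for name, _ in f.partition]
        if field_names != list(hive_keys):
            # Partition evolution: this spec cannot be expressed with the
            # table's fixed set of Hive partition keys.
            raise PartitionEvolutionError(
                f"spec {f.spec_id} has fields {field_names}, "
                f"Hive table declares {list(hive_keys)}"
            )
        name = "/".join(f"{k}={v}" for k, v in f.partition)  # escaping omitted
        if name not in seen:
            seen.add(name)
            new_names.append(name)
    return new_names


keys = ["ts_day"]
added = [
    AddedFile(0, (("ts_day", "2025-01-07"),)),
    AddedFile(0, (("ts_day", "2025-01-08"),)),
    AddedFile(0, (("ts_day", "2025-01-08"),)),
]
print(partitions_to_register(keys, {"ts_day=2025-01-07"}, added))  # ['ts_day=2025-01-08']
```

This sketch only adds partitions. A complete sync would also need to drop Hive partitions whose last data file was deleted or rewritten away, which is omitted here.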
   Are there any better approaches? Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.