GitHub user vamshipasunuru created a discussion: Presto/Trino support for files 
produced by metadata bootstrap

[RFC-12](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC12:EfficientMigrationofLargeParquetTablestoApacheHudi-NewBootstrapProcess:)
introduced support for metadata bootstrap, which can be a powerful way to 
adopt Hudi for large non-Hudi (e.g., Hive) tables without rewriting 
the existing files.

However, support for reading this data varies by engine: Spark is fully 
supported, but query engines like Presto and Trino can't query the data.

We need to add support in the Hudi connector to:

1. Understand the bootstrap index written under `.hoodie` during the bootstrap.
<img width="860" height="1520" alt="image" src="https://github.com/user-attachments/assets/3aed3b57-ebda-4410-b254-b6203a672cda" />

2. Merge the Hudi columns (from the generated skeleton parquet files) with the 
non-Hudi columns (from the original parquet files) at query read time.

<img width="1068" height="1420" alt="image" src="https://github.com/user-attachments/assets/89634671-bb0d-4409-bdb0-500b78dd3262" />
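
To make the two steps concrete, here is a minimal sketch in plain Python, not the connector's actual implementation. It assumes that the skeleton file and the original file hold the same number of rows in the same order, so they can be stitched by position, and that the bootstrap index can be reduced to a mapping from a file group ID to a (skeleton path, source path) pair. The names `stitch_rows`, `read_bootstrap_file_group`, `bootstrap_index`, and `read_rows` are hypothetical and exist only for illustration; a real Presto/Trino reader would work on column blocks rather than per-row dicts.

```python
from itertools import zip_longest

# Hudi metadata columns that the skeleton parquet file carries.
HOODIE_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]

_MISSING = object()  # sentinel used to detect a row-count mismatch


def stitch_rows(skeleton_rows, source_rows):
    """Merge skeleton rows and source rows by position (assumed alignment).

    Each output row has the Hudi metadata columns first, followed by the
    columns of the original (non-Hudi) row.
    """
    merged = []
    pairs = zip_longest(skeleton_rows, source_rows, fillvalue=_MISSING)
    for skeleton, source in pairs:
        if skeleton is _MISSING or source is _MISSING:
            raise ValueError("skeleton and source files have different row counts")
        row = {name: skeleton[name] for name in HOODIE_META_COLUMNS}
        row.update(source)
        merged.append(row)
    return merged


def read_bootstrap_file_group(bootstrap_index, file_id, read_rows):
    """Step 1: resolve both files via the index. Step 2: stitch their rows.

    bootstrap_index: hypothetical dict of file_id -> (skeleton_path, source_path)
    read_rows: hypothetical callable that returns a list of row dicts for a path
    """
    skeleton_path, source_path = bootstrap_index[file_id]
    return stitch_rows(read_rows(skeleton_path), read_rows(source_path))
```

Failing loudly on a row-count mismatch is a deliberate choice in this sketch: positional stitching silently produces wrong rows if the two files ever drift out of alignment.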


This feature will also be useful for older Hudi versions (0.14).

GitHub link: https://github.com/apache/hudi/discussions/18137
