GitHub user vamshipasunuru edited a discussion: Presto/Trino support for files produced by metadata bootstrap
[RFC-12](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC12:EfficientMigrationofLargeParquetTablestoApacheHudi-NewBootstrapProcess:) introduced support for metadata bootstrap. This can be a powerful way to adopt Hudi for large non-Hudi tables (e.g., Hive tables) without rewriting the existing files. However, support for reading this data varies across engines: Spark fully supports it, but query engines such as Presto and Trino cannot query the data.

We need to add support in the Hudi connector to:

1. Understand the bootstrap index generated under `.hoodie` during the bootstrap.

   <img width="860" height="1520" alt="image" src="https://github.com/user-attachments/assets/3aed3b57-ebda-4410-b254-b6203a672cda" />

2. Merge the Hudi metadata columns (from the generated skeleton Parquet files) with the non-Hudi data columns (from the original Parquet files) at query read time.

   <img width="1068" height="1420" alt="image" src="https://github.com/user-attachments/assets/89634671-bb0d-4409-bdb0-500b78dd3262" />

Within Uber, this feature will mainly be used with Hudi 0.14 and 1.2 and Presto 0.287.

GitHub link: https://github.com/apache/hudi/discussions/18137
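The two connector changes listed above (resolving the bootstrap index, then merging skeleton and original columns) can be illustrated with a minimal Python sketch. It is not the Presto/Trino connector API and does not model Hudi's on-disk index format: `BootstrapIndex`, `source_file_for`, `needs_skeleton` and `merge_bootstrap_rows` are hypothetical names, the index is an in-memory dict keyed by (partition path, Hudi file ID), and rows are plain dicts. The merge assumes that each skeleton file holds the same records, in the same order, as the original Parquet file it was generated from, so the two can be stitched by position.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, Tuple

# The five Hudi metadata columns. In a metadata-bootstrapped table these
# live in the skeleton files, not in the original Parquet files.
HOODIE_META_COLUMNS = frozenset({
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
})

Row = Dict[str, object]
_MISSING = object()  # fill value marking an exhausted input


@dataclass(frozen=True)
class BootstrapIndex:
    """Hypothetical in-memory stand-in for the bootstrap index under .hoodie.

    Maps (partition path, Hudi file ID) to the original source Parquet file.
    Hudi's real on-disk index format is NOT modeled here.
    """
    entries: Dict[Tuple[str, str], str]

    def source_file_for(self, partition: str, file_id: str) -> str:
        key = (partition, file_id)
        if key not in self.entries:
            raise KeyError(f"no bootstrap source file for {partition}/{file_id}")
        return self.entries[key]


def needs_skeleton(requested_columns: Iterable[str]) -> bool:
    """True if the query projects at least one Hudi metadata column."""
    return any(col in HOODIE_META_COLUMNS for col in requested_columns)


def merge_bootstrap_rows(
    skeleton_rows: Iterable[Row], source_rows: Iterable[Row]
) -> Iterator[Row]:
    """Stitch skeleton rows with source rows by position.

    Assumes both files hold the same records in the same order; a length
    mismatch raises instead of being silently truncated.
    """
    for skeleton, source in zip_longest(skeleton_rows, source_rows, fillvalue=_MISSING):
        if skeleton is _MISSING or source is _MISSING:
            raise ValueError("skeleton and source files have different row counts")
        # Metadata columns first, then the original data columns.
        yield {**skeleton, **source}


if __name__ == "__main__":
    # Hypothetical partition, file ID and source path, for illustration only.
    index = BootstrapIndex({("2024/01/01", "fg-1"): "/warehouse/tbl/2024/01/01/part-0.parquet"})
    print(index.source_file_for("2024/01/01", "fg-1"))

    skeleton = [{"_hoodie_record_key": "k1"}, {"_hoodie_record_key": "k2"}]
    source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    if needs_skeleton(["_hoodie_record_key", "id"]):
        for row in merge_bootstrap_rows(skeleton, source):
            print(row)
```

Raising on a row-count mismatch, rather than letting `zip` stop quietly at the shorter input, surfaces a skeleton file that does not line up with its source. When `needs_skeleton` returns false, a reader could plausibly skip the skeleton file and scan only the original Parquet file; whether the connector should do so is a design choice, not something stated in this discussion.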
