[ https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-431:
----------------------------
    Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-25  (was: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18)

> Support Parquet in MOR log files
> --------------------------------
>
>                 Key: HUDI-431
>                 URL: https://issues.apache.org/jira/browse/HUDI-431
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: storage-management
>            Reporter: sivabalan narayanan
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: help-requested, pull-request-available
>             Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> We have a basic implementation of an inline filesystem, which reads a file
> format like Parquet that is embedded "inline" in another file. See
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
> for sample usage.
> The idea here is to see if we can embed parquet/hfile formats into the Hudi
> log files, to get columnar reads on the delta log files as well. This helps
> us speed up query performance, given that the log is row-based today. Once
> the inline FS is available, enable parquet logging support with
> HoodieLogFile. The log file can expose a writer (essentially a
> ParquetWriter), and users can write records as though writing to parquet
> files. Similarly, on the read path, a reader (ParquetReader) will be exposed
> which the user can use to read data out of it.
> This Jira tracks the work to implement such parquet inlining in the log
> format and have the writer and reader use it.
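
The inlining idea described in the issue can be illustrated with a minimal sketch. It is not Hudi's log format or its InLineFileSystem API: the block marker, the 8-byte length header, and the function names append_inline_block and read_inline_block are all hypothetical, and the payload is an opaque byte string standing in for a serialized Parquet file. The point is only that a writer appends a complete inner file as one block of an outer log and remembers (offset, length), and a reader later opens just that byte range as if it were a standalone file.

```python
import os
import struct
import tempfile

# Hypothetical block marker; NOT the marker used by Hudi's real log format.
MAGIC = b"#LOG#"


def append_inline_block(log_path, payload):
    """Append payload as one block of the outer log.

    Returns (offset, length) of the embedded bytes, which is all a reader
    needs in order to treat that range as a standalone inner file.
    """
    with open(log_path, "ab") as log:
        log.seek(0, os.SEEK_END)
        log.write(MAGIC)
        log.write(struct.pack(">Q", len(payload)))  # 8-byte big-endian length
        offset = log.tell()  # the embedded file starts right after the header
        log.write(payload)
    return offset, len(payload)


def read_inline_block(log_path, offset, length):
    """Read back exactly the embedded byte range, ignoring the rest of the log."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read(length)
    if len(data) != length:
        raise IOError("truncated inline block")
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.log")
    # Stand-in bytes; a real writer would emit a full Parquet file here.
    off, size = append_inline_block(path, b"PAR1 stand-in payload PAR1")
    print(off, size, read_inline_block(path, off, size))
```

In a real implementation the byte range would be handed to a Parquet reader rather than returned as raw bytes, which is what would allow columnar reads over the delta log.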