I have created an umbrella JIRA to track this story:
https://issues.apache.org/jira/browse/HUDI-2687

Please also join the #trino-hudi-connector channel in the Hudi Slack for further discussion.

Regards,
Sagar

On Thu, Oct 21, 2021 at 5:38 PM sagar sumit <[email protected]> wrote:

> This patch supports snapshot queries on MOR tables:
> https://github.com/trinodb/trino/pull/9641
> That works with the existing Hive connector.
>
> Right now, I have only prototyped snapshot queries on COW tables with the
> new Hudi connector in https://github.com/codope/trino/tree/hudi-plugin
> I will be working on supporting MOR tables as well.
>
> Regards,
> Sagar
>
> On Wed, Oct 20, 2021 at 4:48 PM Jian Feng <[email protected]> wrote:
>
>> When can Trino support snapshot queries on Merge-On-Read tables?
>>
>> On Mon, Oct 18, 2021 at 9:06 PM 周康 <[email protected]> wrote:
>>
>> > +1. I have sent a message on the Trino Slack; I really appreciate the
>> > new Trino plugin/connector.
>> > https://trinodb.slack.com/archives/CP1MUNEUX/p1623838591370200
>> >
>> > Looking forward to the RFC and more discussion.
>> >
>> > On 2021/10/17 06:06:09 sagar sumit wrote:
>> > > Dear Hudi Community,
>> > >
>> > > I would like to propose the development of a new Trino
>> > > plugin/connector for Hudi.
>> > >
>> > > Today, Hudi supports snapshot queries on Copy-On-Write (COW) tables
>> > > and read-optimized queries on Merge-On-Read (MOR) tables with Trino,
>> > > through the input-format-based integration in the Hive connector [1].
>> > > This approach has known performance limitations with very large
>> > > tables, which have since been fixed on PrestoDB [2]. We are working
>> > > on replicating the same fixes on Trino as well [3].
>> > >
>> > > However, as Hudi keeps getting better, a new plugin that provides
>> > > access to Hudi data and metadata will help unlock those capabilities
>> > > for Trino users. To name a few benefits: metadata-based listing, full
>> > > schema evolution, etc. [4]. Moreover, a separate Hudi connector would
>> > > allow its independent evolution without having to worry about
>> > > hacking/breaking the Hive connector.
>> > >
>> > > A separate connector also falls in line with our vision [5] when we
>> > > think of a standalone timeline server or a lake cache to balance the
>> > > tradeoff between writing and querying. Imagine users having read and
>> > > write access to data and metadata in Hudi directly through Trino.
>> > >
>> > > I did some prototyping to get snapshot queries on a Hudi COW table
>> > > working with a new plugin [6], and I feel the effort is worth it.
>> > > The high-level approach is to implement the connector SPI [7]
>> > > provided by Trino, such as:
>> > > a) HudiMetadata implements ConnectorMetadata to fetch table metadata.
>> > > b) HudiSplit and HudiSplitManager implement ConnectorSplit and
>> > > ConnectorSplitManager to produce logical units of data partitioning,
>> > > so that Trino can parallelize reads and writes.
>> > >
>> > > Let me know your thoughts on the proposal. I can draft an RFC for the
>> > > detailed design discussion once we have consensus.
>> > >
>> > > Regards,
>> > > Sagar
>> > >
>> > > References:
>> > > [1] https://github.com/prestodb/presto/commits?author=vinothchandar
>> > > [2] https://prestodb.io/blog/2020/08/04/prestodb-and-hudi
>> > > [3] https://github.com/trinodb/trino/pull/9641
>> > > [4]
>> > > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
>> > > [5]
>> > > https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
>> > > [6] https://github.com/codope/trino/tree/hudi-plugin
>> > > [7] https://trino.io/docs/current/develop/connectors.html
>>
>> --
>> *Jian Feng,冯健*
>> Shopee | Engineer | Data Infrastructure
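
To make item b) of the proposal more concrete, here is a minimal sketch of how a split manager might turn the base files of a COW table into independent units of work. Everything in it is an assumption for illustration: SketchSplit and SketchSplitManager are simplified local stand-ins, not the real Trino ConnectorSplit and ConnectorSplitManager interfaces (whose signatures differ), and BaseFile, HudiSplitSketch, HudiSplitManagerSketch and the fixed maxSplitBytes policy are hypothetical names, not code from the prototype branch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a split: a byte range of one file.
// NOT the real Trino SPI interface.
interface SketchSplit {
    String path();
    long start();
    long length();
}

// Simplified stand-in for a split manager. NOT the real Trino SPI interface.
interface SketchSplitManager {
    List<SketchSplit> getSplits(List<BaseFile> files);
}

// Hypothetical description of one base file in the latest snapshot.
final class BaseFile {
    final String path;
    final long sizeBytes;

    BaseFile(String path, long sizeBytes) {
        this.path = path;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical split implementation holding the file path and byte range.
final class HudiSplitSketch implements SketchSplit {
    private final String path;
    private final long start;
    private final long length;

    HudiSplitSketch(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override public String path() { return path; }
    @Override public long start() { return start; }
    @Override public long length() { return length; }
}

// Hypothetical split manager: cuts every file into ranges of at most
// maxSplitBytes so that workers can read the ranges in parallel.
final class HudiSplitManagerSketch implements SketchSplitManager {
    private final long maxSplitBytes;

    HudiSplitManagerSketch(long maxSplitBytes) {
        if (maxSplitBytes <= 0) {
            throw new IllegalArgumentException("maxSplitBytes must be positive");
        }
        this.maxSplitBytes = maxSplitBytes;
    }

    @Override
    public List<SketchSplit> getSplits(List<BaseFile> files) {
        List<SketchSplit> splits = new ArrayList<>();
        for (BaseFile file : files) {
            // Walk the file in fixed-size steps; the last range may be shorter.
            for (long offset = 0; offset < file.sizeBytes; offset += maxSplitBytes) {
                long length = Math.min(maxSplitBytes, file.sizeBytes - offset);
                splits.add(new HudiSplitSketch(file.path, offset, length));
            }
        }
        return splits;
    }
}
```

The point of the sketch is only the division of responsibility: the split manager decides how the table's files are partitioned, and each split carries enough information for a worker to read its range on its own. A real connector would also need to resolve the latest file slices from the Hudi timeline, which is left out here.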
