[ https://issues.apache.org/jira/browse/HUDI-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447608#comment-17447608 ]

Vinoth Govindarajan commented on HUDI-2438:
-------------------------------------------

Hi [~qiao.xu]! 

After some investigation, I found that BigQuery doesn't support any kind of
manifest file that tells it which files to read, and that makes this
implementation very difficult. The only remaining option is to select the
correct set of Parquet files for each commit and sync only those files to
BigQuery GFS. That would break snapshot isolation and other Hudi benefits such
as time travel.

I'm currently focusing on other initiatives, namely the dbt and Snowflake
integrations. I'll come back to this after completing both of them.


> [Umbrella] [RFC-34] Implement BigQuerySyncTool for BigQuery Sync
> ----------------------------------------------------------------
>
>                 Key: HUDI-2438
>                 URL: https://issues.apache.org/jira/browse/HUDI-2438
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Common Core
>            Reporter: Vinoth Govindarajan
>            Assignee: Vinoth Govindarajan
>            Priority: Major
>              Labels: BigQuery, Integration
>             Fix For: 0.11.0
>
>
> BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective 
> analytics data warehouse that lets you run analytics over vast amounts of 
> data in near real-time. BigQuery currently [doesn’t 
> support|https://cloud.google.com/bigquery/external-data-cloud-storage] Apache 
> Hudi file format, but it has support for the Parquet file format. The 
> proposal is to implement a BigQuerySync similar to HiveSync to sync the Hudi 
> table as the BigQuery External Parquet table so that users can query the Hudi 
> tables using BigQuery. Uber is already syncing some of its Hudi tables to a
> BigQuery data mart, and this will help them write, sync, and query those tables.
>  
> More details are in RFC-34: 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=188745980]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)