[ 
https://issues.apache.org/jira/browse/HUDI-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Govindarajan updated HUDI-2832:
--------------------------------------
    Description: Snowflake is a fully managed service that’s simple to use but 
can power a near-unlimited number of concurrent workloads. Snowflake is a 
solution for data warehousing, data lakes, data engineering, data science, data 
application development, and securely sharing and consuming shared data. 
Snowflake [doesn’t 
support|https://docs.snowflake.com/en/sql-reference/sql/alter-file-format.html] 
the Apache Hudi file format yet, but it supports the Parquet, ORC, and Delta 
file formats. This proposal is to implement a SnowflakeSync, similar to 
HiveSync, that syncs a Hudi table as a Snowflake external Parquet table so 
that users can query Hudi tables using Snowflake. Many users have asked in the 
Hudi and other support channels for a Hudi integration with Snowflake; this 
will unlock new use cases for Hudi.  (was: BigQuery is Google Cloud's 
fully managed, petabyte-scale, and cost-effective analytics data warehouse that 
lets you run analytics over vast amounts of data in near real-time. BigQuery 
currently [doesn’t 
support|https://cloud.google.com/bigquery/external-data-cloud-storage] Apache 
Hudi file format, but it has support for the Parquet file format. The proposal 
is to implement a BigQuerySync similar to HiveSync to sync the Hudi table as 
the BigQuery External Parquet table so that users can query the Hudi tables 
using BigQuery. Uber is already syncing some of its Hudi tables to a BigQuery 
data mart; this will help them to write, sync, and query.

 

More details are in RFC-34: 
[https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=188745980])

> [Umbrella] [RFC-40] Implement SnowflakeSyncTool for Hudi to Snowflake 
> Integration
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-2832
>                 URL: https://issues.apache.org/jira/browse/HUDI-2832
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Common Core
>            Reporter: Vinoth Govindarajan
>            Assignee: Vinoth Govindarajan
>            Priority: Major
>              Labels: BigQuery, Integration
>             Fix For: 0.11.0
>
>
> Snowflake is a fully managed service that’s simple to use but can power a 
> near-unlimited number of concurrent workloads. Snowflake is a solution for 
> data warehousing, data lakes, data engineering, data science, data 
> application development, and securely sharing and consuming shared data. 
> Snowflake [doesn’t 
> support|https://docs.snowflake.com/en/sql-reference/sql/alter-file-format.html]
> the Apache Hudi file format yet, but it supports the Parquet, ORC, and Delta 
> file formats. This proposal is to implement a SnowflakeSync, similar to 
> HiveSync, that syncs a Hudi table as a Snowflake external Parquet table so 
> that users can query Hudi tables using Snowflake. Many users have asked in 
> the Hudi and other support channels for a Hudi integration with Snowflake; 
> this will unlock new use cases for Hudi.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
