Great! More IOs in Beam are always welcome. I’d be happy to take a 
look at your PR once it is created.

Just a couple of questions for now.

1) AFAIK, you can connect to Snowflake using the standard JDBC driver. Do you plan 
to compare the performance of this SnowflakeIO with Beam's JdbcIO?
2) Are you going to support staging in other locations, like S3 and Azure?
3) Does “withSchema()” allow a Beam schema to be inferred from the Snowflake schema?
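
To make question 2 concrete: as far as I understand, a Snowflake COPY INTO statement names the external location by URL, so supporting another stage would mostly change the URL prefix. Below is a minimal sketch of that idea. The class, enum, and method names are hypothetical and are not taken from the proposed SnowflakeIO; the table, path, and storage integration names are placeholders.

```java
// Hypothetical sketch, not the SnowflakeIO implementation.
final class StageCopySketch {

    /** Stage locations with their URL prefixes; only GCS is supported by the proposal today. */
    enum Stage {
        GCS("gcs://"),
        S3("s3://"),
        AZURE("azure://");

        final String prefix;

        Stage(String prefix) {
            this.prefix = prefix;
        }
    }

    /**
     * Builds a COPY INTO statement that loads CSV files from a stage into a table.
     * No escaping is done here, so this must not be fed untrusted input.
     */
    static String copyIntoTable(String table, Stage stage, String path, String integration) {
        return String.format(
            "COPY INTO %s FROM '%s%s' STORAGE_INTEGRATION = %s FILE_FORMAT = (TYPE = CSV)",
            table, stage.prefix, path, integration);
    }

    public static void main(String[] args) {
        // Print one statement per stage type to show that only the URL prefix differs.
        for (Stage stage : Stage.values()) {
            System.out.println(copyIntoTable("MY_TABLE", stage, "bucket/staged/", "MY_INT"));
        }
    }
}
```

If the IO builds its COPY statements in one place like this, adding S3 or Azure might be a small change, although credentials and file handling would still differ for each provider.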

> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk <ka.kucharc...@gmail.com> 
> wrote:
> 
> Hi all,
> 
> My colleagues and I have developed a new Java connector for Snowflake that 
> we would like to add to Beam.
> 
> Snowflake is an analytic data warehouse provided as Software-as-a-Service 
> (SaaS). It uses a new SQL database engine with a unique architecture designed 
> for the cloud. For more details, please see [1] and [2].
> 
> The proposed Snowflake IOs use the Snowflake JDBC library [3]. They provide 
> batch write and batch read, both using the Snowflake COPY [4] operation 
> underneath. In both cases, the ParDo IOs load files onto a stage, and the 
> files are then inserted into the Snowflake table of choice using the COPY 
> API. The currently supported stage is Google Cloud Storage [5].
> 
> A diagram of how the Snowflake Read IO works (the write operation works 
> similarly, but in the opposite direction):
> 
> 
> 
> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
> 
> In the near future we would also like to add an IO for writing streams, which 
> will use Snowpipe, Snowflake's mechanism for continuous loading [7]. We would 
> also like to use cross-language transforms to provide Python connectors.
> 
> We are open to all opinions and suggestions. If you have any 
> questions or comments, please do not hesitate to post them.
> 
> If there are no objections, I will create Jira tickets and share them in this 
> thread.
> 
> Cheers,
> Kasia
> 
> [1] https://www.snowflake.com
> [2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
> [4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
> [5] https://cloud.google.com/storage
> [6] https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
> [7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
> 
