Hi all,

My colleagues and I have developed a new Java connector for Snowflake that we would like to contribute to Beam.
Snowflake is an analytic data warehouse offered as Software-as-a-Service (SaaS). It uses a new SQL database engine with a unique architecture designed for the cloud; for more details, please see [1] and [2].

The proposed Snowflake IOs use the Snowflake JDBC library [3]. They provide batch write and batch read, both built on the Snowflake COPY operation [4] underneath. In both cases, ParDos load files onto a stage, and the data is then moved between the staged files and the Snowflake table of choice using the COPY API. The currently supported stage is Google Cloud Storage [5].

[Diagram: how the Snowflake read IO works; the write operation works similarly, in the opposite direction.]

The current work on the Snowflake IO lives in an Apache Beam fork [6]; a rough usage sketch is also included at the end of this message.

In the near future we would also like to add an IO for writing streams, which will use Snowpipe, Snowflake's mechanism for continuous loading [7]. We would also like to use cross-language transforms to provide Python connectors as well.

We are open to all opinions and suggestions; in case of any questions or comments, please do not hesitate to post them. If there are no objections, I will create Jira tickets and share them in this thread.

Cheers,
Kasia

[1] https://www.snowflake.com
[2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
[3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
[4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
[5] https://cloud.google.com/storage
[6] https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
[7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
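P.S. To make the proposed API more concrete, here is a minimal pipeline sketch based on the code in the fork [6]. Please treat it as illustrative only: the exact class and method names may differ from what ends up in the fork, and the credentials, server name, bucket, and table names are placeholders.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.snowflake.SnowflakeIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class SnowflakeIOSketch {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Connection settings (placeholder values; see the fork [6] for the
        // actual configuration builder).
        SnowflakeIO.DataSourceConfiguration dc =
            SnowflakeIO.DataSourceConfiguration.create()
                .withUsernamePasswordAuth("user", "password")
                .withServerName("account.snowflakecomputing.com")
                .withDatabase("MY_DB")
                .withSchema("PUBLIC");

        // Batch read: COPY unloads the table to files on the GCS stage [5],
        // which are then read in parallel and mapped to Java objects.
        PCollection<String> rows =
            p.apply(
                SnowflakeIO.<String>read()
                    .withDataSourceConfiguration(dc)
                    .fromTable("MY_TABLE")
                    .withStagingBucketName("my-gcs-bucket")
                    .withCsvMapper(parts -> parts[0]) // keep the first column
                    .withCoder(StringUtf8Coder.of()));

        // Batch write works in the opposite direction: a ParDo writes files
        // onto the stage, then COPY INTO loads them into the target table.
        rows.apply(
            SnowflakeIO.<String>write()
                .withDataSourceConfiguration(dc)
                .toTable("MY_TABLE_COPY")
                .withStagingBucketName("my-gcs-bucket")
                .withUserDataMapper(row -> new String[] {row}));

        p.run();
      }
    }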