Hi all,

My colleagues and I have developed a new Java connector for Snowflake that we would like to contribute to Beam.
Snowflake is an analytic data warehouse offered as Software-as-a-Service (SaaS). It uses a new SQL database engine with a unique architecture designed for the cloud; for more details, please see [1] and [2].

The proposed Snowflake IOs use the Snowflake JDBC library [3]. They provide batch write and batch read, both built on the Snowflake COPY operation [4] underneath. In both cases, ParDos load files onto a stage, and the data is then moved between the staged files and the Snowflake table of choice using the COPY API. The currently supported stage is Google Cloud Storage [5].

[Diagram: how the Snowflake read IO works; the write operation works similarly, in the opposite direction.]

The current work on the Snowflake IO lives in an Apache Beam fork [6]; a rough usage sketch is also included at the end of this message.

In the near future we would also like to add an IO for writing streams, which will use Snowpipe, Snowflake's mechanism for continuous loading [7]. We would also like to use cross-language transforms to provide Python connectors as well.

We are open to all opinions and suggestions; in case of any questions or comments, please do not hesitate to post them. If there are no objections, I will create Jira tickets and share them in this thread.

Cheers,
Kasia

[1] https://www.snowflake.com
[2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
[3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
[4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
[5] https://cloud.google.com/storage
[6] https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
[7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
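P.S. To make the proposed API more concrete, here is a minimal pipeline sketch based on the code in the fork [6]. Please treat it as illustrative only: the exact class and method names may differ from what ends up in the fork, and the credentials, server name, bucket, and table names are placeholders.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.snowflake.SnowflakeIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class SnowflakeIOSketch {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Connection settings (placeholder values; see the fork [6] for the
        // actual configuration builder).
        SnowflakeIO.DataSourceConfiguration dc =
            SnowflakeIO.DataSourceConfiguration.create()
                .withUsernamePasswordAuth("user", "password")
                .withServerName("account.snowflakecomputing.com")
                .withDatabase("MY_DB")
                .withSchema("PUBLIC");

        // Batch read: COPY unloads the table to files on the GCS stage [5],
        // which are then read in parallel and mapped to Java objects.
        PCollection<String> rows =
            p.apply(
                SnowflakeIO.<String>read()
                    .withDataSourceConfiguration(dc)
                    .fromTable("MY_TABLE")
                    .withStagingBucketName("my-gcs-bucket")
                    .withCsvMapper(parts -> parts[0]) // keep the first column
                    .withCoder(StringUtf8Coder.of()));

        // Batch write works in the opposite direction: a ParDo writes files
        // onto the stage, then COPY INTO loads them into the target table.
        rows.apply(
            SnowflakeIO.<String>write()
                .withDataSourceConfiguration(dc)
                .toTable("MY_TABLE_COPY")
                .withStagingBucketName("my-gcs-bucket")
                .withUserDataMapper(row -> new String[] {row}));

        p.run();
      }
    }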