Awesome job! I'm very interested in the cross-language support. Cheers,
On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath <chamik...@google.com> wrote:

> Sounds great. Looks like the operation of the Snowflake source will be
> similar to the BigQuery source (export files to GCS and read the files).
> This will allow you to better parallelize reading (the current JDBC source
> is limited to one worker when reading).
>
> Seems like you already support initial splitting using files -
> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
> Probably also consider supporting dynamic work rebalancing once runners
> support this through SDF.
>
> Thanks,
> Cham
>
> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>
>> Great! It is always welcome to have more IOs in Beam. I'd be happy to
>> take a look at your PR once it is created.
>>
>> Just a couple of questions for now.
>>
>> 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
>> you plan to compare the performance of this SnowflakeIO and Beam's JdbcIO?
>> 2) Are you going to support staging in other locations, like S3 and Azure?
>> 3) Does "withSchema()" allow inferring a Beam schema from the Snowflake
>> schema?
>>
>> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk <ka.kucharc...@gmail.com> wrote:
>>
>> Hi all,
>>
>> My colleagues and I have developed a new Java connector for Snowflake
>> that we would like to add to Beam.
>>
>> Snowflake is an analytic data warehouse provided as Software-as-a-Service
>> (SaaS). It uses a new SQL database engine with a unique architecture
>> designed for the cloud. For more details, please check [1] and [2].
>>
>> The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs
>> are batch write and batch read, both of which use the Snowflake COPY [4]
>> operation underneath. In both cases, ParDo-based IOs load files onto a
>> stage and the data is then inserted into the Snowflake table of choice
>> using the COPY API. The currently supported stage is Google Cloud
>> Storage [5].
>>
>> [Diagram: how the Snowflake Read IO works; the write operation works
>> similarly, but in the opposite direction.]
>>
>> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
>>
>> In the near future we would also like to add an IO for writing streams,
>> which will use Snowpipe, Snowflake's mechanism for continuous loading [7].
>> Also, we would like to use cross-language transforms to provide Python
>> connectors as well.
>>
>> We are open to all opinions and suggestions. If you have any questions or
>> comments, please do not hesitate to post them.
>>
>> If there are no objections, I will create Jira tickets and share them in
>> this thread.
>>
>> Cheers,
>> Kasia
>>
>> [1] https://www.snowflake.com
>> [2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
>> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
>> [4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
>> [5] https://cloud.google.com/storage
>> [6] https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
>> [7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html

--
Elias Djurfeldt
Mirado Consulting
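For anyone weighing Alexey's question (1) about comparing against the existing
JDBC path, a plain JdbcIO read of Snowflake (the one Cham notes is limited to a
single worker, with no staged export) might look roughly like the sketch below.
The account URL, credentials, table, and column names are placeholders for
illustration only; this uses Beam's generic JdbcIO together with the Snowflake
JDBC driver, not the proposed SnowflakeIO from the fork.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.coders.VarLongCoder;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    public class SnowflakeJdbcReadExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Baseline read through the generic Snowflake JDBC driver; the whole
        // query runs on a single worker, unlike the staged COPY-based approach.
        PCollection<KV<Long, String>> rows =
            p.apply(
                JdbcIO.<KV<Long, String>>read()
                    .withDataSourceConfiguration(
                        JdbcIO.DataSourceConfiguration.create(
                                "net.snowflake.client.jdbc.SnowflakeDriver",
                                "jdbc:snowflake://myaccount.snowflakecomputing.com/?db=MYDB&schema=PUBLIC")
                            .withUsername("username")
                            .withPassword("password"))
                    .withQuery("SELECT id, name FROM my_table")
                    // Map each JDBC ResultSet row into a Beam element.
                    .withRowMapper(rs -> KV.of(rs.getLong("ID"), rs.getString("NAME")))
                    .withCoder(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())));

        p.run().waitUntilFinish();
      }
    }

The proposed SnowflakeIO instead exports files to a GCS stage and loads them
with COPY, which is what allows the read to be split across multiple workers as
described earlier in the thread.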