Awesome job! I'm very interested in the cross-language support. Cheers,
On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath <chamik...@google.com> wrote:

> Sounds great. Looks like the operation of the Snowflake source will be
> similar to the BigQuery source (export files to GCS and read the files).
> This will allow you to better parallelize reading (the current JDBC source
> is limited to one worker when reading).
>
> Seems like you already support initial splitting using files -
> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
> Probably also consider supporting dynamic work rebalancing once runners
> support this through SDF.
>
> Thanks,
> Cham
>
> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>
>> Great! It is always welcome to have more IOs in Beam. I'd be happy to
>> take a look at your PR once it is created.
>>
>> Just a couple of questions for now.
>>
>> 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
>> you plan to compare the performance of this SnowflakeIO and Beam's JdbcIO?
>> 2) Are you going to support staging in other locations, like S3 and Azure?
>> 3) Does "withSchema()" allow inferring a Beam schema from the Snowflake
>> schema?
>>
>> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk <ka.kucharc...@gmail.com> wrote:
>>
>> Hi all,
>>
>> My colleagues and I have developed a new Java connector for Snowflake
>> that we would like to add to Beam.
>>
>> Snowflake is an analytic data warehouse provided as Software-as-a-Service
>> (SaaS). It uses a new SQL database engine with a unique architecture
>> designed for the cloud. For more details, please check [1] and [2].
>>
>> The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs
>> are batch write and batch read, both of which use the Snowflake COPY [4]
>> operation underneath. In both cases, ParDo-based IOs load files onto a
>> stage and the data is then inserted into the Snowflake table of choice
>> using the COPY API. The currently supported stage is Google Cloud
>> Storage [5].
>>
>> [Diagram: how the Snowflake Read IO works; the write operation works
>> similarly, but in the opposite direction.]
>>
>> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
>>
>> In the near future we would also like to add an IO for writing streams,
>> which will use Snowpipe, Snowflake's mechanism for continuous loading [7].
>> Also, we would like to use cross-language transforms to provide Python
>> connectors as well.
>>
>> We are open to all opinions and suggestions. If you have any questions or
>> comments, please do not hesitate to post them.
>>
>> If there are no objections, I will create Jira tickets and share them in
>> this thread.
>>
>> Cheers,
>> Kasia
>>
>> [1] https://www.snowflake.com
>> [2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
>> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
>> [4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
>> [5] https://cloud.google.com/storage
>> [6] https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
>> [7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html

--
Elias Djurfeldt
Mirado Consulting
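For anyone weighing Alexey's question (1) about comparing against the existing
JDBC path, a plain JdbcIO read of Snowflake (the one Cham notes is limited to a
single worker, with no staged export) might look roughly like the sketch below.
The account URL, credentials, table, and column names are placeholders for
illustration only; this uses Beam's generic JdbcIO together with the Snowflake
JDBC driver, not the proposed SnowflakeIO from the fork.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.coders.VarLongCoder;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    public class SnowflakeJdbcReadExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Baseline read through the generic Snowflake JDBC driver; the whole
        // query runs on a single worker, unlike the staged COPY-based approach.
        PCollection<KV<Long, String>> rows =
            p.apply(
                JdbcIO.<KV<Long, String>>read()
                    .withDataSourceConfiguration(
                        JdbcIO.DataSourceConfiguration.create(
                                "net.snowflake.client.jdbc.SnowflakeDriver",
                                "jdbc:snowflake://myaccount.snowflakecomputing.com/?db=MYDB&schema=PUBLIC")
                            .withUsername("username")
                            .withPassword("password"))
                    .withQuery("SELECT id, name FROM my_table")
                    // Map each JDBC ResultSet row into a Beam element.
                    .withRowMapper(rs -> KV.of(rs.getLong("ID"), rs.getString("NAME")))
                    .withCoder(KvCoder.of(VarLongCoder.of(), StringUtf8Coder.of())));

        p.run().waitUntilFinish();
      }
    }

The proposed SnowflakeIO instead exports files to a GCS stage and loads them
with COPY, which is what allows the read to be split across multiple workers as
described earlier in the thread.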