Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-09-26 Thread Sachin Agarwal via dev
It turns out there was a commit submitted here!
https://github.com/nanhu-lab/beam/commit/d4f5fa4c41602b4696737929dd1bdd5ae2302a65

Related GH issue: https://github.com/apache/beam/issues/23074

On Tue, Aug 30, 2022 at 10:28 AM Sachin Agarwal wrote:

> I would posit that something is better than nothing - did we ever see that
> generic implementation?
>
> On Tue, Aug 30, 2022 at 10:22 AM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Is there enough commonality across Delta, Hudi, Iceberg for this generic
>> solution?  I imagined we'd potentially have individual IOs for each.  A
>> generic one seems possible, but certainly would like to learn more.
>>
>> Also, are others in the community working on connectors for ANY of those
>> Delta Lake, Hudi, or Iceberg IOs?  Would hope for some form of coordination
>> and/or at least awareness between people addressing
>> complementary/overlapping areas.
>>
>> On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev 
>> wrote:
>>
>>> Howdy,
>>> I have a client who would be interested to use this.  Is there a link to
>>> a GitHub repo or other place I can read more?
>>>
>>> Neil  (kol...@google.com)
>>>
>>> On 2022/08/05 07:23:31 张涛 wrote:
>>> >
>>> > Hi, we developed a new IO connector named DataLakeIO, to connect Beam
>>> and data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can
>>> use DataLakeIO to read data from data lake, and write data to data lake. We
>>> did not find data lake IO on
>>> https://beam.apache.org/documentation/io/built-in/, we want to
>>> contribute this new IO connector to Beam, what should we do next? Thank you
>>> very much!
>>>
>>


Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-30 Thread Sachin Agarwal via dev
I would posit that something is better than nothing - did we ever see that
generic implementation?

On Tue, Aug 30, 2022 at 10:22 AM Austin Bennett 
wrote:

> Is there enough commonality across Delta, Hudi, Iceberg for this generic
> solution?  I imagined we'd potentially have individual IOs for each.  A
> generic one seems possible, but certainly would like to learn more.
>
> Also, are others in the community working on connectors for ANY of those
> Delta Lake, Hudi, or Iceberg IOs?  Would hope for some form of coordination
> and/or at least awareness between people addressing
> complementary/overlapping areas.
>
> On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev 
> wrote:
>
>> Howdy,
>> I have a client who would be interested to use this.  Is there a link to
>> a GitHub repo or other place I can read more?
>>
>> Neil  (kol...@google.com)
>>
>> On 2022/08/05 07:23:31 张涛 wrote:
>> >
>> > Hi, we developed a new IO connector named DataLakeIO, to connect Beam
>> and data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can
>> use DataLakeIO to read data from data lake, and write data to data lake. We
>> did not find data lake IO on
>> https://beam.apache.org/documentation/io/built-in/, we want to
>> contribute this new IO connector to Beam, what should we do next? Thank you
>> very much!
>>
>


Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-30 Thread Austin Bennett
Is there enough commonality across Delta, Hudi, Iceberg for this generic
solution?  I imagined we'd potentially have individual IOs for each.  A
generic one seems possible, but certainly would like to learn more.

Also, are others in the community working on connectors for ANY of those
Delta Lake, Hudi, or Iceberg IOs?  Would hope for some form of coordination
and/or at least awareness between people addressing
complementary/overlapping areas.

On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev 
wrote:

> Howdy,
> I have a client who would be interested to use this.  Is there a link to a
> GitHub repo or other place I can read more?
>
> Neil  (kol...@google.com)
>
> On 2022/08/05 07:23:31 张涛 wrote:
> >
> > Hi, we developed a new IO connector named DataLakeIO, to connect Beam
> and data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can
> use DataLakeIO to read data from data lake, and write data to data lake. We
> did not find data lake IO on
> https://beam.apache.org/documentation/io/built-in/, we want to contribute
> this new IO connector to Beam, what should we do next? Thank you very much!
>


RE: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-29 Thread Neil Kolban via dev
Howdy,
I have a client who would be interested to use this.  Is there a link to a
GitHub repo or other place I can read more?

Neil  (kol...@google.com)

On 2022/08/05 07:23:31 张涛 wrote:
>
> Hi, we developed a new IO connector named DataLakeIO, to connect Beam and
data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can use
DataLakeIO to read data from data lake, and write data to data lake. We did
not find data lake IO on https://beam.apache.org/documentation/io/built-in/,
we want to contribute this new IO connector to Beam, what should we do
next? Thank you very much!


Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-05 Thread Sachin Agarwal via dev
This is wonderful to hear -
https://beam.apache.org/contribute/get-started-contributing/#contribute-code
has the process to contribute; we're very much looking forward to seeing
your DataLakeIO!

On Fri, Aug 5, 2022 at 9:02 AM 张涛 wrote:

>
> Hi, we developed a new IO connector named DataLakeIO, to connect Beam and
> data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can use
> DataLakeIO to read data from data lake, and write data to data lake. We did
> not find data lake IO on
> https://beam.apache.org/documentation/io/built-in/, we want to contribute
> this new IO connector to Beam, what should we do next? Thank you very
> much!
>


[idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-05 Thread 张涛

Hi, we developed a new IO connector named DataLakeIO to connect Beam with data
lakes such as Delta Lake, Apache Hudi, and Apache Iceberg. Beam can use
DataLakeIO to read data from a data lake and to write data to it. We did not
find a data lake IO listed at
https://beam.apache.org/documentation/io/built-in/, and we would like to
contribute this new IO connector to Beam. What should we do next? Thank you
very much!