Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.
This is wonderful to hear. See https://beam.apache.org/contribute/get-started-contributing/#contribute-code for the process to contribute code. We're very much looking forward to seeing your DataLakeIO!

On Fri, Aug 5, 2022 at 9:02 AM 张涛 wrote:
> Hi, we developed a new IO connector named DataLakeIO to connect Beam with
> data lakes such as Delta Lake, Apache Hudi, and Apache Iceberg. Beam can use
> DataLakeIO to read data from a data lake and write data to a data lake. We
> did not find a data lake IO on
> https://beam.apache.org/documentation/io/built-in/ and we want to contribute
> this new IO connector to Beam. What should we do next? Thank you very much!
RE: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.
Howdy,
I have a client who would be interested in using this. Is there a link to a GitHub repo or other place where I can read more?

Neil (kol...@google.com)

On 2022/08/05 07:23:31 张涛 wrote:
> Hi, we developed a new IO connector named DataLakeIO to connect Beam with
> data lakes such as Delta Lake, Apache Hudi, and Apache Iceberg. Beam can use
> DataLakeIO to read data from a data lake and write data to a data lake. We
> did not find a data lake IO on
> https://beam.apache.org/documentation/io/built-in/ and we want to contribute
> this new IO connector to Beam. What should we do next? Thank you very much!
Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.
Is there enough commonality across Delta Lake, Hudi, and Iceberg for this generic solution? I imagined we'd potentially have individual IOs for each. A generic one seems possible, but I would certainly like to learn more.

Also, are others in the community working on connectors for ANY of Delta Lake, Hudi, or Iceberg? I would hope for some form of coordination, or at least awareness, between people addressing complementary or overlapping areas.

On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev wrote:
> Howdy,
> I have a client who would be interested in using this. Is there a link to a
> GitHub repo or other place where I can read more?
>
> Neil (kol...@google.com)
Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.
I would posit that something is better than nothing. Did we ever see that generic implementation?

On Tue, Aug 30, 2022 at 10:22 AM Austin Bennett wrote:
> Is there enough commonality across Delta Lake, Hudi, and Iceberg for this
> generic solution? I imagined we'd potentially have individual IOs for each.
> A generic one seems possible, but I would certainly like to learn more.
>
> Also, are others in the community working on connectors for ANY of Delta
> Lake, Hudi, or Iceberg? I would hope for some form of coordination, or at
> least awareness, between people addressing complementary or overlapping
> areas.
Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.
It turns out there was a commit submitted here!
https://github.com/nanhu-lab/beam/commit/d4f5fa4c41602b4696737929dd1bdd5ae2302a65

Related GH issue: https://github.com/apache/beam/issues/23074

On Tue, Aug 30, 2022 at 10:28 AM Sachin Agarwal wrote:
> I would posit that something is better than nothing. Did we ever see that
> generic implementation?