Hi Tao Wu.

I think the Apache Iceberg community broadly agrees on providing
Iceberg SDKs for native languages.  I am very happy to offer my perspective
and help if needed as you move this forward.

On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:

> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
> system that is written in Rust.
>
> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
> have better integration with the existing Iceberg ecosystem. Initially I
> found https://github.com/oliverdaff/iceberg-rs, but it appears the author
> hasn't been active lately. So I'm looking to see if the Iceberg community
> has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is,
> we'd love to contribute. I believe as Iceberg increases its popularity,
> there will eventually be more systems that want such libraries. There may
> even be some ongoing work that has not been shared with the community.
>
> Additionally, I think the initial Rust/C++ SDK only needs to support the
> reader and writer sides of Iceberg, because there are already plenty of
> JVM-based query engines out there taking charge of data maintenance. We
> don't have to rewrite every corner of Iceberg in Rust. That means less
> engineering work.
>
> On 2022/06/08 10:16:05 OpenInx wrote:
> > Since Iceberg is a cloud-native table format standard for the big-data
> > ecosystem, I believe supporting multiple languages is the correct direction
> > so that different languages can connect to the Apache Iceberg table format.
> >
> > But I can also see Kyle's point about lacking enough resources (developers
> > and reviewers) to accomplish this goal.  In my mind, Python, Golang, C++,
> > and Rust can all be regarded as native language support.  We may
> > just need to support the Rust SDK, and then all of the other languages can
> > just wrap the Rust SDK to access the table format.
> >
> > Anyway, we will need to wait for the REST catalog to be finished before we
> > introduce support for other languages, because we cannot access the
> > Iceberg table by invoking the JVM catalog interfaces.
> >
> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > There’s also the question of how useful this would be in practice given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >
> > > > One place this would be useful is Arrow's Dataset API [1].  An
> > > > option the Arrow community might be open to is hosting parts of the
> > > > code there (this is what is done for Apache Parquet C++).  This helps
> > > > shape some of the answers to other questions posed (ORC and Parquet are
> > > > already in the repo, it provides a Filesystem interface, etc.).  The
> > > > project doesn't currently consume Avro, and I think the preferred
> > > > approach is to make a clean-room Avro parser.  But I agree this is a
> > > > non-trivial effort to get underway.
> > >
> > > > Another area to consider is compatibility testing.  I think before a
> > > > third officially supported community library is introduced, it would be
> > > > good to have a compatibility framework in place to make sure
> > > > implementations are all interpreting the specification correctly.  If
> > > > there isn't already an effort here, I'd like to start contributing
> > > > something (I will probably have bandwidth sometime in Q3).
> > >
> > > Thanks,
> > > -Micah
> > >
> > >
> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
> > >
> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io>
> wrote:
> > >
> > >> Hi caneGuy,
> > >>
> > >> I personally don’t dislike this idea. I understand the performance
> > >> benefits.
> > >>
> > >> But this would be a huge undertaking for the community. We’d need to
> > >> ensure we had sufficient developer support for reviews (likely one of
> the
> > >> biggest issues), as well as a number of other things. Particularly
> > >> dependencies, package management, etc. We’d also need to scope
> support down
> > >> to specific OS / compilers etc.
> > >>
> > >> We’d also need to be sure we had adequate developer support from a
> wide
> > >> enough range of the community to support the project long term. One
> issue
> > >> in open source is that developers will work on something tangential to
> > >> their project in another repository, but nobody is available to
> maintain it.
> > >>
> > >> There’s also the question of how useful this would be in practice
> given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >> Again, I’m not opposed to the idea but just trying to be realistic
> about
> > >> the realities of such an undertaking. It would need full community
> support
> > >> (or at least support from enough community members to be sustainable).
> > >>
> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
> > >> project has some that you might use as reference.
> > >>
> > >> *I highly suggest you come to the next community sync and bring this
> up
> > >> to the community then.*
> > >>
> > >> If you’re not already on the invite list for the monthly community
> > >> sync, you can get on it by joining the Google group. You’ll receive
> > >> invites when they go out:
> > >> https://groups.google.com/g/iceberg-sync
> > >>
> > >> Looking forward to seeing you at the next community sync.
> > >>
> > >> A design document and/or any prior art would be very helpful as the
> > >> community sync does discuss many topics (possibly there is existing
> C++
> > >> support in StarRocks for Iceberg V1?).
> > >>
> > >> Thank you,
> > >> Kyle Bendickson
> > >> GitHub: kbendick
> > >>
> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
> > >>
> > >>> Currently there is no existing effort to develop a C++ package. That
> > >>> being said I think it would be awesome to have one! If anyone is
> willing to
> > >>> start that development effort, I can help with some of the ground
> work to
> > >>> kickstart it.
> > >>>
> > >>> I would say the first step would be for someone to prepare a
> high-level
> > >>> proposal.
> > >>>
> > >>> -Sam
> > >>>
> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
> > >>>
> > >>>> Hi team
> > >>>> I am a dev from the StarRocks community, and we already support the
> > >>>> Iceberg v1 format.
> > >>>> We are also planning to support the v2 format. If there were a C++
> > >>>> package, it would be very convenient for our implementation.
> > >>>> It would also let other C++ computing engines support the v2 format
> > >>>> faster.
> > >>>>
> > >>>> Are there plans to support a C++ version of the SDK?
> > >>>> --
> > >>>> caneGuy
> > >>>>
> > >>> --
> > >>>
> > >>> Sam Redai <s...@tabular.io>
> > >>>
> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
> > >>>
> > >>> c (267) 226-8606
> > >>>
> > >>
> >
>
