+1 for using Rust as the backbone for new language bindings

On Sun, Jun 12, 2022 at 23:52 OpenInx <open...@gmail.com> wrote:

> Thanks Kyle for sharing your context.
>
> Recently, I also spent some time practicing my Rust skills.  Generally,
> I'm +1 for adding a Rust SDK as the native-language support.
>
>
> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:
>
>> Thanks for starting this discussion.
>>
>> I know I was the first to mention some of my concerns (which I still have
>> and would apply to any new major change), but I also think that this is an
>> avenue that should be explored.
>>
>> Specifically, a native integration would have many benefits for read paths
>> (in addition to others). I know that the Rust Avro reader is
>> significantly faster, as are readers for native columnar formats.
>>
>> So while I do have some concerns about making sure we have enough people
>> to support this endeavor, I do want to say I think it's a really good idea.
>> My apologies if I gave the impression otherwise.
>>
>> I would personally be interested in contributing to and reviewing for a
>> native Rust library (or C++, but I think Rust is a much more elegant
>> language, and I'd personally prefer to work in it as it's easier to work
>> with across systems than C++ imo, though I would defer to others on that).
>>
>> I would also be happy to offer my help and perspective in moving this
>> forward if need be. But I did want to express my practical concerns so that
>> we don't have an area of the codebase where there aren't enough people to
>> help maintain it etc.
>>
>> But in general I think this is an exciting opportunity, and results have
>> shown time and time again that native readers / writers are much more
>> performant.
>>
>> +1 to using Rust as well (which is a language I know more of than C++
>> these days, though I'd have to brush up on my skills in both).
>>
>> Best, Kyle
>>
>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Hi Tao Wu.
>>>
>>> I think the Apache Iceberg community is very consistent in wanting to provide
>>> an Iceberg SDK for native languages.  I am very happy to offer my perspective
>>> and help if needed when you try to move this forward.
>>>
>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>
>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>>>> system that is written in Rust.
>>>>
>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>>>> have better integration with the existing Iceberg ecosystem. Initially I
>>>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>>> popularity, there will eventually be more systems that want such libraries.
>>>> There may even be some ongoing work that has not yet been discussed with the
>>>> community.
>>>>
>>>> Additionally, I think the initial Rust/C++ SDK only needs to support the
>>>> reader and writer sides of Iceberg, because there are already plenty of JVM-based
>>>> query engines out there taking charge of data maintenance. We don't have to
>>>> rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>>
>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>> > As a cloud-native table format standard for the big-data ecosystem,  I
>>>> > believe supporting multiple languages is the correct direction so that
>>>> > different languages can connect to the Apache Iceberg table format.
>>>> >
>>>> > But I can also get Kyle's point about lacking enough resources
>>>> > (developers and reviewers) to accomplish this goal.  In my mind, Python,
>>>> > Golang, C++, and Rust can all be regarded as native language support.  We
>>>> > may just need to support the Rust SDK, and then all of the other languages
>>>> > can just wrap the Rust SDK to access the table format.
>>>> >
>>>> > Anyway, we will need to wait for the REST catalog to be finished before we
>>>> > introduce support for another language, because we cannot access the
>>>> > Iceberg table by invoking the JVM catalog interfaces.
>>>> >
>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >
>>>> > >> There’s also the question of how useful this would be in practice given
>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>> > >> frameworks.
>>>> > >>
>>>> > >
>>>> > > One place this would be useful is for Arrow's Dataset API [1].  An
>>>> > > option the Arrow community might be open to is hosting parts of the code
>>>> > > there (this is what is done for Apache Parquet C++).  This helps shape
>>>> > > some of the answers to other questions posed (ORC and Parquet are already
>>>> > > in the repo, it provides a Filesystem interface, etc).  The project
>>>> > > doesn't currently consume Avro, and I think the preferred approach is to
>>>> > > make a clean room Avro parser.  But I agree this is a non-trivial effort
>>>> > > to get underway.
>>>> > >
>>>> > > Another area to consider is compatibility testing.  I think before a
>>>> > > third officially supported community library is introduced it would be
>>>> > > good to have a compatibility framework in place to make sure
>>>> > > implementations are all interpreting the specification correctly.  If
>>>> > > there isn't already an effort here, I'd like to start contributing
>>>> > > something (I will probably have bandwidth sometime in Q3).
>>>> > >
>>>> > > Thanks,
>>>> > > -Micah
>>>> > >
>>>> > >
>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>> > >
>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>> > >
>>>> > >> Hi caneGuy,
>>>> > >>
>>>> > >> I personally don’t dislike this idea. I understand the performance
>>>> > >> benefits.
>>>> > >>
>>>> > >> But this would be a huge undertaking for the community. We’d need to
>>>> > >> ensure we had sufficient developer support for reviews (likely one of
>>>> > >> the biggest issues), as well as a number of other things, particularly
>>>> > >> dependencies, package management, etc. We’d also need to scope support
>>>> > >> down to specific OS / compilers etc.
>>>> > >>
>>>> > >> We’d also need to be sure we had adequate developer support from a wide
>>>> > >> enough range of the community to support the project long term. One
>>>> > >> issue in open source is that developers will work on something
>>>> > >> tangential to their project in another repository, but nobody is
>>>> > >> available to maintain it.
>>>> > >>
>>>> > >> There’s also the question of how useful this would be in practice given
>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>> > >> frameworks.
>>>> > >>
>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about
>>>> > >> what such an undertaking involves. It would need full community
>>>> > >> support (or at least support from enough community members to be
>>>> > >> sustainable).
>>>> > >>
>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>> > >> project has some that you might use as reference.
>>>> > >>
>>>> > >> *I highly suggest you come to the next community sync and bring this up
>>>> > >> to the community then.*
>>>> > >>
>>>> > >> If you’re not already on the invite list for the monthly community sync,
>>>> > >> you can get on it by joining the Google group. You’ll receive invites
>>>> > >> when they go out:
>>>> > >> https://groups.google.com/g/iceberg-sync
>>>> > >>
>>>> > >> Looking forward to seeing you at the next community sync.
>>>> > >>
>>>> > >> A design document and/or any prior art would be very helpful as the
>>>> > >> community sync does discuss many topics (possibly there is existing C++
>>>> > >> support in StarRocks for Iceberg V1?).
>>>> > >>
>>>> > >> Thank you,
>>>> > >> Kyle Bendickson
>>>> > >> GitHub: kbendick
>>>> > >>
>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>> > >>
>>>> > >>> Currently there is no existing effort to develop a C++ package. That
>>>> > >>> being said I think it would be awesome to have one! If anyone is willing
>>>> > >>> to start that development effort, I can help with some of the ground
>>>> > >>> work to kickstart it.
>>>> > >>>
>>>> > >>> I would say the first step would be for someone to prepare a high-level
>>>> > >>> proposal.
>>>> > >>>
>>>> > >>> -Sam
>>>> > >>>
>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>> > >>>
>>>> > >>>> Hi team
>>>> > >>>> I am a dev from the StarRocks community, and we already support the
>>>> > >>>> Iceberg v1 format.
>>>> > >>>> We are also planning to support the v2 format. If there were a C++
>>>> > >>>> package, it would be very convenient for our implementation.
>>>> > >>>> At the same time, it would make it faster for other C++ computing
>>>> > >>>> engines to support the v2 format.
>>>> > >>>>
>>>> > >>>> Do we have plans to support a C++ version of the SDK?
>>>> > >>>> --
>>>> > >>>> caneGuy
>>>> > >>>>
>>>> > >>> --
>>>> > >>>
>>>> > >>> Sam Redai <s...@tabular.io>
>>>> > >>>
>>>> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>> > >>>
>>>> > >>> c (267) 226-8606
>>>> > >>>
>>>> > >>
>>>> >
>>>>
>>>
>>
>> --
>>
>> Kyle Bendickson
>>
>> OSS Developer  |  Tabular <https://tabular.io/>
>>
>> k...@tabular.io
>>
>
