+1 for using Rust as the backbone for new language bindings.

On Sun, Jun 12, 2022 at 23:52 OpenInx <open...@gmail.com> wrote:
> Thanks Kyle for sharing your context.
>
> Recently, I also spent some time practicing my Rust skills. Generally, I'm +1 on adding a Rust SDK for native language support.
>
> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:
>
>> Thanks for starting this discussion.
>>
>> I know I was the first to mention some of my concerns (which I still have and would apply to any new major change), but I also think that this is an avenue that should be explored.
>>
>> Specifically, a native integration would have many benefits for read paths (in addition to others). I know that the Rust Avro reader is significantly faster, as well as native columnar formats.
>>
>> So while I do have some concerns about making sure we have enough people to support this endeavor, I do want to say I think it's a really good idea. My apologies if I gave the impression otherwise.
>>
>> I would personally be interested in contributing to and reviewing for a native Rust library (or C++, but I think Rust is a much more elegant language and I'd personally prefer to work in that, as it's easier to work with across systems than C++ IMO, though I would defer to others on that).
>>
>> I would also be happy to offer my help and perspective in moving this forward if need be. But I did want to express my practical concerns so that we don't have an area of the codebase where there aren't enough people to help maintain it etc.
>>
>> But in general I think this is an exciting opportunity, and results have shown time and time again that native readers / writers are much more performant.
>>
>> +1 to using Rust as well (which is a language I know more of than C++ these days, though I'd have to brush up on my skills in both).
>>
>> Best, Kyle
>>
>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Hi Tao Wu.
>>>
>>> I think the Apache Iceberg community is very consistent in providing the Iceberg SDK for native languages.
>>> I am very happy to offer my perspective and help if needed when you try to move this thing forward.
>>>
>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>
>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming system that is written in Rust.
>>>>
>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can have better integration with the existing Iceberg ecosystem. Initially I found https://github.com/oliverdaff/iceberg-rs, but it appears the author hasn't been active lately. So I'm looking to see if the Iceberg community has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is, we'd love to contribute. I believe as Iceberg grows in popularity, there will eventually be more systems that want such libraries. There may even be some ongoing work that has started without consulting the community.
>>>>
>>>> Additionally, I think the initial Rust/C++ SDK only needs to support the reader and writer sides of Iceberg, because there are already plenty of JVM-based query engines out there taking charge of data maintenance. We don't have to rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>>
>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>> > Since Iceberg is a cloud-native table format standard for the big-data ecosystem, I believe supporting multiple languages is the correct direction so that different languages can connect to the Apache Iceberg table format.
>>>> >
>>>> > But I also get Kyle's point about lacking enough resources (developers and reviewers) to accomplish this goal. In my mind, Python, Golang, C++, and Rust can all be regarded as native language support. We may just need to support the Rust SDK, and then all of the other languages can just wrap the Rust SDK to access the table format.
>>>> >
>>>> > Anyway, we will need to wait for the REST catalog to be finished before we introduce support for other languages, because we cannot access the Iceberg table by invoking the JVM catalog interfaces.
>>>> >
>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >
>>>> > >> There’s also the question of how useful this would be in practice given the complexity of using C++ (or Rust etc) within some of the major frameworks.
>>>> > >
>>>> > > One place this would be useful is for Arrow's Dataset API [1]. An option the Arrow community might be open to is hosting parts of the code there (this is what is done for Apache Parquet C++). This helps shape some of the answers to other questions posed (ORC and Parquet are already in the repo, it provides a Filesystem interface, etc). The project doesn't currently consume Avro, and I think the preferred approach is to make a clean-room Avro parser. But I agree this is a non-trivial effort to get underway.
>>>> > >
>>>> > > Another area to consider is compatibility testing. I think before a third officially supported community library is introduced it would be good to have a compatibility framework in place to make sure implementations are all interpreting the specification correctly. If there isn't already an effort here, I'd like to start contributing something (I will probably have bandwidth sometime in Q3).
>>>> > >
>>>> > > Thanks,
>>>> > > -Micah
>>>> > >
>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>> > >
>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>> > >
>>>> > >> Hi caneGuy,
>>>> > >>
>>>> > >> I personally don’t dislike this idea. I understand the performance benefits.
>>>> > >>
>>>> > >> But this would be a huge undertaking for the community. We’d need to ensure we had sufficient developer support for reviews (likely one of the biggest issues), as well as a number of other things, particularly dependencies, package management, etc. We’d also need to scope support down to specific OS / compilers etc.
>>>> > >>
>>>> > >> We’d also need to be sure we had adequate developer support from a wide enough range of the community to support the project long term. One issue in open source is that developers will work on something tangential to their project in another repository, but nobody is available to maintain it.
>>>> > >>
>>>> > >> There’s also the question of how useful this would be in practice given the complexity of using C++ (or Rust etc) within some of the major frameworks.
>>>> > >>
>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about the realities of such an undertaking. It would need full community support (or at least support from enough community members to be sustainable).
>>>> > >>
>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg project has some that you might use as reference.
>>>> > >>
>>>> > >> *I highly suggest you come to the next community sync and bring this up to the community then.*
>>>> > >>
>>>> > >> If you’re not already on the invite list for the monthly community sync, you can get on it by joining the Google group. You’ll receive invites when they go out:
>>>> > >> https://groups.google.com/g/iceberg-sync
>>>> > >>
>>>> > >> Looking forward to seeing you at the next community sync.
>>>> > >>
>>>> > >> A design document and/or any prior art would be very helpful, as the community sync does discuss many topics (possibly there is existing C++ support in StarRocks for Iceberg V1?).
>>>> > >>
>>>> > >> Thank you,
>>>> > >> Kyle Bendickson
>>>> > >> GitHub: kbendick
>>>> > >>
>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>> > >>
>>>> > >>> Currently there is no existing effort to develop a C++ package. That being said, I think it would be awesome to have one! If anyone is willing to start that development effort, I can help with some of the groundwork to kickstart it.
>>>> > >>>
>>>> > >>> I would say the first step would be for someone to prepare a high-level proposal.
>>>> > >>>
>>>> > >>> -Sam
>>>> > >>>
>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>> > >>>
>>>> > >>>> Hi team,
>>>> > >>>>
>>>> > >>>> I am a dev from the StarRocks community, and we already support the Iceberg v1 format. We are also planning to support the v2 format. If there were a C++ package, it would be very convenient for our implementation. It would also make it faster for other C++ compute engines to support the v2 format.
>>>> > >>>>
>>>> > >>>> Do we have plans to support a C++ version of the SDK?
>>>> > >>>> --
>>>> > >>>> caneGuy
>>>> > >>>>
>>>> > >>> --
>>>> > >>> Sam Redai <s...@tabular.io>
>>>> > >>> Developer Advocate | Tabular <https://tabular.io/>
>>
>> --
>> Kyle Bendickson
>> OSS Developer | Tabular <https://tabular.io/>
>> k...@tabular.io