Thanks Xiaoqiao He! Let me provide more context about this project.
libhdfs-rust aims to provide native HDFS client support for Rust, a rapidly growing systems programming language commonly used in modern infrastructure such as databases. With libhdfs-rust, Rust developers can more easily integrate with HDFS.

libhdfs-rust is analogous to both libhdfs (C API) and libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> (C++ API). Its current codebase builds upon libhdfs, but there are plans to rewrite it entirely in pure Rust. Consequently, libhdfs-rust will interface directly with the HDFS Java client via JNI, making it fully parallel to both libhdfs and libhdfspp.

We have three options to consider:

A: Integrate libhdfs-rust into the Hadoop repository, placing it under 'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
B: Accept libhdfs-rust as a subproject and establish a new repository named 'hadoop-hdfs-rust-client' (or another suitable name).
C: Maintain libhdfs-rust as an independent project outside of Hadoop.

I personally prefer Option B, for the following reasons.

For Option A:

The release process for Hadoop is already quite complex. We should avoid placing additional burdens on the Release Managers, especially when it involves integrating a new language. Nor can Hadoop releases wait for libhdfs-rust to become mature and stable enough to catch the release train.

For Option C:

libhdfs-rust plays exactly the same role as libhdfs and libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>, but for Rust. Building a community for libhdfs-rust outside of Hadoop is challenging. In fact, numerous attempts have been made: the Rust community has developed around 10 different HDFS client projects. However, almost all of them are no longer maintained.
In conclusion, I believe that Option B is the best choice for us: we can develop a Rust project within the Hadoop community, attract more Rust users, and recruit additional committers from the Rust community.

On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> Thanks Xuanwo for your work. I believe it is valuable to enlarge hadoop
> ecosystem.
>
> I am also concerned that it will involve more hard work to release and
> version match, especially for one who is not familiar with C or Rust.
> Moreover, I am not aware of the difference between `accept hdfs-sys as part
> of hadoop project` and `one separate project`.
>
> I think one smooth solution is to reference hadoop-thirdparty[1], which is
> one hadoop sub-project but split to separate repo and release line etc, if
> it is accepted.
>
> cc @Ayush Saxena <mailto:ayush...@gmail.com> @Wei-Chiu Chuang
> <mailto:weic...@apache.org> @Iñigo Goiri <mailto:elgo...@gmail.com> @Shilun
> Fan <mailto:slfan1...@foxmail.com> and other folks, what
> do you think? Thanks.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://github.com/apache/hadoop-thirdparty
>
> On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xua...@apache.org> wrote:
>> I'm fine to start work under a new repo, and I'm willing to help maintain
>> this repo. The repo could be named hadoop-libhdfs-rust or just
>> libhdfs-rust.
>>
>> I'm a PPMC member of other ASF projects so I know how to do releases and
>> how to make sure the license fits the requirements. I'm willing to become
>> the RM until we find more committers for this sub-project.
>>
>> I'm currently looking for committers willing to help me review PRs and
>> validate my releases. Is there anyone interested in sponsoring me?
>>
>> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
>> > > What is libdirent? How is it relevant in this context?
>> >
>> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does
>> > not provide this header which causes issues when building libhdfs on
>> > Windows platforms.
>> > To solve this problem, hdfs-sys uses libdirent - an MSVC
>> > port of the dirent.h API for Windows.
>> >
>> > Fortunately, hdfs has already done similar work in
>> > [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can
>> > migrate to use hdfs's own implementation instead.
>> >
>> > > How tightly coupled is it to a specific Hadoop version?
>> >
>> > Thanks to hdfs's stable API, there is no breakage between different hadoop
>> > versions (only additions). So the version matrix will be like:
>> >
>> > - libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v3_3) can access hadoop v3.3
>> >
>> > > The concern I have as a release manager is that it makes my life harder
>> > > to ensure the quality of a language binding that I am not familiar with.
>> >
>> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool
>> > developed by the Rust Team to automatically generate Rust FFI bindings for
>> > C (and some C++) libraries. Other parts are related to building and
>> > linking, similar to a Makefile, such as finding libjvm and libhdfs.
>> >
>> > In general, the task that libhdfs-rust performs is simple: it provides an
>> > API to Rust and links it with libhdfs.so, which I believe is easy to test.
>> >
>> > [libdirent]: https://github.com/tronkko/dirent
>> > [native/libhdfspp/lib/x-platform]:
>> > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
>> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
>> >
>> >
>> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
>> >> Inline
>> >>
>> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ayush...@gmail.com> wrote:
>> >>> Forwarding from dev@hadoop to relevant ML
>> >>>
>> >>> Original mail:
>> >>> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>> >>>
>> >>> -Ayush
>> >>>
>> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
>> >>> > Hello, everyone.
>> >>> >
>> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for
>> >>> > Rust. I want to know whether it is a good idea to accept hdfs-sys as
>> >>> > a part of the hadoop project?
>> >>> >
>> >>> > Users of hdfs-sys for now:
>> >>> >
>> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily
>> >>> > and efficiently retrieve data from various storage services in a
>> >>> > unified way.
>> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost
>> >>> > and complexity for your massive-scale analytics needs. (via OpenDAL)
>> >>> > - [RisingWave]: The distributed streaming database: SQL stream
>> >>> > processing with Postgres-like experience. (via OpenDAL)
>> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse
>> >>> > framework
>> >>> >
>> >>> > License information of hdfs-sys:
>> >>> >
>> >>> > - hdfs-sys itself is licensed under Apache-2.0
>> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1,
>> >>> > hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all
>> >>> > dual licensed under Apache-2.0 and MIT.
>> >>> >
>> >>> > Work needed if accepted:
>> >>> >
>> >>> > - Replace libdirent with the same dirent API implemented in HDFS
>> >>> > project.
>> >>> > - Remove all bundled hdfs C code.
>> >> What is libdirent? How is it relevant in this context?
>> >>
>> >> How tightly coupled is it to a specific Hadoop version? I am wondering if
>> >> it's possible to host it in a separate Hadoop repo, if it's accepted. The
>> >> concern I have as a release manager is that it makes my life harder to
>> >> ensure the quality of a language binding that I am not familiar with.
>> >>> >
>> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>> >>> > [Databend]: https://github.com/datafuselabs/databend
>> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>> >>> >
>> >>> > Xuanwo
>> >>> >
>> >>> > ---------------------------------------------------------------------
>> >>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
>> >>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
>> >>> >
>> >>> >
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> >>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> >
>> > Xuanwo
>> >
>> Xuanwo
Xuanwo
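
P.S. To make the version matrix quoted above concrete (feature flag v2_2 can access hadoop v2.2 ~ v3.3, v2_10 can access v2.10 ~ v3.3, v3_3 can access v3.3), here is a minimal sketch of that rule in Rust. It is only an illustration, not code from hdfs-sys or libhdfs-rust: the names `HadoopVersion`, `NEWEST_COVERED`, `parse_feature_flag` and `can_access` are hypothetical, and the upper bound of v3.3 is simply the newest release named in the matrix.

```rust
/// A Hadoop release as (major, minor), e.g. v2.10 is { major: 2, minor: 10 }.
/// Hypothetical type for illustration; the derived ordering compares
/// `major` first and then `minor`, so v2.10 sorts after v2.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct HadoopVersion {
    major: u32,
    minor: u32,
}

/// Newest release named in the matrix (assumption: v3.3).
const NEWEST_COVERED: HadoopVersion = HadoopVersion { major: 3, minor: 3 };

/// Parse a feature-flag name such as "v2_10" into a version.
/// Returns None when the name does not follow the `v<major>_<minor>` shape.
fn parse_feature_flag(flag: &str) -> Option<HadoopVersion> {
    let rest = flag.strip_prefix('v')?;
    let (major, minor) = rest.split_once('_')?;
    Some(HadoopVersion {
        major: major.parse().ok()?,
        minor: minor.parse().ok()?,
    })
}

/// The libhdfs API only gains functions over time, so bindings built for
/// feature flag X can talk to any release from X up to the newest covered one.
fn can_access(flag: &str, cluster: HadoopVersion) -> bool {
    match parse_feature_flag(flag) {
        Some(min) => min <= cluster && cluster <= NEWEST_COVERED,
        None => false,
    }
}

fn main() {
    let cluster = HadoopVersion { major: 2, minor: 10 };
    for flag in ["v2_2", "v2_10", "v3_3"] {
        println!("{flag} -> can access v2.10: {}", can_access(flag, cluster));
    }
}
```

The point of the sketch is that a user picks the oldest feature flag whose API they need, and the same build keeps working against every newer release in the matrix.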