It looks pretty challenging to me. Most of the committers aren't technically equipped to review this code, so getting the initial code reviewed and merged would itself be a challenge.

Looking at the repo, it has only one or two major contributors, which is itself a red flag: the bus factor is pretty low. If we don't find volunteers in the future, we would be stuck with dead code that most of us don't know how to fix or maintain. If any CVE is reported against this code after a release, fixing it would be a challenge for us.

Quoting:
> the Rust community has developed around 10 different HDFS client projects.
> However, almost all of them are no longer maintained.

If they couldn't do it, how will we be able to? This isn't a very good statistic to quote :-)

Well, I don't have objections to having this as a separate repo in Hadoop. If others are fine with it, I can try to help in whatever capacity I have. But I still have doubts about how easy it would be to push code or get release votes for a project that most people here have no knowledge of, and developing a community and so on seems like an Incubator thing to me.

-Ayush

On Thu, 21 Dec 2023 at 19:01, Xuanwo <xua...@apache.org> wrote:
>
> Thanks Xiaoqiao He!
>
> Let me provide more context about this project.
>
> libhdfs-rust aims to provide native HDFS client support for Rust, a rapidly
> growing systems programming language commonly used in modern infrastructure
> such as databases. With libhdfs-rust, Rust developers can more easily
> integrate with HDFS. libhdfs-rust is analogous to both libhdfs (C API) and
> libhdfspp
> <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
> (C++ API). Its current codebase builds upon libhdfs, but there are plans to
> rewrite it entirely in pure rust. Consequently, libhdfs-rust will interface
> directly with the HDFS Java client via JNI, making it fully parallel to both
> libhdfs and libhdfs-cpp.
>
> There are three possible ways for us to take:
>
> We have three options to consider:
>
> A: Integrate libhdfs-rust into the Hadoop repository, placing it under
>    'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
> B: Accept libhdfs-rust as a subproject and establish a new repository
>    named 'hadoop-hdfs-rust-client' (or another suitable name).
> C: Maintain libhdfs-rust as an independent project outside of Hadoop.
>
> I personally prefer Option B since:
>
> For Option A
>
> The release process for Hadoop is already quite complex. We should avoid
> placing additional burdens on the Release Managers, especially when it
> involves integrating a new language.
>
> And it's impossible to wait for libhdfs-rust mature and stable enough to
> catch up the release train.
>
> For Option C
>
> libhdfs-rust is exactly the same with libhdfs & libhdfspp
> <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
> but for rust. Building a community for libhdfs-rust outside of Hadoop is
> challenging. In fact, numerous attempts have been made: the Rust community
> has developed around 10 different HDFS client projects. However, almost all
> of them are no longer maintained.
>
> In conclusion, I believe that Option B is the best choice for us: we can
> develop a rust project in hadoop community, attract more rust users, and
> recruit additional committers from the rust community.
>
>
> On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> > Thanks Xuanwo for your work. I believe it is valuable to enlarge hadoop
> > ecosystem.
> >
> > I am also concerned that it will involve more hard work to release and
> > version match, especially for one who is not familiar with C or Rust.
> > Moreover, I am not aware the difference between `accept hdfs-sys as part
> > of hadoop project` and `one separate project`.
> >
> > I think one smooth solution is reference hadoop-thirdparty[1] which is one
> > hadoop sub-project but split to separate repo and release line etc, if it
> > is accepted.
> >
> > cc @Ayush Saxena <mailto:ayush...@gmail.com> @Wei-Chiu Chuang
> > <mailto:weic...@apache.org> @Iñigo Goiri <mailto:elgo...@gmail.com> @Shilun
> > Fan <mailto:slfan1...@foxmail.com> and other folks, what do you think?
> > Thanks.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > [1] https://github.com/apache/hadoop-thirdparty
> >
> > On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xua...@apache.org> wrote:
> >> I'm fine to start work under a new repo, and I'm willing to help maintain
> >> this repo. The repo could name after hadoop-libhdfs-rust or just
> >> libhdfs-rust.
> >>
> >> I'm PPMC member of other ASF projects so I know how to do release and how
> >> to make sure the license fit the requirements. I'm willing the become the
> >> RM until we find more committers for this sub-project.
> >>
> >> I'm currently looking for committers willing to help me review PRs and
> >> validate my releases. Is there anyone interested in sponsoring me?
> >>
> >> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> >> > > What is libdirent? How is it relevant in this context?
> >> >
> >> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC
> >> > does not provide this header which causes issues when building libhdfs
> >> > on Windows platforms. To solve this problem, hdfs-sys uses libdirent - a
> >> > MSVC port of the dirent.h API for Windows.
> >> >
> >> > Fortunately, hdfs has already done similar work in
> >> > [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can
> >> > migrate to use hdfs's own implementation instead.
> >> >
> >> > > How tightly coupled is it to a specific Hadoop version?
> >> >
> >> > Thanks to hdfs's stable API, there is no breakage between different
> >> > hadoop version (only addition). So the version matrix will be like:
> >> >
> >> > - libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v3_3) can access hadoop v3.3
> >> >
> >> > > The concern I have as a release manager is that it makes my life
> >> > > harder to ensure the quality of a language binding that I am not
> >> > > familiar with.
> >> >
> >> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool
> >> > developed by the Rust Team to automatically generate Rust FFI bindings
> >> > for C (and some C++) libraries. Other parts are related to building and
> >> > linking, similar to Makefile, such as finding libjvm and libhdfs.
> >> >
> >> > In general, the task that libhdfs-rust performs is simple: it provides
> >> > an API to Rust and links it with libhdfs.so, which I believe is easy to
> >> > test.
> >> >
> >> > [libdirect]: https://github.com/tronkko/dirent
> >> > [native/libhdfspp/lib/x-platform]:
> >> > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> >> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> >> >
> >> >
> >> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> >> >> Inline
> >> >>
> >> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ayush...@gmail.com> wrote:
> >> >>> Forwarding from dev@hadoop to relevant ML
> >> >>>
> >> >>> Original mail:
> >> >>> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> >> >>>
> >> >>> -Ayush
> >> >>>
> >> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> >> >>> > Hello, everyone.
> >> >>> >
> >> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for
> >> >>> > Rust. I want to know is it a good idea of accepting hdfs-sys as a
> >> >>> > part of hadoop project?
> >> >>> >
> >> >>> > Users of hdfs-sys for now:
> >> >>> >
> >> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily
> >> >>> > and efficiently retrieve data from various storage services in a
> >> >>> > unified way.
> >> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing
> >> >>> > cost and complexity for your massive-scale analytics needs. (via
> >> >>> > OpenDAL)
> >> >>> > - [RisingWave]: The distributed streaming database: SQL stream
> >> >>> > processing with Postgres-like experience. (via OpenDAL)
> >> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse
> >> >>> > framework
> >> >>> >
> >> >>> > Licenses information of hdfs-sys:
> >> >>> >
> >> >>> > - hdfs-sys itself licensed under Apache-2.0
> >> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73,
> >> >>> > glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0,
> >> >>> > they are all dual licensed under Apache-2.0 and MIT.
> >> >>> >
> >> >>> > Works need to do if accept:
> >> >>> >
> >> >>> > - Replace libdirent with the same dirent API implemented in HDFS
> >> >>> > project.
> >> >>> > - Remove all bundled hdfs C code.
> >> >> What is libdirent? How is it relevant in this context?
> >> >>
> >> >> How tightly coupled is it to a specific Hadoop version? I am wondering
> >> >> if it's possible to host it in a separate Hadoop repo, if it's
> >> >> accepted. The concern I have as a release manager is that it makes my
> >> >> life harder to ensure the quality of a language binding that I am not
> >> >> familiar with.
> >> >>> >
> >> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> >> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> >> >>> > [Databend]: https://github.com/datafuselabs/databend
> >> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> >> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >> >>> >
> >> >>> > Xuanwo
> >> >>> >
> >> >>> > ---------------------------------------------------------------------
> >> >>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
> >> >>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
> >> >>> >
> >> >>> >
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> >>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >> >
> >> > Xuanwo
> >> >
> >>
> >> Xuanwo
>
> Xuanwo

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org