Thanks Xiaoqiao He!

Let me provide more context about this project.

libhdfs-rust aims to provide native HDFS client support for Rust, a
rapidly growing systems programming language commonly used in modern
infrastructure such as databases. With libhdfs-rust, Rust developers can
more easily integrate with HDFS. libhdfs-rust is analogous to both
libhdfs (C API) and libhdfspp
<https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
(C++ API). Its current codebase builds upon libhdfs, but there are plans
to rewrite it entirely in pure Rust. After the rewrite, libhdfs-rust
will interface directly with the HDFS Java client via JNI, making it
fully parallel to both libhdfs and libhdfspp.

We have three options to consider:

A: Integrate libhdfs-rust into the Hadoop repository, placing it under 
    'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
B: Accept libhdfs-rust as a subproject and establish a new repository 
    named 'hadoop-hdfs-rust-client' (or another suitable name).
C: Maintain libhdfs-rust as an independent project outside of Hadoop.

I personally prefer Option B since:

For Option A

The release process for Hadoop is already quite complex. We should avoid
placing additional burdens on the Release Managers, especially when it
involves integrating a new language.

Nor can Hadoop releases wait for libhdfs-rust to become mature and
stable enough to catch up with the release train.

For Option C

libhdfs-rust plays exactly the same role as libhdfs and libhdfspp
<https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>,
but for Rust. Building a community for libhdfs-rust outside of Hadoop is
challenging. In fact, numerous attempts have been made: the Rust
community has developed around 10 different HDFS client projects.
However, almost all of them are no longer maintained.

In conclusion, I believe that Option B is the best choice for us: we can
develop a Rust project within the Hadoop community, attract more Rust
users, and recruit additional committers from the Rust community.


On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> Thanks Xuanwo for your work. I believe it is valuable to enlarge hadoop 
> ecosystem.
> 
> I am also concerned that releasing and version matching will involve
> more hard work, especially for anyone who is not familiar with C or Rust.
> Moreover, I am not clear on the difference between
> `accept hdfs-sys as part of hadoop project` and `one separate project`.
> 
> I think one smooth solution is to follow hadoop-thirdparty[1], which is
> a Hadoop sub-project with its own separate repo and release line, if it
> is accepted.
> 
> cc @Ayush Saxena <mailto:ayush...@gmail.com> @Wei-Chiu Chuang 
> <mailto:weic...@apache.org> @Iñigo Goiri <mailto:elgo...@gmail.com> @Shilun 
> Fan <mailto:slfan1...@foxmail.com> and other folks, what
> do you think? Thanks.
> 
> Best Regards,
> - He Xiaoqiao
> 
> [1] https://github.com/apache/hadoop-thirdparty
> 
> On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xua...@apache.org> wrote:
>> I'm fine with starting work under a new repo, and I'm willing to help
>> maintain this repo. The repo could be named hadoop-libhdfs-rust or just
>> libhdfs-rust.
>> 
>> I'm a PPMC member of other ASF projects, so I know how to do releases and
>> how to make sure the licenses meet the requirements. I'm willing to become
>> the RM until we find more committers for this sub-project.
>> 
>> I'm currently looking for committers willing to help me review PRs and 
>> validate my releases. Is there anyone interested in sponsoring me?
>> 
>> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
>> > > What is libdirent? How is it relevant in this context? 
>> > 
>> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does 
>> > not provide this header which causes issues when building libhdfs on 
>> > Windows platforms. To solve this problem, hdfs-sys uses libdirent - a MSVC 
>> > port of the dirent.h API for Windows.
>> > 
>> > Fortunately, hdfs has already done similar work in 
>> > [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can 
>> > migrate to use hdfs's own implementation instead.
>> > 
>> > > How tightly coupled is it to a specific Hadoop version?
>> > 
>> > Thanks to hdfs's stable API, there is no breakage between different Hadoop 
>> > versions (only additions). So the version matrix will look like:
>> > 
>> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
>> > 
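The version matrix above can be read as a simple range check. The sketch
below is only a model of that matrix, not code from hdfs-sys: the names
`Version`, `NEWEST` and `can_access` are hypothetical, and the upper
bound of v3.3 is taken from the matrix as written in this thread.

```rust
/// A Hadoop release as (major, minor), e.g. (2, 10) for v2.10.
/// Hypothetical type, used only for this illustration.
type Version = (u32, u32);

/// Newest Hadoop release named in the matrix above (assumption: v3.3).
const NEWEST: Version = (3, 3);

/// Returns true if a binding built with feature flag `flag`
/// (v2_2 is modelled as (2, 2)) can access a cluster running `hadoop`,
/// assuming the C API only ever gains functions between releases.
fn can_access(flag: Version, hadoop: Version) -> bool {
    // Tuples compare lexicographically, so (2, 10) sorts after (2, 2).
    flag <= hadoop && hadoop <= NEWEST
}

fn main() {
    // Feature flag v2_2 covers everything from v2.2 up to v3.3.
    assert!(can_access((2, 2), (2, 2)));
    assert!(can_access((2, 2), (3, 3)));
    // Feature flag v2_10 needs at least v2.10 on the cluster side.
    assert!(!can_access((2, 10), (2, 7)));
    // Feature flag v3_3 only matches v3.3.
    assert!(!can_access((3, 3), (3, 2)));
}
```

In other words, the feature flag names the oldest libhdfs API the
binding expects, and any Hadoop release at or above it in the matrix
should work.
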
>> > > The concern I have as a release manager is that it makes my life harder 
>> > > to ensure the quality of a language binding that I am not familiar with.
>> > 
>> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool 
>> > developed by the Rust Team to automatically generate Rust FFI bindings for 
>> > C (and some C++) libraries. Other parts are related to building and 
>> > linking, similar to Makefile, such as finding libjvm and libhdfs.
>> > 
>> > In general, the task that libhdfs-rust performs is simple: it provides an 
>> > API to Rust and links it with libhdfs.so, which I believe is easy to test.
>> > 
>> > [libdirent]: https://github.com/tronkko/dirent
>> > [native/libhdfspp/lib/x-platform]: 
>> > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
>> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
>> > 
>> > 
>> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
>> >> Inline
>> >> 
>> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ayush...@gmail.com> wrote:
>> >>> Forwarding from dev@hadoop to relevant ML
>> >>> 
>> >>> Original mail: 
>> >>> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>> >>> 
>> >>> -Ayush
>> >>> 
>> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
>> >>> > Hello, everyone.
>> >>> >
>> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for 
>> >>> > Rust. I would like to know whether it is a good idea to accept 
>> >>> > hdfs-sys as a part of the Hadoop project.
>> >>> >
>> >>> > Users of hdfs-sys for now:
>> >>> >
>> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily 
>> >>> > and efficiently retrieve data from various storage services in a 
>> >>> > unified way.
>> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost 
>> >>> > and complexity for your massive-scale analytics needs. (via OpenDAL)
>> >>> > - [RisingWave]: The distributed streaming database: SQL stream 
>> >>> > processing with Postgres-like experience. (via OpenDAL)
>> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse 
>> >>> > framework
>> >>> >
>> >>> > License information for hdfs-sys:
>> >>> >
>> >>> > - hdfs-sys itself is licensed under Apache-2.0
>> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, 
>> >>> > hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all 
>> >>> > dual licensed under Apache-2.0 and MIT. 
>> >>> >
>> >>> > Work to be done if accepted:
>> >>> >
>> >>> > - Replace libdirent with the same dirent API implemented in HDFS 
>> >>> > project.
>> >>> > - Remove all bundled hdfs C code.
>> >> What is libdirent? How is it relevant in this context? 
>> >> 
>> >> How tightly coupled is it to a specific Hadoop version? I am wondering if 
>> >> it's possible to host it in a separate Hadoop repo, if it's accepted. The 
>> >> concern I have as a release manager is that it makes my life harder to 
>> >> ensure the quality of a language binding that I am not familiar with.
>> >>> >
>> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>> >>> > [Databend]: https://github.com/datafuselabs/databend
>> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>> >>> >
>> >>> > Xuanwo
>> >>> >
>> >>> > ---------------------------------------------------------------------
>> >>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
>> >>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
>> >>> >
>> >>> >
>> >>> 
>> > 
>> > Xuanwo
>> > 
>> 
>> Xuanwo

Xuanwo
