RE: Re: [DISCUSS] Add Baidu Cloud BOS filesystem connector to Hadoop

Yang,Dongdong(ACG CCN) Mon, 08 Jun 2026 00:28:53 -0700

Hi everyone,

Thank you Shilun for driving this discussion, and thank you Xiaoqiao for
the review and support.

Current Status:
The PR (#8347) is actively being reviewed. We have addressed feedback
from LuciferYang and pan3793, including code quality improvements,
documentation fixes, and GHA workflow integration. CI is passing.

Known Limitations:
- No append support
- No hflush/hsync (calls degrade to no-op; data is persisted on close)
- No concat or truncate
- No symbolic links or extended attributes
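
To illustrate what the hflush/hsync limitation means for callers, here is a
minimal, self-contained model of the semantics described above. It is a sketch
only and not the connector's code: FakeObjectStore, BufferedObjectOutputStream,
and the key "logs/app.log" are hypothetical names invented for this example,
and the store is just an in-memory map. It assumes the usual object-store
pattern of buffering writes and publishing the object on close().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class BufferedObjectStreamSketch {

    /** Hypothetical in-memory stand-in for an object store: key -> object bytes. */
    static final class FakeObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        void put(String key, byte[] data) {
            objects.put(key, data);
        }

        boolean exists(String key) {
            return objects.containsKey(key);
        }
    }

    /** Hypothetical stream that buffers writes and publishes the object only on close(). */
    static final class BufferedObjectOutputStream extends OutputStream {
        private final FakeObjectStore store;
        private final String key;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private boolean closed;

        BufferedObjectOutputStream(FakeObjectStore store, String key) {
            this.store = store;
            this.key = key;
        }

        @Override
        public void write(int b) throws IOException {
            if (closed) {
                throw new IOException("stream closed");
            }
            buffer.write(b);
        }

        /** Models the documented degradation: the call succeeds but persists nothing. */
        public void hflush() {
            // intentionally a no-op
        }

        /** Same as hflush() in this model: accepted, but no durability guarantee. */
        public void hsync() {
            // intentionally a no-op
        }

        @Override
        public void close() {
            if (closed) {
                return;
            }
            closed = true;
            // The object becomes visible in the store only at this point.
            store.put(key, buffer.toByteArray());
        }
    }

    /** Returns {visible after hflush, visible after close}. */
    static boolean[] demo() throws IOException {
        FakeObjectStore store = new FakeObjectStore();
        BufferedObjectOutputStream out =
            new BufferedObjectOutputStream(store, "logs/app.log");
        out.write("record-1\n".getBytes(StandardCharsets.UTF_8));
        out.hflush();
        boolean visibleAfterFlush = store.exists("logs/app.log");
        out.close();
        boolean visibleAfterClose = store.exists("logs/app.log");
        return new boolean[] {visibleAfterFlush, visibleAfterClose};
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("visible after hflush: " + result[0]);
        System.out.println("visible after close:  " + result[1]);
    }
}
```

In this model, a writer that relies on hflush() for durability (for example a
write-ahead log) would lose buffered data if the process died before close().
Whether the connector also reports this through Hadoop's StreamCapabilities
probe is not stated in this thread.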

Follow-up Plan:
1. Ensure CI remains green after rebasing onto the latest trunk
2. Merge the PR once community consensus is reached
3. Continue iterating in subsequent PRs, for example to address the known
limitations listed above
4. If performance proves insufficient, run benchmarks (such as TPC-DS and
NNBench) against both HDFS and the connector, and optimize the connector
based on the results

We welcome any questions, concerns, or suggestions from the dev team.

Best regards,
Dongdong Yang.


On 2026/05/26 06:40:48 Xiaoqiao He wrote:
> Thanks Shilun for driving this progress.
> +1 from my side,
> a. Judging from the PR (https://github.com/apache/hadoop/pull/8347), the
> code is ready.
> b. The contributors are PMC members or committers from mature Apache
> communities.
> I would like to hear more from the dev team about the follow-up plan.
> Good luck!
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, May 22, 2026 at 9:33 PM slfan1989 <[email protected]> wrote:
>
> > Hi Hadoop community,
> >
> > I would like to start a discussion about adding Baidu Cloud BOS
> > (Baidu Object Storage) as a native Hadoop-compatible filesystem connector.
> >
> > JIRA: https://issues.apache.org/jira/browse/HDFS-11161
> > PR: https://github.com/apache/hadoop/pull/8347
> > CI Status: +1 overall, all checks passed.
> >
> > I have had some offline discussions with LuciferYang and the contributors
> > working on this connector. Based on those discussions, I am helping bring
> > this proposal to the Hadoop community for broader review and feedback.
> >
> > The goal is to integrate BOS support as a native Hadoop filesystem module,
> > similar to the existing hadoop-aws (S3A), hadoop-aliyun, and hadoop-cos
> > connectors.
> >
> > 1. Background
> >
> > Baidu Cloud is one of the major cloud service providers in China. BOS
> > (Baidu Object Storage) is Baidu's core object storage service and is widely
> > used for big data analytics, machine learning, and data lake workloads.
> >
> > A native Hadoop connector would allow Hadoop ecosystem projects, including
> > MapReduce, Spark, Hive, Flink, and others, to access BOS storage directly
> > through the bos:// scheme.
> >
> > According to the contributors, this connector has been running in
> > production at Baidu for around 8 years, serving both BOS users and Baidu
> > MapReduce (BMR) workloads.
> >
> > 2. Implementation
> >
> > The proposed module is placed under:
> >
> >   hadoop-cloud-storage-project/hadoop-bos
> >
> > This follows the structure of the existing cloud storage connectors.
> >
> > The implementation includes:
> >
> > - A full Hadoop FileSystem implementation with the bos:// URI scheme
> > - Pluggable credentials provider support
> > - Contract tests covering standard filesystem operations
> > - Dependency shading or exclusion to avoid classpath conflicts, with shaded
> >   dependencies placed under org.apache.hadoop.fs.bos.shaded.*
> >
> > 3. Long-term Maintenance
> >
> > The following contributors have expressed commitment to maintaining this
> > module:
> >
> > - yangdong2398, BOS R&D
> > - LuciferYang, Apache Spark PMC
> > - jackylee-ch, Apache Gluten PMC
> > - houzhizhen, Apache HugeGraph committer
> > - summaryzb, Apache Uniffle committer
> >
> > They have committed to:
> >
> > - Responding to issues and PRs within one week
> > - Keeping dependencies up to date
> > - Adapting the connector to future Hadoop API changes
> >
> > 4. Why Consider Integrating This into Hadoop
> >
> > This proposal follows a similar rationale to hadoop-aws (S3A),
> > hadoop-aliyun, and hadoop-cos:
> >
> > - Users can rely on a single, consistent Hadoop distribution without
> >   managing separate connector JARs and version compatibility manually
> > - A connector maintained within the Hadoop community is easier for users to
> >   trust and review
> > - Shared CI helps ensure ongoing compatibility with Hadoop trunk
> >
> > I would like to invite feedback from the community on whether this
> > connector is appropriate to include in Hadoop, and what additional work,
> > review, or requirements would be needed before it can be accepted.
> >
> > The contributors are copied on this thread and are expected to
> > participate in this discussion. They can provide more details about the
> > implementation, production usage, and maintenance plan.
> >
> > Best regards,
> > Shilun Fan.
> >
>
