Han Liu created HDFS-17316:
------------------------------

             Summary: Compatibility Benchmark over HCFS Implementations
                 Key: HDFS-17316
                 URL: https://issues.apache.org/jira/browse/HDFS-17316
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Han Liu


*Background:* The Hadoop-Compatible File System (HCFS) is a core concept in the 
big data storage ecosystem. It provides unified interfaces and generally clear 
semantics, and has become the de facto standard that industry storage systems 
follow and conform to. Hadoop already contains a series of HCFS 
implementations, such as S3AFileSystem for Amazon's S3 object store, WASB for 
Microsoft's Azure Blob Storage, and the OSS connector for Alibaba Cloud Object 
Storage, and storage service providers maintain more on their own.

*Problems:* However, as indicated by introduction.md, there is no formal suite 
for assessing the compatibility of all such HCFS implementations. Whether an 
implementation is functionally complete and meets the core compatibility 
expectations is therefore known mainly from the service provider's own report. 
Meanwhile, Hadoop itself keeps evolving, and new features are continuously 
added to the HCFS interfaces for existing implementations to follow. Hadoop 
therefore also needs a tool to quickly assess whether a specific HCFS 
implementation supports these features. In addition, the hadoop command-line 
tool, or hdfs shell, is used to interact directly with an HCFS storage system. 
Most of its commands correspond to specific HCFS interfaces and work well, but 
some are more complicated and may not work, such as the expunge command. We 
also need a way to check such commands against an HCFS.

*Proposal:* Accordingly, we propose to define a formal HCFS compatibility 
benchmark and to provide a corresponding tool for assessing the compatibility 
of an HCFS storage system. The benchmark and tool should cover both HCFS 
interfaces and hdfs shell commands. Because different scenarios require 
different kinds of compatibility, the benchmark could define different suites 
for them.
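
One way to picture the suites is as tags on individual benchmark cases, so 
that a single case (an HCFS interface call or a shell command) can belong to 
several suites. The following is only a minimal sketch of that idea, not the 
proposed tool: the class names, the case names and the suite names 'all', 
'shell' and 'tpcds' are all hypothetical, and the assignment of cases to 
suites is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: one benchmark case, tagged with the suites it belongs to. */
final class CompatCase {
    final String name;        // e.g. an interface method or a shell command
    final Set<String> suites; // suite names are illustrative only

    CompatCase(String name, String... suites) {
        this.name = name;
        this.suites = new HashSet<>(Arrays.asList(suites));
    }
}

final class CompatBenchmark {
    /** Returns the names of the cases tagged with the given suite, in definition order. */
    static List<String> select(List<CompatCase> cases, String suite) {
        List<String> selected = new ArrayList<>();
        for (CompatCase c : cases) {
            if (c.suites.contains(suite)) {
                selected.add(c.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed tagging; a real benchmark would define this centrally.
        List<CompatCase> cases = Arrays.asList(
            new CompatCase("FileSystem#mkdirs", "all", "tpcds"),
            new CompatCase("FileSystem#rename", "all", "tpcds"),
            new CompatCase("shell:expunge", "all", "shell"));
        System.out.println(select(cases, "tpcds"));
    }
}
```

Tagging cases rather than copying them into each suite keeps a single 
definition per interface or command, which matters if new HCFS features keep 
being added.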

*Benefits:* We intend the benchmark and tool to be useful for both storage 
providers and storage users. End users can use them to evaluate the 
compatibility level and determine whether the storage system in question is 
suitable for the required scenarios. Storage providers can use them to quickly 
generate an objective and reliable report on the core functions of the storage 
service. For instance, if an HCFS scores 100% on a suite named 'tpcds', this 
demonstrates that all functions needed by a tpcds program are well supported. 
The benchmark also serves as a guide to how storage service capabilities map 
to HCFS interfaces, such as storage class on S3.
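
The suite score mentioned above could be as simple as the percentage of 
passed cases. The sketch below assumes exactly that; the class name, the 
scoring rule, the treatment of an empty suite and the sample results are 
hypothetical and are not taken from any existing tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical report helper: turns per-case results into a suite score. */
final class SuiteReport {
    /** Percentage of passed cases in [0, 100]; an empty suite is scored as 0 (assumption). */
    static double score(Map<String, Boolean> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        int passed = 0;
        for (boolean ok : results.values()) {
            if (ok) {
                passed++;
            }
        }
        return 100.0 * passed / results.size();
    }

    public static void main(String[] args) {
        // Made-up results for an imaginary 'tpcds' suite.
        Map<String, Boolean> tpcds = new LinkedHashMap<>();
        tpcds.put("FileSystem#create", true);
        tpcds.put("FileSystem#open", true);
        tpcds.put("FileSystem#rename", true);
        tpcds.put("FileSystem#delete", false);
        System.out.printf("tpcds: %.1f%%%n", score(tpcds));
    }
}
```

Under this rule a score of 100% means every case in the suite passed, which 
matches the reading of the 'tpcds' example above.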

Any thoughts? Comments and feedback are most welcome. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
