[ https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839403#comment-17839403 ]

Han Liu commented on HADOOP-19085:
----------------------------------

Sorry for the late reply.

I ran the benchmark tool against S3A and ABFS. S3A passed 140 of 222 cases, and ABFS passed 155 of 222.

I am using Hadoop 3.2.1 as the baseline. The result for S3A:
{quote}Hadoop Compatibility Report for ALL:

    63.06%, PASSED 140 OVER 222

    URI: <my-test-uri> (suite: org.apache.hadoop.fs.compat.suites.HdfsCompatSuiteForAll)

    Hadoop Version as Baseline: 3.2.1
{quote}
The result for ABFS (Azure Data Lake Storage Gen2):
{quote}Hadoop Compatibility Report for ALL:

    69.82%, PASSED 155 OVER 222

    URI: <my-test-uri> (suite: org.apache.hadoop.fs.compat.suites.HdfsCompatSuiteForAll)

    Hadoop Version as Baseline: 3.2.1
{quote}
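As a quick sanity check on the two reports, the percentages are consistent with passed / total × 100 rounded to two decimal places (155/222 = 69.8198..., shown as 69.82%). A minimal Python sketch, assuming that formula; `pass_rate` is a hypothetical helper, not part of the benchmark tool:

```python
def pass_rate(passed: int, total: int) -> str:
    """Percentage of passed cases, rounded to two decimals (assumed report format)."""
    return f"{100 * passed / total:.2f}%"


print(pass_rate(140, 222))  # S3A  -> 63.06%
print(pass_rate(155, 222))  # ABFS -> 69.82%
```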
I plan to run it against more file systems. Any suggestions? 
[~ste...@apache.org] 

> Compatibility Benchmark over HCFS Implementations
> -------------------------------------------------
>
>                 Key: HADOOP-19085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19085
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, test
>    Affects Versions: 3.4.0
>            Reporter: Han Liu
>            Assignee: Han Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in the 
> big data storage ecosystem. It provides unified interfaces and generally clear 
> semantics, and has become the de facto standard that industry storage systems 
> follow and conform to. There are a number of HCFS implementations in Hadoop, 
> such as S3AFileSystem for Amazon's S3 object store, WASB for Microsoft's Azure 
> Blob Storage, and the OSS connector for Alibaba Cloud Object Storage, plus 
> others maintained by storage service providers themselves.
> {*}Problems:{*} However, as noted in introduction.md, there is no formal 
> suite for assessing the compatibility of a file system across all such HCFS 
> implementations. As a result, whether the functionality is properly implemented 
> and meets the core compatibility expectations relies mainly on the service 
> provider's own report. Meanwhile, Hadoop keeps evolving, and new features are 
> continually added to the HCFS interfaces for existing implementations to 
> follow and adopt. Hadoop therefore also needs a tool to quickly assess 
> whether these features are supported by a specific HCFS implementation. 
> In addition, the hadoop command-line tool (hdfs shell) is used to interact 
> directly with an HCFS storage system. Most commands correspond to specific 
> HCFS interfaces and work well, but some cases are complicated and may not 
> work, such as the expunge command. We also need an approach to identify 
> such commands for an HCFS.
> {*}Proposal:{*} Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide a corresponding tool to assess the compatibility of 
> an HCFS storage system. The benchmark and tool should cover both HCFS 
> interfaces and hdfs shell commands. Since different scenarios require 
> different kinds of compatibility, we could define different suites in the 
> benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. End users can use them to evaluate the 
> compatibility level and determine whether the storage system in question is 
> suitable for their scenarios. Storage providers can use them to quickly 
> generate an objective and reliable report on the core functions of their 
> storage service. For instance, if an HCFS scores 100% on a suite named 
> 'tpcds', that demonstrates that all functions needed by a TPC-DS program 
> have been properly implemented. The benchmark also serves as a guide to how 
> storage service capabilities map to HCFS interfaces, such as storage class 
> on S3.
> Any thoughts? Comments and feedback are most welcome. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
