[ https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839403#comment-17839403 ]
Han Liu commented on HADOOP-19085:
----------------------------------

Sorry for the late reply.

I ran the benchmark tool against S3A and ABFS, using Hadoop 3.2.1 as the baseline. The results show that S3A passed 140 of 222 cases and ABFS passed 155.

The result for S3A:
{quote}Hadoop Compatibility Report for ALL: 63.06%, PASSED 140 OVER 222
URI: <my-test-uri> (suite: org.apache.hadoop.fs.compat.suites.HdfsCompatSuiteForAll)
Hadoop Version as Baseline: 3.2.1
{quote}
The result for ABFS (Azure Data Lake Storage Gen2):
{quote}Hadoop Compatibility Report for ALL: 69.82%, PASSED 155 OVER 222
URI: <my-test-uri> (suite: org.apache.hadoop.fs.compat.suites.HdfsCompatSuiteForAll)
Hadoop Version as Baseline: 3.2.1
{quote}
I am planning to run the tool against more file systems. Any suggestions? [~ste...@apache.org]

> Compatibility Benchmark over HCFS Implementations
> -------------------------------------------------
>
>                 Key: HADOOP-19085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19085
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, test
>    Affects Versions: 3.4.0
>            Reporter: Han Liu
>            Assignee: Han Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark
> Design.pdf
>
>
> {*}Background:{*} The Hadoop-Compatible File System (HCFS) is a core concept
> in the big data storage ecosystem. It provides unified interfaces and
> generally clear semantics, and has become the de facto standard for industry
> storage systems to follow and conform to. There is a series of HCFS
> implementations in Hadoop, such as S3AFileSystem for Amazon's S3 Object
> Store, WASB for Microsoft's Azure Blob Storage, and the OSS connector for
> Alibaba Cloud Object Storage, plus more maintained by storage service
> providers on their own.
> {*}Problems:{*} However, as indicated by introduction.md, there is no formal
> suite for assessing the compatibility of a file system across all such HCFS
> implementations. Whether the functionality is complete and meets the core
> compatibility expectations therefore rests mainly on the service provider's
> own report. Meanwhile, Hadoop itself keeps evolving, and new features are
> continuously being added to the HCFS interfaces for existing implementations
> to follow and adopt. Hadoop therefore also needs a tool to quickly assess
> whether a specific HCFS implementation supports these features.
> In addition, the hadoop command line tool (the hdfs shell) is used to
> interact directly with an HCFS storage system. Most commands correspond to
> specific HCFS interfaces and work well, but some cases are complicated and
> may not work, such as the expunge command. We also need a way to check such
> commands for an HCFS.
> {*}Proposal:{*} Accordingly, we propose to define a formal HCFS compatibility
> benchmark and to provide a corresponding tool that performs the compatibility
> assessment for an HCFS storage system. The benchmark and tool should cover
> both HCFS interfaces and hdfs shell commands. Different scenarios require
> different kinds of compatibility, so we could define different suites in the
> benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage
> providers and storage users. End users can use it to evaluate the
> compatibility level and determine whether the storage system in question is
> suitable for the required scenarios. Storage providers can use it to quickly
> generate an objective and reliable report about the core functions of the
> storage service. For instance, if an HCFS scores 100% on a suite named
> 'tpcds', this demonstrates that all functions needed by a tpcds program are
> supported.
> It also serves as a guide to how storage service capabilities can map to
> HCFS interfaces, such as storage classes on S3.
> Any thoughts? Comments and feedback are very welcome. Thanks in advance.
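
To make the reported percentages and the idea of a "suite" concrete, here is a minimal sketch of how a suite of pass/fail cases could be scored and summarized in the same style as the report lines above. It is not the benchmark tool's code: `run_suite`, `format_report`, and the case names in the usage note are hypothetical, and a case is assumed to count as failed when it returns false or raises an error.

```python
from typing import Callable, Dict


def format_report(suite: str, passed: int, total: int) -> str:
    """Format a one-line summary in the style of the reports quoted above."""
    # Guard against an empty suite so we never divide by zero.
    rate = 100.0 * passed / total if total else 0.0
    return (
        f"Hadoop Compatibility Report for {suite}: "
        f"{rate:.2f}%, PASSED {passed} OVER {total}"
    )


def run_suite(suite: str, cases: Dict[str, Callable[[], bool]]) -> str:
    """Run every case in a suite and return its summary line.

    Assumption: a case passes only if it returns True; a case that
    returns False or raises is counted as failed.
    """
    passed = 0
    for case in cases.values():
        try:
            if case():
                passed += 1
        except Exception:
            # An unsupported operation is a failed case, not a crash.
            pass
    return format_report(suite, passed, len(cases))
```

With the counts from the comment, `format_report("ALL", 140, 222)` reproduces the 63.06% line for S3A and `format_report("ALL", 155, 222)` the 69.82% line for ABFS. A scenario-specific suite would simply be a different dictionary of cases passed to `run_suite`.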