[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591812#comment-16591812 ]
genericqa commented on HADOOP-15407: ------------------------------------ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HADOOP-15407 does not apply to HADOOP-15407. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15407 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927853/HADOOP-15407-HADOOP-15407-008.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15090/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Support Windows Azure Storage - Blob file system in Hadoop > ---------------------------------------------------------- > > Key: HADOOP-15407 > URL: https://issues.apache.org/jira/browse/HADOOP-15407 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/azure > Affects Versions: 3.2.0 > Reporter: Esfandiar Manii > Assignee: Da Zhou > Priority: Blocker > Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, > HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, > HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, > HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch > > > *{color:#212121}Description{color}* > This JIRA adds a new file system implementation, ABFS, for running Big Data > and Analytics workloads against Azure Storage. This is a complete rewrite of > the previous WASB driver with a heavy focus on optimizing both performance > and cost. > {color:#212121} {color} > *{color:#212121}High level design{color}* > At a high level, the code here extends the FileSystem class to provide an > implementation for accessing blobs in Azure Storage. The scheme abfs is used > for accessing it over HTTP, and abfss for accessing over HTTPS. The following > URI scheme is used to address individual paths: > {color:#212121} {color} > > {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color} > {color:#212121} {color} > {color:#212121}ABFS is intended as a replacement to WASB. WASB is not > deprecated but is in pure maintenance mode and customers should upgrade to > ABFS once it hits General Availability later in CY18.{color} > {color:#212121}Benefits of ABFS include:{color} > {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big > Data and Analytics workloads by allowing higher limits on storage > accounts{color} > {color:#212121}· Removing any ramp up time with Storage backend > partitioning; blocks are now automatically sharded across partitions in the > Storage backend{color} > {color:#212121} . This avoids the need for using > temporary/intermediate files, increasing the cost (and framework complexity > around committing jobs/tasks){color} > {color:#212121}· Enabling much higher read and write throughput on > single files (tens of Gbps by default){color} > {color:#212121}· Still retaining all of the Azure Blob features > customers are familiar with and expect, and gaining the benefits of future > Blob features as well{color} > {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the > file system throughput and operations. Ambari metrics are not currently > implemented for ABFS, but will be available soon.{color} > {color:#212121} {color} > *{color:#212121}Credits and history{color}* > Credit for this work goes to (hope I don't forget anyone): Shane Mainali, > {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar > Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, > and James Baker. {color} > {color:#212121} {color} > *Test* > ABFS has gone through many test procedures including Hadoop file system > contract tests, unit testing, functional testing, and manual testing. All the > Junit tests provided with the driver are capable of running in both > sequential/parallel fashion in order to reduce the testing time. > {color:#212121}Besides unit tests, we have used ABFS as the default file > system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a > storage option. (HDFS is also used but not as default file system.) Various > different customer and test workloads have been run against clusters with > such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, > Spark Streaming and Spark SQL, and others have been run to do scenario, > performance, and functional testing. Third parties and customers have also > done various testing of ABFS.{color} > {color:#212121}The current version reflects to the version of the code > tested and used in our production environment.{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org