[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498779#comment-16498779 ]
Da Zhou commented on HADOOP-15407: ---------------------------------- Hi, [~ste...@apache.org], I'm fixing the patch according to the report. There is one findbug test I'm not sure: {noformat} Redundant nullcheck of stream, which is known to be non-null in org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(byte[], int, int) Redundant null check at AbfsHttpOperation.java:is known to be non-null in org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(byte[], int, int) Redundant null check at AbfsHttpOperation.java:[line 26] {noformat} Code is here, I don't understand why this is a redundant check, is it possible to suppress this "bug"? {code:borderStyle=solid} try (InputStream stream = this.connection.getInputStream()) { if (stream == null) { return; } boolean endOfStream = false; ... ... } {code} Thanks, Da > Support Windows Azure Storage - Blob file system in Hadoop > ---------------------------------------------------------- > > Key: HADOOP-15407 > URL: https://issues.apache.org/jira/browse/HADOOP-15407 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/azure > Affects Versions: 3.2.0 > Reporter: Esfandiar Manii > Assignee: Esfandiar Manii > Priority: Major > Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, > HADOOP-15407-003.patch, HADOOP-15407-004.patch, > HADOOP-15407-HADOOP-15407.006.patch > > > *{color:#212121}Description{color}* > This JIRA adds a new file system implementation, ABFS, for running Big Data > and Analytics workloads against Azure Storage. This is a complete rewrite of > the previous WASB driver with a heavy focus on optimizing both performance > and cost. > {color:#212121} {color} > *{color:#212121}High level design{color}* > At a high level, the code here extends the FileSystem class to provide an > implementation for accessing blobs in Azure Storage. The scheme abfs is used > for accessing it over HTTP, and abfss for accessing over HTTPS. The following > URI scheme is used to address individual paths: > {color:#212121} {color} > > {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color} > {color:#212121} {color} > {color:#212121}ABFS is intended as a replacement to WASB. WASB is not > deprecated but is in pure maintenance mode and customers should upgrade to > ABFS once it hits General Availability later in CY18.{color} > {color:#212121}Benefits of ABFS include:{color} > {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big > Data and Analytics workloads by allowing higher limits on storage > accounts{color} > {color:#212121}· Removing any ramp up time with Storage backend > partitioning; blocks are now automatically sharded across partitions in the > Storage backend{color} > {color:#212121} . This avoids the need for using > temporary/intermediate files, increasing the cost (and framework complexity > around committing jobs/tasks){color} > {color:#212121}· Enabling much higher read and write throughput on > single files (tens of Gbps by default){color} > {color:#212121}· Still retaining all of the Azure Blob features > customers are familiar with and expect, and gaining the benefits of future > Blob features as well{color} > {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the > file system throughput and operations. Ambari metrics are not currently > implemented for ABFS, but will be available soon.{color} > {color:#212121} {color} > *{color:#212121}Credits and history{color}* > Credit for this work goes to (hope I don't forget anyone): Shane Mainali, > {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar > Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, > and James Baker. {color} > {color:#212121} {color} > *Test* > ABFS has gone through many test procedures including Hadoop file system > contract tests, unit testing, functional testing, and manual testing. All the > Junit tests provided with the driver are capable of running in both > sequential/parallel fashion in order to reduce the testing time. > {color:#212121}Besides unit tests, we have used ABFS as the default file > system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a > storage option. (HDFS is also used but not as default file system.) Various > different customer and test workloads have been run against clusters with > such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, > Spark Streaming and Spark SQL, and others have been run to do scenario, > performance, and functional testing. Third parties and customers have also > done various testing of ABFS.{color} > {color:#212121}The current version reflects to the version of the code > tested and used in our production environment.{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org