[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505018#comment-16505018 ]

Steve Loughran commented on HADOOP-15407:
-----------------------------------------

h1. First attempt at testing

I found it very hard to get up and running. As in: I have one of the contract 
tests going, but nothing else yet. 

The testing docs will need to explain how to get started. The easier the setup 
process, the easier writing those docs becomes.

I'd make writing that doc a priority, as without it, getting the tests working 
will be a blocker to reviews.

Key things I had trouble with:

* the difference between the wasb & dfs accounts
* what's needed in terms of pre-test store container setup. I think it's happening automatically, but that's probably repeating the same problem we see with wasb: container leakage & the need to periodically purge them all. If that's the case, a new version of {{org.apache.hadoop.fs.azure.integration.CleanupTestContainers}} is needed, and, again, the docs.
* Lack of meaningful detail on why a test setup failed, other than "skipped". The attached patch addresses that by including a message in the Assume clause; see the sketch after this list. (Side note: I expect meaningful messages in *all* Assume.assume clauses, as I try to do in my own contribs.)
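
Something like this, so a skipped test says why it was skipped (a minimal sketch; the config key here is illustrative, not the patch's actual option name):

{code}
// skip with an explanation, rather than leaving a bare "skipped" in the report
String account = conf.get("fs.azure.abfs.account.name"); // illustrative key
Assume.assumeTrue(
    "No ABFS test account configured in fs.azure.abfs.account.name",
    account != null && !account.isEmpty());
{code}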


I tried to get {{ITestAzureBlobFileSystemMkDir}} up and working and didn't get 
that far: timeouts.


Every test needs a timeout. This is to avoid messages like

{code}
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project hadoop-azure: There was a timeout or other error in the fork -> [Help 1]
{code}

When maven kills a test, all the output is lost and, as all test teardown is 
skipped, things on remote stores are left in a mess.

I've added one to {{DependencyInjectedTest}}, where it will be picked up everywhere.
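
Roughly this shape, using JUnit 4's {{org.junit.rules.Timeout}} rule (a sketch; the duration is a judgment call):

{code}
// timeout rule in the shared base class; every subclass's tests inherit it.
// The ten-minute figure is illustrative.
@Rule
public Timeout globalTimeout = Timeout.millis(10 * 60 * 1000);
{code}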

This shows me what's hanging. I'm assuming it's still test-setup related, so
I will look at my config options more. But the fact that things time out
when the tests are misconfigured is a problem in its own right.


{code}
"Thread-0" #13 prio=5 os_prio=31 tid=0x00007f97061b3000 nid=0x5803 waiting on condition [0x000070000f8ac000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:255)
        at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:769)
        at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:756)
        at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.exists(StorageInterfaceImpl.java:233)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:856)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
        at org.apache.hadoop.fs.azurebfs.DependencyInjectedTest.initialize(DependencyInjectedTest.java:132)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


# I don't think falling back to anonymous credentials should happen, at least in tests.
# I absolutely don't think that login failures should be something you retry on.

I see there's a call to {{suppressRetryPolicyInClientIfNeeded()}}, so
the tests need to make sure that's running. I think the production-side code
needs to look at the auth codepath and make sure its operations all
fail fast.

Proposed: add a test for this. Create a config, remove the auth, try anonymous
access to your test containers, and expect it to fail fast.
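
A rough sketch of the shape I have in mind, not the patch's API: {{getConfiguration()}}, {{getAccountName()}}, and {{getTestUri()}} are hypothetical helpers, and {{intercept}} is statically imported from {{org.apache.hadoop.test.LambdaTestUtils}}:

{code}
// strip the account key from a copy of the test config, then expect
// anonymous access to fail fast rather than sleep/retry its way to a timeout
@Test
public void testAnonymousAccessFailsFast() throws Exception {
  Configuration conf = new Configuration(getConfiguration()); // hypothetical helper
  conf.unset("fs.azure.account.key." + getAccountName());     // hypothetical helper
  intercept(Exception.class, () ->
      FileSystem.newInstance(getTestUri(), conf)              // hypothetical helper
          .getFileStatus(new Path("/")));
}
{code}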

Other aspects of that test: {{LambdaTestUtils.intercept()}} loves closures which 
return something other than void: the string value of the result is used in the 
assertion message generated when the callback doesn't raise an exception.

So whenever you can, try and return values, especially those whose toString() 
output is meaningful.

(In some tests elsewhere I've added extra code to do that, say returning 
{{getFileStatus()}} after a call expected to fail, purely for the diagnostics.)
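
For example (a sketch; {{fs}} and {{missingPath}} stand in for whatever the test has in scope):

{code}
// if the call unexpectedly succeeds, the returned FileStatus's toString()
// lands in the assertion message, making the failure self-describing
intercept(FileNotFoundException.class, () ->
    fs.getFileStatus(missingPath));
{code}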


> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
>                 Key: HADOOP-15407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15407
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/azure
>    Affects Versions: 3.2.0
>            Reporter: Esfandiar Manii
>            Assignee: Esfandiar Manii
>            Priority: Major
>         Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-patch-atop-patch-007.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>
> abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>
>
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in 
> pure maintenance mode, and customers should upgrade to ABFS once it hits 
> General Availability later in CY18.
> Benefits of ABFS include:
> * Higher scale (capacity, throughput, and IOPS) for Big Data and Analytics 
> workloads, by allowing higher limits on storage accounts
> * Removing any ramp-up time with Storage backend partitioning; blocks are now 
> automatically sharded across partitions in the Storage backend
> ** This avoids the need for temporary/intermediate files, and with them the 
> cost (and framework complexity) of committing jobs/tasks
> * Enabling much higher read and write throughput on single files (tens of 
> Gbps by default)
> * Retaining all of the Azure Blob features customers are familiar with and 
> expect, while gaining the benefits of future Blob features as well
>
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system 
> throughput and operations. Ambari metrics are not currently implemented for 
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh, 
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run in either sequential or parallel 
> fashion, in order to reduce the testing time.
> Besides unit tests, we have used ABFS as the default file system in Azure 
> HDInsight, and Azure HDInsight will very soon offer ABFS as a storage option. 
> (HDFS is also used, but not as the default file system.) Various customer and 
> test workloads have been run against clusters with such configurations for 
> quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming, Spark 
> SQL, and others have been run for scenario, performance, and functional 
> testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our 
> production environment.


