[ 
https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087208#comment-16087208
 ] 

Steve Loughran commented on HADOOP-14660:
-----------------------------------------

Interesting: the numbers look good, and it's the kind of problem which doesn't 
surface in testing unless you are playing with fault injection.

What's going to happen over long-haul links, especially if there's a bit of 
unreliability in them? I'm thinking of (a) overreacting to transient failures, 
and (b) conflicting with the usual TCP congestion control.

I'm fixing up S3A to handle AWS throttling, which has to be in our client (503 
-> throttled exception -> custom retry policy with exponential backoff (maybe I 
should add jitter too)). I plan to log this as part of the instrumentation, so 
when you collect the results of a job you can see how much throttling took 
place. Is there a way to pick that up here? 
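
Something like this is the shape I mean by backoff + jitter (a sketch only, not 
the actual S3A retry code; the class name and numbers are mine):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

/** Sketch only: exponential backoff with full jitter for throttled (503) requests. */
public class ThrottleRetryPolicySketch {
  private final long baseDelayMillis;
  private final long maxDelayMillis;

  public ThrottleRetryPolicySketch(long baseDelayMillis, long maxDelayMillis) {
    this.baseDelayMillis = baseDelayMillis;
    this.maxDelayMillis = maxDelayMillis;
  }

  /** Delay before retry number {@code retryCount} (0-based), capped and jittered. */
  public long nextDelayMillis(int retryCount) {
    // exponential growth, capped so the shift cannot overflow or grow unbounded
    long exponential = Math.min(maxDelayMillis,
        baseDelayMillis << Math.min(retryCount, 20));
    // "full jitter": a uniform random delay in [0, exponential) stops many
    // throttled clients from retrying in lock step and re-triggering the 503s
    return ThreadLocalRandom.current().nextLong(exponential);
  }
}
{code}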


# updating the SDK must be a separate, standalone patch, linked as depended on 
by HADOOP-9991; mark this throttling patch as depending on the SDK update. That 
helps identify high-risk version updates, simplifies rollback, etc.
# the new key should, if settable by users, be listed in a public constant. I'd 
also prefer "enabled" as the element name rather than "enable", though that's 
very subjective (see the sketch after this list).
# log @ debug when throttling is on & what the options are.
# {{BlobOperationDescriptor.getContentLengthIfKnown}}: add tests for parsing 
invalid input strings; they should be rejected rather than trigger failures 
(which is probably what happens in this patch). There's a test sketch for this 
under {{TestBlobOperationDescriptor}} below.
# Use SLF4J for all logging; we've moved off commons logging. This lets you do 
logging as {{LOG.info("timer is {}", timer)}} & have on-demand string 
construction only when the logging is needed
# if an outer class is tagged as {{InterfaceAudience.Private}}, no need to mark 
the inner classes/interfaces.
* are you confident that {{TimerTaskImpl}} is always stopped? And can you make 
sure the timer thread has a meaningful name, like "wasb-timer-container-mycontainer"?
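
To make #2, #3 and the timer naming concrete, something like this (a sketch 
only; the key name, {{Configuration}} handling and {{containerName}} are my 
guesses, not the patch's actual code):

{code:java}
import java.util.Timer;
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch only; the key name and method shape are guesses, not the patch's code. */
public final class ClientThrottlingSetupSketch {
  // public, user-visible constant for the new key, with "enabled" as the element name
  public static final String KEY_ENABLE_CLIENT_THROTTLING =
      "fs.azure.client.side.throttling.enabled";
  public static final boolean DEFAULT_CLIENT_THROTTLING_ENABLED = true;

  private static final Logger LOG =
      LoggerFactory.getLogger(ClientThrottlingSetupSketch.class);

  private ClientThrottlingSetupSketch() {
  }

  static Timer startThrottlingTimer(Configuration conf, String containerName) {
    boolean enabled = conf.getBoolean(
        KEY_ENABLE_CLIENT_THROTTLING, DEFAULT_CLIENT_THROTTLING_ENABLED);
    // log @ debug that throttling is on and what the options are
    LOG.debug("Client-side throttling enabled={} for container {}",
        enabled, containerName);
    if (!enabled) {
      return null;  // caller skips throttling entirely
    }
    // a meaningful daemon thread name shows up clearly in thread dumps and,
    // being a daemon, can never stop the JVM from exiting if it isn't cancelled
    return new Timer("wasb-timer-container-" + containerName, true);
  }
}
{code}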

h3. {{ClientThrottlingAnalyzer}}

* L81: you can use {{Preconditions.checkArgument}} here, and 
{{StringUtils.isNotEmpty()}} as the check for non-empty strings.
* L183: to avoid needless string construction, guard the log @ debug with a 
{{LOG.isDebugEnabled()}} check.
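
i.e. something along these lines (a sketch with guessed field and parameter 
names, not the patch's actual code):

{code:java}
import com.google.common.base.Preconditions;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch only; the field and parameter names are my guesses. */
class ThrottlingAnalyzerChecksSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ThrottlingAnalyzerChecksSketch.class);

  ThrottlingAnalyzerChecksSketch(String accountName) {
    // L81: argument validation without hand-rolled checks
    Preconditions.checkArgument(StringUtils.isNotEmpty(accountName),
        "The argument 'accountName' cannot be null or empty.");
  }

  void logSleep(long sleepDurationMillis) {
    // L183: guard the debug log so the string is only built when needed
    if (LOG.isDebugEnabled()) {
      LOG.debug("Throttling analyzer sleeping for {} ms", sleepDurationMillis);
    }
  }
}
{code}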

h3. {{TestBlobOperationDescriptor}}

* extend {{AbstractWasbTestBase}}
* output streams &c must be closed even when assertions fail; 
try-with-resources handles this (see the sketch below).
* {{AbstractWasbTestBase}} & others: just declare {{throws Exception}} on the 
test methods; that's more flexible in future as the tests change. 
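
Roughly this shape (a sketch: I'm assuming {{getContentLengthIfKnown()}} takes 
the header value as a {{String}} and returns -1 when it can't parse it, and 
that {{AbstractWasbTestBase}} exposes an {{fs}} field; adjust to the real 
signatures):

{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestBlobOperationDescriptor extends AbstractWasbTestBase {

  @Test
  public void testContentLengthRejectsInvalidInput() throws Exception {
    // invalid strings should be rejected (length unknown) rather than throw
    assertEquals(-1, BlobOperationDescriptor.getContentLengthIfKnown("not-a-number"));
    assertEquals(-1, BlobOperationDescriptor.getContentLengthIfKnown(""));
    assertEquals(-1, BlobOperationDescriptor.getContentLengthIfKnown(null));
  }

  @Test
  public void testWriteOperationIsDescribed() throws Exception {
    Path path = new Path("/testWriteOperationIsDescribed");
    // try-with-resources: the stream is closed even if an assertion fails
    try (FSDataOutputStream out = fs.create(path)) {
      out.write(new byte[4 * 1024]);
    }
    // ... assertions on the recorded operation type and content length ...
  }
}
{code}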

h3. {{TestClientThrottlingAnalyzer}}


* In HADOOP-14553 I'm going to switch all possible tests to parallel runs, and 
don't want to add new sequential runs if possible. How will a parallel run 
impact the timing estimates, especially over a long-haul link? Some of those 
assertions look brittle.
* there's a lot of repetition in the tests: can you refactor this out, e.g. into 
a shared helper like the one sketched below?
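
e.g. pull the repeated timing checks into one helper with an explicit 
tolerance, so a slow or parallel run doesn't break exact assertions (names and 
tolerance handling are mine, just to show the shape):

{code:java}
// inside the test class, with a static import of org.junit.Assert.assertTrue
private static void assertWithinTolerance(String what, long expected, long actual,
    double tolerancePercent) {
  double allowed = expected * tolerancePercent / 100.0;
  assertTrue(
      String.format("%s: expected %d +/- %.0f%% but was %d",
          what, expected, tolerancePercent, actual),
      Math.abs(actual - expected) <= allowed);
}
{code}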





> wasb: improve throughput by 34% when account limit exceeded
> -----------------------------------------------------------
>
>                 Key: HADOOP-14660
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14660
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>            Assignee: Thomas
>         Attachments: HADOOP-14660-001.patch, HADOOP-14660-002.patch
>
>
> Big data workloads frequently exceed the Azure Storage max ingress and egress 
> limits 
> (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits).  
> For example, the max ingress limit for a GRS account in the United States is 
> currently 10 Gbps.  When the limit is exceeded, the Azure Storage service 
> fails a percentage of incoming requests, and this causes the client to 
> initiate the retry policy.  The retry policy delays requests by sleeping, but 
> the sleep duration is independent of the client throughput and account limit. 
>  This results in low throughput, due to the high number of failed requests 
> and thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed 
> requests and maximizes throughput.  Tests have shown that this improves 
> throughput by ~34% when the storage account max ingress and/or egress limits 
> are exceeded. 


