[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875126#comment-15875126
 ] 

Hadoop QA commented on HADOOP-13904:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
37s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
59s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
32s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
36s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 21s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13904 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12853606/HADOOP-13904-HADOOP-13345.004.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  findbugs  checkstyle  |
| uname | Linux 0b6f1a8132dd 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HADOOP-13345 / 8b37b6a |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11661/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11661/testReport/ |
| modules | C: hadoop-common-project/hadoop-common 

[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875077#comment-15875077
 ] 

Steve Loughran commented on HADOOP-13904:
-

bq. Are you concerned that the batchWriteItem() implementation in the Dynamo 
SDK returns auth failures as retryable entries from 
BatchWriteOutcome#getUnprocessedItems()? Everything I've read implies auth 
failures are not retryable and would be propagated as usual.

sounds like I'm the confused one, aren't I? 

OK, so this isn't catch + retry, it's all about handling the specific 
unprocessed item message that gets sent back. In this case, ignore my comments, 
I'm wrong.

LGTM, +1, pending Yetus

> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> 
>
> Key: HADOOP-13904
> URL: https://issues.apache.org/jira/browse/HADOOP-13904
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Aaron Fabbri
> Attachments: HADOOP-13904-HADOOP-13345.001.patch, 
> HADOOP-13904-HADOOP-13345.002.patch, HADOOP-13904-HADOOP-13345.003.patch, 
> HADOOP-13904-HADOOP-13345.004.patch, screenshot-1.png
>
>
> When you overload DDB, you get error messages warning of throttling, [as 
> documented by 
> AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]
> Reduce load on DDB by doing a table lookup before the create, then, in table 
> create/delete operations and in get/put actions, recognise the error codes 
> and retry using an appropriate retry policy (exponential backoff + ultimate 
> failure) 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-20 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874982#comment-15874982
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

{quote}
there's some TODO in the code.
{quote} 

I'm removing those and noting places we should add metrics in the JIRA instead 
(HADOOP-13453).

{quote}
always good to have the javadoc for a constant to include the {{@value}} marker
{quote}

Will do.  I'll roll another patch but first I think we should come to consensus 
on the next question:

{quote}
I think the retryBackoff logic should look a bit at what failed.
{quote}

Here is the change to {{DynamoDBMetadataStore#processBatchWriteRequest()}}:

{noformat}
@@ -418,7 +437,7 @@ public void move(Collection<Path> pathsToDelete,
    * @param itemsToPut new items to be put; can be null
    */
   private void processBatchWriteRequest(PrimaryKey[] keysToDelete,
-      Item[] itemsToPut) {
+      Item[] itemsToPut) throws IOException {
     final int totalToDelete = (keysToDelete == null ? 0 : keysToDelete.length);
     final int totalToPut = (itemsToPut == null ? 0 : itemsToPut.length);
     int count = 0;
@@ -449,13 +468,41 @@ private void processBatchWriteRequest(PrimaryKey[] keysToDelete,
       BatchWriteItemOutcome res = dynamoDB.batchWriteItem(writeItems);
       // Check for unprocessed keys in case of exceeding provisioned throughput
       Map<String, List<WriteRequest>> unprocessed = res.getUnprocessedItems();
+      int retryCount = 0;
       while (unprocessed.size() > 0) {
+        retryBackoff(retryCount++);
         res = dynamoDB.batchWriteItemUnprocessed(unprocessed);
         unprocessed = res.getUnprocessedItems();
       }
     }
   }
{noformat}
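
The {{retryBackoff}} helper itself isn't shown in the hunk above. Purely as an 
illustration of the exponential backoff + ultimate failure idea (the constant 
names and values here are assumptions, not the actual patch code), it could 
look roughly like this:

{code}
// Illustrative sketch only, not the patch code: sleep exponentially longer on
// each retry, capped at a ceiling, and give up with an IOException once the
// retry budget is exhausted. (java.io.InterruptedIOException extends IOException.)
private static final int MAX_RETRIES = 9;           // assumed retry budget
private static final long MIN_SLEEP_MSEC = 100L;    // assumed base sleep
private static final long MAX_SLEEP_MSEC = 15000L;  // assumed sleep ceiling

private void retryBackoff(int retryCount) throws IOException {
  if (retryCount >= MAX_RETRIES) {
    throw new IOException("Max retries (" + MAX_RETRIES
        + ") exceeded writing batch to DynamoDB");
  }
  long sleepMsec = Math.min(MAX_SLEEP_MSEC, MIN_SLEEP_MSEC << retryCount);
  try {
    Thread.sleep(sleepMsec);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new InterruptedIOException("Interrupted while backing off");
  }
}
{code}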

{quote}
At the very least, auth failures should be recognised and propagated. 
{quote}

I'm not following this comment.  I'm not catching any exceptions.  Auth 
failures would still be propagated, no?  Not sure what I'm missing here.

Are you concerned that the batchWriteItem() implementation in the Dynamo SDK 
returns auth failures as retryable entries from 
{{BatchWriteOutcome#getUnprocessedItems()}}?  Everything I've read implies auth 
failures are not retryable and would be propagated as usual.

Let's assume that I'm wrong, and the SDK returns auth failures as a list of 
unprocessed items (though [the 
docs|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#BatchOperations]
  say these should be retried).  How do I tell if an unprocessedItem is due to 
an auth failure? All I get is a 
{{com.amazonaws.services.dynamodbv2.model.WriteRequest}}, with no 
differentiator on why it needs to be retried.

My understanding is that auth failures are not "retriable" and thus cause an 
exception from batchWriteItem(), *not* unprocessed items.








[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874507#comment-15874507
 ] 

Steve Loughran commented on HADOOP-13904:
-

* Again, there's some TODO in the code. Better to add a comment in the 
relevant JIRA to mention "and patch {{DynamoDBMetadataStore & 
AbstractITestS3AMetadataStoreScale}}" as one of the work items. There are 557 
TODO entries in branch-2: don't add any more unless you are prepared to go 
through the old ones and fix a couple. (To be fair, I think some are mine.)
* {{S3GUARD_DDB_MAX_RETRIES_DEFAULT, MIN_RETRY_SLEEP_MSEC, ... }}: always good 
to have the javadoc for a constant to include the {{@value}} marker, so the 
javadocs show what the value is.

I think the {{retryBackoff}} logic should look a bit at what failed. At the 
very least, auth failures should be recognised and propagated. It's really 
annoying when auth problems trigger failure/retry, and again, too much of the 
Hadoop stack gets this wrong (e.g. ZOOKEEPER-2346). We can handle anything else 
that happens (assuming all connectivity errors are transient), but if the user 
can't log on, we should fail fast.
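
A rough sketch of the kind of check meant here (illustration only; the 
status-code test and the helper name are assumptions, not anything in the 
current patch):

{code}
// Sketch only: before backing off and retrying, look at what actually failed
// and rethrow anything that looks like an auth problem instead of retrying it.
private void failFastOnAuthFailure(AmazonServiceException e) throws IOException {
  int status = e.getStatusCode();
  if (status == 401 || status == 403) {
    // missing/expired/wrong credentials: retrying will never help
    throw new IOException("DynamoDB auth failure: " + e.getErrorCode(), e);
  }
  // anything else falls through to the normal retry/backoff path
}
{code}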




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-17 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872401#comment-15872401
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Jenkins above is clean except HADOOP-14030 (TestKDiag failure) which should be 
unrelated.




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869063#comment-15869063
 ] 

Hadoop QA commented on HADOOP-13904:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
25s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
53s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
32s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
38s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 13s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13904 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12852943/HADOOP-13904-HADOOP-13345.003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  findbugs  checkstyle  |
| uname | Linux 9eed38575087 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HADOOP-13345 / 94287ce |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11640/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11640/testReport/ |
| modules | C: hadoop-common-project/hadoop-common 

[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868982#comment-15868982
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Filed HADOOP-14087.

[~ste...@apache.org] any remaining concerns I didn't address here?  (See my 
[last 
reply|https://issues.apache.org/jira/browse/HADOOP-13904?focusedCommentId=15852047=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15852047]
 to your comment.)

I got Sean's +1 above but wanted to make sure you are also happy (assuming 
Yetus comes back clean).




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-15 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868949#comment-15868949
 ] 

Sean Mackrory commented on HADOOP-13904:


Works for me - I think we're safe. +1!




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868944#comment-15868944
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

I did not find the SDK code that implements retries, but I just attached a 
screenshot showing some of the testing I've done.

This is a >30 minute scale test run on a 1/1 capacity DDB table.  There were 
zero exceptions surfaced with the current patch: retry logic is working as 
expected.

I will at least attach a new patch rebased on latest code.




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868754#comment-15868754
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

OK, I started on this, again realizing what a crazy API choice it would have 
been for the DDB SDK to treat the "nothing was processed" throttling behavior 
as completely separate from the "some things were processed" throttling case.

Handling that would have been pretty complex, with the unprocessedItems only 
being used in the former case and not the latter (i.e. if you get partially 
throttled, then fully throttled on first retry, you have to retry the previous 
unprocessedItems list, etc.)

I dug deeper into the AWS SDK docs and found this:

{noformat}
ProvisionedThroughputExceededException - Your request rate is too high. The AWS 
SDKs for DynamoDB automatically retry requests that receive this exception. 
{noformat}

This is consistent with my testing: Even with a tiny provisioned throughput and 
abusive scale test, I never once saw this exception.  I did frequently see 
throttling via the UnprocessedItems return value.

So, I now think we should *not* try to handle that exception here.  I'm trying 
to find the SDK code to confirm this behavior.




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-14 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866260#comment-15866260
 ] 

Sean Mackrory commented on HADOOP-13904:


{quote}This patch essentially keeps existing exception behavior but just slows 
down batch work resubmittal. So I think it is an improvement, but we may have 
to add a higher-level retry loop for the ProvisionedThroughputExceededException 
case. Why they don't just return all items as unprocessed is beyond me.{quote}

I'm of the opinion that we should be catching that one. It seems required to 
reasonably and correctly handle the behavior as documented, even though we 
haven't seen that specific edge case. Everything else sounds good to me...
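
If we do catch it, a minimal sketch of such a higher-level loop might look like 
this (illustration only; it reuses the {{retryBackoff}} and 
{{processBatchWriteRequest}} names from the patch discussion, and whether to 
share the same backoff counter is an open question):

{code}
// Sketch: if DynamoDB rejects the whole batch outright, back off and resubmit
// the entire request rather than relying on the unprocessed-items path.
int retryCount = 0;
while (true) {
  try {
    processBatchWriteRequest(keysToDelete, itemsToPut);
    break;
  } catch (ProvisionedThroughputExceededException e) {
    retryBackoff(retryCount++);  // throws IOException once retries are exhausted
  }
}
{code}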




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-03 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852047#comment-15852047
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Thank you for the review.

{quote}
Yetus is unhappy with it...is it in sync with the branch?
{quote}

See the end of my previous comment.  This is based on top of HADOOP-13876.  
That one is higher priority to get in than this one (this one is an efficiency 
issue AFAICT).

{quote}
the retry policy should really detect and reject the auth failures as 
non-retryable
{quote}

I think that already happens here. My retry implementation doesn't catch any 
exceptions.  I'd expect the DDB batch write API to throw an exception if we hit 
an auth failure, which naturally bypasses the retry logic.

I've been digging through docs (including 
[SDK|http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#batchWriteItem-com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest]
 and 
[REST|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#BatchOperations]
 doc for batch write) and I think this patch is correct, except I just learned 
that batch write *may* throw {{ProvisionedThroughputExceededException}} if 
*none* of the items in the batch could be executed.  I could not reproduce this 
despite abusive testing on a 10 I/O unit provisioned table.

{quote}
(a) handle interruptions by interrupting thread again, 
{quote}

I wondered about this.  Currently I do not catch InterruptedException; I let it 
propagate.  Are you saying I should catch it, set the interrupt flag, break out 
of the retry loop, and continue execution?

{quote}
(b) handling any other exception by just returning false to the shouldRetry 
probe. 
{quote}

This batch API is a bit of a special case. Instead of just throwing exceptions 
for failures, it appears to propagate non-retryable exceptions, but translates 
retryable ones into BatchWriteItem "unprocessed items".  So this patch just 
slows down resubmittal of the retryable items.

This patch essentially keeps existing exception behavior but just slows down 
batch work resubmittal.  So I think it is an improvement, but we may have to 
add a higher-level retry loop for the 
{{ProvisionedThroughputExceededException}} case.  Why they don't just return 
all items as unprocessed is beyond me.





[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851644#comment-15851644
 ] 

Steve Loughran commented on HADOOP-13904:
-

Yetus is unhappy with it...is it in sync with the branch?

* that fix to line 196 of the pom should go into branch-2...submit a separate 
patch for that and I'll get it in

h2. {{retryBackoff}}
* the retry policy should really detect and reject auth failures as 
non-retryable. Looking at the S3A block output stream, we get away with it only 
because you don't get as far as completing a multipart write without having the 
credentials, though I should add the check there too, to fail fast on 
situations like session credential expiry during a multi-day streaming app.
* Take a look at {{S3ABlockOutputStream.shouldRetry}} for some things to 
consider: (a) handling interruptions by interrupting the thread again, and (b) 
handling any other exception by returning false from the shouldRetry probe. 
Why? It means the caller can fail with whatever exception caused the initial 
problem, which is presumably the most useful.

other than that, LGTM.
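
To illustrate the pattern being referenced (a sketch of the idea only, not the 
actual {{S3ABlockOutputStream}} code; {{retryLimit}} and {{backoffMillis()}} 
are hypothetical):

{code}
// Sketch of the shouldRetry idea: true means "backoff taken, try again";
// false means "give up and let the original failure surface to the caller".
private boolean shouldRetry(int attempt) {
  try {
    if (attempt >= retryLimit) {           // retryLimit: hypothetical field
      return false;
    }
    Thread.sleep(backoffMillis(attempt));  // backoffMillis(): hypothetical helper
    return true;
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();    // (a) re-set the interrupt flag
    return false;
  } catch (Exception e) {
    return false;                          // (b) never mask the original failure
  }
}
{code}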




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849567#comment-15849567
 ] 

Hadoop QA commented on HADOOP-13904:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 12s{color} 
| {color:red} HADOOP-13904 does not apply to HADOOP-13345. Rebase required? 
Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13904 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12850597/HADOOP-13904-HADOOP-13345.002.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11558/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-02-01 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848103#comment-15848103
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Following up on determining whether or not the DynamoDB SDK (or service) does 
exponential backoff for us when batched writes get throttled.  TL;DR it would 
be good to add a backoff timer here.

I created a table with 10/10 read/write units provisioned.  I ran this scale 
test and added a timer around the batch write retries:

{code}
StopWatch sw = new StopWatch();
long retryCount = 0;
while (unprocessed.size() > 0) {
  sw.start();
  res = dynamoDB.batchWriteItemUnprocessed(unprocessed);
  unprocessed = res.getUnprocessedItems();
  sw.stop();
  LOG.info("Retry {} took {} msec", retryCount++,
      sw.now(TimeUnit.MILLISECONDS));
  sw.reset();
}
{code}

And it looks like we are not getting any backoff from the SDK or service:
{noformat}
 
2017-01-31 23:55:25,534 [JUn... cessBatchWriteRequest(461)) - Retry 0 took 412 
msec
2017-01-31 23:55:25,534 [JUn... cessBatchWriteRequest(461)) - Retry 1 took 374 
msec
2017-01-31 23:55:25,586 ... :processBatchWriteRequest(461)) - Retry 2 took 51 
msec
2017-01-31 23:55:25,643 ... :processBatchWriteRequest(461)) - Retry 3 took 56 
msec
2017-01-31 23:55:26,182 ... :processBatchWriteRequest(461)) - Retry 4 took 538 
msec
2017-01-31 23:55:26,626 ... :processBatchWriteRequest(461)) - Retry 5 took 444 
msec
2017-01-31 23:55:26,672 ... :processBatchWriteRequest(461)) - Retry 6 took 45 
msec
2017-01-31 23:55:27,183 ... :processBatchWriteRequest(461)) - Retry 7 took 511 
msec
2017-01-31 23:55:27,745 ... :processBatchWriteRequest(461)) - Retry 0 took 499 
msec
2017-01-31 23:55:28,130 ... :processBatchWriteRequest(461)) - Retry 1 took 385 
msec
2017-01-31 23:55:28,185 ... :processBatchWriteRequest(461)) - Retry 2 took 54 
msec
2017-01-31 23:55:28,239 ... :processBatchWriteRequest(461)) - Retry 3 took 53 
msec
2017-01-31 23:55:28,627 ... :processBatchWriteRequest(461)) - Retry 4 took 387 
msec
2017-01-31 23:55:28,676 ... :processBatchWriteRequest(461)) - Retry 5 took 49 
msec
{noformat}






[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-31 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848052#comment-15848052
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

The time it takes to run these varies depending on scale parameters.  These 
runs were on my laptop over a home internet connection:

scale directory count = 2, operation count = 100
  28 seconds
scale directory count = 3, operation count = 100
  142 seconds




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-31 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847856#comment-15847856
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Thank you for the helpful review [~ste...@apache.org]

{quote}
do these run iff `-Dscale` is set? 
{quote}

Yep.. They extend {{S3AScaleTestBase}}.

{quote}
clearMetadataStore(ms, count); may need to go into a finally clause of 
testMoves()
{quote}

Good call.  I'll do the same in testPut() as well.  I'll retest with those 
changes and get you timing details as well.

These could run in parallel if we ensured they had their own DDB table to work 
with.  I think it would be simpler just to mark them as serial for scale 
tests.  I'll add that as well.











[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830348#comment-15830348
 ] 

Steve Loughran commented on HADOOP-13904:
-

Good to see this, and nice to see test-first dev at work.

# do these run iff `-Dscale` is set? I'd prefer that, as I've tried to split 
the slow stuff from the fast stuff in the s3a tests to date
# If you want a standard `NanoTimer.operationsPerSecond()` value for printing, 
feel free to add it




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829212#comment-15829212
 ] 

Hadoop QA commented on HADOOP-13904:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  4m 
52s{color} | {color:red} root in HADOOP-13345 failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
21s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13904 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12848205/HADOOP-13904-HADOOP-13345.001.patch
 |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  findbugs  checkstyle  |
| uname | Linux 4767df8557db 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HADOOP-13345 / f10114c |
| Default Java | 1.8.0_111 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/artifact/patchprocess/branch-mvninstall-root.txt
 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-18 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829172#comment-15829172
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Also, note that the DynamoDBMetadataStore scale test fails during cleanup.  
This is because the internal path normalization depends on having an 
S3AFileSystem instance available.  However, I used the new 
{{MetadataStore#initialize(Configuration)}} interface, which does not set an 
S3AFileSystem instance in the DynamoDBMetadataStore.  This new initialize() 
logic for DynamoDBMetadataStore seems to be broken, so I will file a JIRA on 
that.




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-18 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829036#comment-15829036
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

Taking this JIRA since [~liuml07] is on vacation.  I'll post a patch for my 
MetadataStore scale tests.  Would like to get HADOOP-13589 (maven test 
profiles) in first.




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2017-01-18 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828829#comment-15828829
 ] 

Aaron Fabbri commented on HADOOP-13904:
---

I'm creating a new scale test on top of MetadataStore so we can drive some load 
against the DynamoDBMetadataStore and gain some confidence in our retry/backoff 
stability.  I have a basic put() workload working and am adding one around 
move(). move() makes good use of the DynamoDB batchWriteItem API, which may 
still need to have exponential backoff added in the retry loop in 
DynamoDBMetadataStore.






[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2016-12-19 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762606#comment-15762606
 ] 

Mingliang Liu commented on HADOOP-13904:


From 
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff
{quote}
The AWS SDKs implement automatic retry logic and exponential backoff. Most 
exponential backoff algorithms use jitter (randomized delay) to prevent 
successive collisions. Because you aren't trying to avoid such collisions in 
these cases, you do not need to use this random number. However, if you use 
concurrent clients, jitter can help your requests succeed faster. For more 
information, see the blog post for Exponential Backoff and Jitter.
{quote}
Currently the {{DynamoDBClientFactory}} uses the same max error retry as 
S3ClientFactory, whose default value is 20.
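
For context, that SDK-level retry count is the standard {{ClientConfiguration}} 
knob; a minimal sketch of setting it when building the client (illustration 
only, not the actual {{DynamoDBClientFactory}} code):

{code}
// Sketch: the AWS SDK's built-in retry/backoff is governed by ClientConfiguration.
ClientConfiguration awsConf = new ClientConfiguration();
awsConf.setMaxErrorRetry(20);  // same value as the S3A default mentioned above
// credentials: an AWSCredentialsProvider obtained elsewhere
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentials, awsConf);
{code}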




[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy

2016-12-14 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749778#comment-15749778
 ] 

Mingliang Liu commented on HADOOP-13904:


In [HADOOP-13899], Steve said that:
{quote}
Presumably mocking is the simplest way to test this
{quote}
Agreed. DynamoDB Local does not throttle read or write activity. We have a 
pre-defined {{DynamoDBClientFactory}} interface, so mocking seems the simplest 
way to go.
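
A rough sketch of what such a mock could look like with Mockito (illustration 
only; the request/response wiring is simplified, and {{unprocessedItems}} is 
assumed to be a pre-built {{Map<String, List<WriteRequest>>}}):

{code}
// Sketch: simulate throttling by making the mocked client report every item
// as unprocessed on the first call and nothing unprocessed on the second.
AmazonDynamoDB mockDdb = Mockito.mock(AmazonDynamoDB.class);
BatchWriteItemResult throttled = new BatchWriteItemResult()
    .withUnprocessedItems(unprocessedItems);
BatchWriteItemResult drained = new BatchWriteItemResult()
    .withUnprocessedItems(Collections.<String, List<WriteRequest>>emptyMap());
Mockito.when(mockDdb.batchWriteItem(Mockito.any(BatchWriteItemRequest.class)))
    .thenReturn(throttled)
    .thenReturn(drained);
{code}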
