[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356048#comment-16356048 ]
Steve Loughran commented on HADOOP-13761:
-----------------------------------------

I've not seen that exception since; it was on ASF trunk/, so unless we've updated the SDK since then (we have, haven't we?), I don't know what's up.

h3. DynamoDBMetadataStore

* L693: leave @ debug
* {{DynamoDBMetadataStore.updateParameters()}} should switch to {{provisionTableBlocking()}}, so the CLI tool will not complete until the provisioning has happened. This will improve its ability to be used in scripts & tests.

h3. ITestDynamoDBMetadataStoreScale

* I like the fact the DB gets shrunk back afterwards. Currently the CLI tests slowly leak capacity, even though, on my reading, they should clean up.
* {{pathOfDepth}} has the base path "/scaleTestBWEP". I think it should be getClass().getShortName().

This is going to be fun on a shared test run:
# need to make sure this is not run in parallel
# need to document that this test must be run on a private DDB instance, not one shared across other buckets. Don't want other teams getting upset because their tests on a different bucket are failing.

For HADOOP-14918 I've been wondering if we should have tests explicitly declare a "test DDB table"; {{ITestS3GuardToolDynamoDB}} hard-codes "testDynamoDBInitDestroy", which relies on at most one user in a shared AWS a/c running the test at the same time. This is one which can be created on demand and destroyed afterwards, so the test suite can do what it wants. This would line up for future tests of things like upgrades, TTL checking, etc. This scale test could share the same config option.
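The "block until provisioning has happened" behaviour above amounts to polling table status until it reports ACTIVE, with a deadline. A minimal sketch, with the class name, method, and status supplier all hypothetical (the real store would wrap a DescribeTable call), not the actual {{provisionTableBlocking()}} implementation:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a blocking provision wait: poll a table-status
 * supplier until it reports "ACTIVE", or give up once the deadline passes.
 * The supplier is abstracted so the polling loop itself can be shown
 * without any AWS dependency.
 */
public class ProvisionWait {

  public static boolean awaitActive(Supplier<String> tableStatus,
      long timeoutMillis, long pollMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if ("ACTIVE".equals(tableStatus.get())) {
        return true;                        // provisioning finished
      }
      TimeUnit.MILLISECONDS.sleep(pollMillis);  // still UPDATING; poll again
    }
    return false;                           // timed out before going ACTIVE
  }
}
```

With a loop like this behind {{updateParameters()}}, a CLI invocation only returns once the new capacity is live, which is what scripts and tests need to rely on.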
> S3Guard: implement retries for DDB failures and throttling; translate exceptions
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Blocker
>         Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are needed, including:
> - open(path). If the MetadataStore reflects a recent create/move of a file path, but we fail to read it from S3, retry.
> - delete(path). If deleteObject() on S3 fails, but the MetadataStore shows the file exists, retry.
> - rename(src, dest). If the source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate JIRA for this as it will likely require interface changes (i.e. prefix or subtree scan).
> We may miss some cases initially, and we should do failure-injection testing to make sure we're covered. Failure-injection tests can be a separate JIRA to make this easier to review.
> We also need basic configuration parameters around the retry policy. There should be a way to specify a maximum retry duration, as some applications would prefer to receive an error eventually rather than wait indefinitely. We should also keep statistics when inconsistency is detected and we enter a retry loop.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
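The retry-policy requirement in the description above (a maximum retry duration, so callers get an error eventually rather than waiting forever) can be sketched as a small bounded retry loop with exponential backoff. The class and parameter names here are hypothetical illustrations, not the retry API that was eventually added to Hadoop:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of a duration-bounded retry loop with exponential
 * backoff. Retries the operation until it succeeds or the deadline
 * passes, then rethrows the last failure so the caller receives an
 * error eventually instead of waiting indefinitely.
 */
public class BoundedRetry {

  public static <T> T retry(Callable<T> op, long maxDurationMillis,
      long initialDelayMillis) throws Exception {
    long deadline = System.currentTimeMillis() + maxDurationMillis;
    long delay = initialDelayMillis;
    Exception last;
    do {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;                             // remember the failure
        TimeUnit.MILLISECONDS.sleep(delay);
        delay = Math.min(delay * 2, 1000L);   // exponential backoff, capped
      }
    } while (System.currentTimeMillis() < deadline);
    throw last;                               // deadline passed: surface it
  }
}
```

A real implementation would also distinguish retriable failures (throttling, transient inconsistency) from permanent ones, and bump a statistics counter each time the retry loop is entered, as the description asks.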