[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356048#comment-16356048 ]
Steve Loughran commented on HADOOP-13761:
-----------------------------------------

I've not seen that exception since; it was on ASF trunk/, so unless we've updated the SDK since then (we have, haven't we?), I don't know what's up.

h3. DynamoDBMetadataStore

* L693: leave @ debug
* {{DynamoDBMetadataStore.updateParameters()}} should switch to {{provisionTableBlocking()}}, so the CLI tool will not complete until the provisioning has happened. This will improve its ability to be used in scripts & tests.

h3. ITestDynamoDBMetadataStoreScale

* I like the fact the DB gets shrunk back afterwards. Currently the CLI tests slowly leak capacity, even though, on my reading, they should clean up.
* {{pathOfDepth}} has the base path "/scaleTestBWEP". I think it should be getClass().getShortName().

This is going to be fun on a shared test run:
# need to make sure this is not run in parallel
# need to document that this test must be run on a private DDB instance, not one shared across other buckets. Don't want other teams getting upset because their tests on a different bucket are failing.

For HADOOP-14918 I've been wondering if we should have tests explicitly declare a "test DDB table"; {{ITestS3GuardToolDynamoDB}} hard-codes "testDynamoDBInitDestroy", which relies on at most one user in a shared AWS a/c running the test at the same time. This is one which can be created on demand and destroyed afterwards, so the test suite can do what it wants. This would line up for future tests of things like upgrades, TTL checking, etc. This scale test could share the same config option.
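The "block until provisioning has happened" behaviour above amounts to polling table status until it reports ACTIVE, with a deadline. A minimal sketch, with the class name, method, and status supplier all hypothetical (the real store would wrap a DescribeTable call), not the actual {{provisionTableBlocking()}} implementation:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a blocking provision wait: poll a table-status
 * supplier until it reports "ACTIVE", or give up once the deadline passes.
 * The supplier is abstracted so the polling loop itself can be shown
 * without any AWS dependency.
 */
public class ProvisionWait {

  public static boolean awaitActive(Supplier<String> tableStatus,
      long timeoutMillis, long pollMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if ("ACTIVE".equals(tableStatus.get())) {
        return true;                        // provisioning finished
      }
      TimeUnit.MILLISECONDS.sleep(pollMillis);  // still UPDATING; poll again
    }
    return false;                           // timed out before going ACTIVE
  }
}
```

With a loop like this behind {{updateParameters()}}, a CLI invocation only returns once the new capacity is live, which is what scripts and tests need to rely on.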
> S3Guard: implement retries for DDB failures and throttling; translate exceptions
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Blocker
>         Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are needed, including:
> - open(path). If the MetadataStore reflects a recent create/move of a file path, but we fail to read it from S3, retry.
> - delete(path). If deleteObject() on S3 fails, but the MetadataStore shows the file exists, retry.
> - rename(src, dest). If the source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate JIRA for this as it will likely require interface changes (i.e. prefix or subtree scan).
> We may miss some cases initially, and we should do failure-injection testing to make sure we're covered. Failure-injection tests can be a separate JIRA to make this easier to review.
> We also need basic configuration parameters around the retry policy. There should be a way to specify a maximum retry duration, as some applications would prefer to receive an error eventually rather than wait indefinitely. We should also keep statistics when inconsistency is detected and we enter a retry loop.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
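The retry-policy requirement in the description above (a maximum retry duration, so callers get an error eventually rather than waiting forever) can be sketched as a small bounded retry loop with exponential backoff. The class and parameter names here are hypothetical illustrations, not the retry API that was eventually added to Hadoop:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of a duration-bounded retry loop with exponential
 * backoff. Retries the operation until it succeeds or the deadline
 * passes, then rethrows the last failure so the caller receives an
 * error eventually instead of waiting indefinitely.
 */
public class BoundedRetry {

  public static <T> T retry(Callable<T> op, long maxDurationMillis,
      long initialDelayMillis) throws Exception {
    long deadline = System.currentTimeMillis() + maxDurationMillis;
    long delay = initialDelayMillis;
    Exception last;
    do {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;                             // remember the failure
        TimeUnit.MILLISECONDS.sleep(delay);
        delay = Math.min(delay * 2, 1000L);   // exponential backoff, capped
      }
    } while (System.currentTimeMillis() < deadline);
    throw last;                               // deadline passed: surface it
  }
}
```

A real implementation would also distinguish retriable failures (throttling, transient inconsistency) from permanent ones, and bump a statistics counter each time the retry loop is entered, as the description asks.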