[ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-14381:
------------------------------------
    Parent Issue: HADOOP-14531  (was: HADOOP-13204)

> S3AUtils.translateException to map 503 response => throttling failure
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-14381
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14381
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> When AWS S3 returns a 503, it means that the overall set of requests on a
> part of an S3 bucket exceeds the permitted limit; the client(s) need to
> throttle back or wait for some rebalancing to complete.
> The AWS SDK retries 3 times on a 503, then throws it up. Our code doesn't
> do anything with that other than create a generic {{AWSS3IOException}}.
> Proposed:
> * add a new exception, {{AWSOverloadedException}}
> * raise it on a 503 from S3 (and, for S3Guard, on DynamoDB complaints)
> * have it include a link to a wiki page on the topic, as well as the path
> * and any other diagnostics
> Code talking to S3 may then be able to catch this and choose how to react.
> Retrying with exponential backoff is the obvious option. Failing that, it
> could trigger task reattempts for that part of the query, then a job retry,
> which will again fail *unless the number of tasks run in parallel is reduced*.
> As this throttling applies across all clients talking to the same part of a
> bucket, a real fix has to be coordinated at a higher level. We can at least
> start by reporting things better.
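To make the proposal concrete, here is a rough sketch of what the mapping could look like. Only the AWS SDK types ({{AmazonClientException}}, {{AmazonServiceException}}), the existing {{AWSS3IOException}} and the {{S3AUtils.translateException}} / {{AWSOverloadedException}} names come from the issue; the class body, the simplified {{translate()}} signature and the wiki URL below are illustrative assumptions, not an actual patch.

{code:java}
// Rough sketch only: one possible shape for the proposed AWSOverloadedException and
// the 503 mapping that S3AUtils.translateException() could perform. The simplified
// signature, class layout and wiki URL are illustrative, not the committed code.
import java.io.IOException;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;

/** Proposed exception raised when S3 (or, with S3Guard, DynamoDB) reports overload. */
class AWSOverloadedException extends IOException {
  // Placeholder link; the proposal asks for a wiki page on throttling to be referenced here.
  static final String WIKI = "https://wiki.apache.org/hadoop/";

  AWSOverloadedException(String operation, String path, Throwable cause) {
    super(operation + " on " + path + ": service overloaded (503); see " + WIKI, cause);
  }
}

class TranslateSketch {
  /** Simplified stand-in for the status-code mapping in S3AUtils.translateException(). */
  static IOException translate(String operation, String path, AmazonClientException e) {
    if (e instanceof AmazonServiceException) {
      AmazonServiceException ase = (AmazonServiceException) e;
      if (ase.getStatusCode() == 503) {
        // Throttling: surface a specific exception callers can catch and back off on,
        // instead of the generic AWSS3IOException used today.
        return new AWSOverloadedException(operation, path, ase);
      }
    }
    // Everything else: fall back to a generic wrapped IOException.
    return new IOException(operation + " on " + path, e);
  }
}
{code}

On the caller side, the obvious reaction mentioned above, retry with exponential backoff, might look roughly like this. It builds on the sketch above; the helper name, attempt count and initial delay are arbitrary and would really need to be configurable.

{code:java}
// Sketch of a caller catching the proposed exception and retrying with exponential
// backoff. Uses AWSOverloadedException from the previous sketch; limits are arbitrary.
import java.io.IOException;

class BackoffSketch {
  interface S3Operation<T> {
    T run() throws IOException;
  }

  static <T> T withBackoff(S3Operation<T> op) throws IOException, InterruptedException {
    long delayMs = 100;
    final int maxAttempts = 5;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.run();
      } catch (AWSOverloadedException e) {
        if (attempt == maxAttempts) {
          throw e;               // still throttled: let task/job-level retry take over
        }
        Thread.sleep(delayMs);   // back off before trying again
        delayMs *= 2;            // exponential growth of the wait
      }
    }
  }
}
{code}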