[
https://issues.apache.org/jira/browse/HADOOP-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466198
]
Mike Smith commented on HADOOP-882:
-----------------------------------
I've been working on a patch to handle these exceptions. Three S3 error codes
need to be retried:
InternalError
RequestTimeout
OperationAborted
InternalError occurs at a reasonably high rate for PUT requests. I have
finished the patch for Jets3tFileSystem.java, which increases the waiting time
with each retry. However, I've run into a very strange problem: when I get an
InternalError (500) response and retry the request, I keep getting a
RequestTimeout response from S3. RequestTimeout is a client-side error, which
suggests that jets3t closes the connection after the InternalError exception.
Even when I try to re-establish the connection, I still get the same
RequestTimeout response. Below is the changed put() method in
Jets3tFileSystem.java. I have made similar changes to the other methods as
well. Let me know if you see anything wrong:
private void put(String key, InputStream in, long length) throws IOException {
  int attempts = 0;
  // keep trying until the PUT succeeds or retry() refuses another attempt
  while (true) {
    try {
      S3Object object = new S3Object(key);
      object.setDataInputStream(in);
      object.setContentType("binary/octet-stream");
      object.setContentLength(length);
      s3Service.putObject(bucket, object);
      break;
    } catch (S3ServiceException e) {
      if (!retry(e, ++attempts)) {
        // not retryable: surface the underlying IOException if there is one
        if (e.getCause() instanceof IOException) {
          throw (IOException) e.getCause();
        }
        throw new S3Exception(e);
      }
    }
  }
}
private boolean retry(S3ServiceException e, int attempts) {
  if (attempts > maxRetry) return false;
  // the error code may be null when the failure did not come from an S3
  // error response, so compare constant-first to avoid a NullPointerException
  String errorCode = e.getErrorCode();
  // for internal error (500) and aborted operations, retry after a delay
  if (S3_INTERNAL_ERROR_CODE.equals(errorCode) ||
      S3_OPERATION_ABORTED_CODE.equals(errorCode)) {
    LOG.info("retrying failed s3Service [" + errorCode + "]. Delay: " +
        retryDelay * attempts + " msec. Attempts: " + attempts);
    try {
      Thread.sleep(retryDelay * attempts);
    } catch (InterruptedException ie) {
      // restore the interrupt flag and stop retrying
      Thread.currentThread().interrupt();
      return false;
    }
    return true;
  }
  // allow retry for the socket timeout exception;
  // the connection needs to be re-established.
  if (S3_REQUEST_TIMEOUT_CODE.equals(errorCode)) {
    try {
      AWSCredentials awsCredentials = new AWSCredentials(accessKey,
          secretAccessKey);
      this.s3Service = new RestS3Service(awsCredentials);
    } catch (S3ServiceException ee) {
      // this exception will be taken care of later
    }
    LOG.info("retrying failed s3Service [" + errorCode + "]. Attempts: " +
        attempts);
    return true;
  }
  // for all other exceptions retrying is not allowed
  // Maybe it would be better to keep retrying for all sorts of exceptions!?
  return false;
}
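
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: put() hands the same InputStream to every attempt, so after a failed PUT the stream may already be partly consumed. A retry would then declare the full content length but send fewer bytes, which could plausibly produce the RequestTimeout described above. The sketch below is self-contained and does not use jets3t; RetrySketch, StreamSource, Upload, putWithRetry and backoffDelay are all hypothetical names invented for illustration. It opens a fresh stream for each attempt and doubles the delay between retries, whereas the retry() method above waits retryDelay * attempts, which grows linearly.

```java
import java.io.IOException;
import java.io.InputStream;

class RetrySketch {

  /** Hypothetical: supplies a fresh, unread stream for each attempt. */
  interface StreamSource {
    InputStream open() throws IOException;
  }

  /** Hypothetical stand-in for the call that uploads the data. */
  interface Upload {
    void send(InputStream in) throws IOException;
  }

  /**
   * Delay before retry number `attempt` (1-based):
   * base, 2*base, 4*base, ... capped at maxMillis.
   */
  static long backoffDelay(long baseMillis, int attempt, long maxMillis) {
    // clamp the shift so the multiplication cannot overflow for sane bases
    int shift = Math.max(0, Math.min(attempt - 1, 20));
    return Math.min(baseMillis << shift, maxMillis);
  }

  /**
   * Runs the upload, retrying up to maxRetry times after an IOException.
   * Returns the total number of attempts that were made.
   */
  static int putWithRetry(StreamSource source, Upload upload,
                          int maxRetry, long baseMillis) throws IOException {
    int attempts = 0;
    while (true) {
      attempts++;
      // a new stream per attempt, so a retry never sees a half-read stream
      try (InputStream in = source.open()) {
        upload.send(in);
        return attempts;
      } catch (IOException e) {
        if (attempts > maxRetry) {
          throw e; // retries exhausted
        }
        try {
          Thread.sleep(backoffDelay(baseMillis, attempts, 30_000L));
        } catch (InterruptedException ie) {
          // restore the interrupt flag and give up
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while waiting to retry", ie);
        }
      }
    }
  }
}
```

In Jets3tFileSystem the equivalent change would probably mean either passing put() something that can reopen the data (a file, for example) or resetting the stream before each retry. Which of the two fits better depends on how the callers produce the stream, which is not shown here.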
> S3FileSystem should retry if there is a communication problem with S3
> ---------------------------------------------------------------------
>
> Key: HADOOP-882
> URL: https://issues.apache.org/jira/browse/HADOOP-882
> Project: Hadoop
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.10.1
> Reporter: Tom White
> Assigned To: Tom White
>
> File system operations currently fail if there is a communication problem
> (IOException) with S3. All operations that communicate with S3 should retry a
> fixed number of times before failing.