[ https://issues.apache.org/jira/browse/HADOOP-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466198 ]

Mike Smith commented on HADOOP-882:
-----------------------------------

I've been working on a patch to handle these exceptions. There are three major 
exceptions that need to be retried:

InternalError
RequestTimeout
OperationAborted
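
For reference, the error-code constants that the retry() method below compares 
against could be declared roughly like this; the constant names come from the 
patch snippet and the string values are the S3 error codes listed above:

  // Assumed declarations matching the snippet; values are the S3 error code strings.
  private static final String S3_INTERNAL_ERROR_CODE = "InternalError";
  private static final String S3_REQUEST_TIMEOUT_CODE = "RequestTimeout";
  private static final String S3_OPERATION_ABORTED_CODE = "OperationAborted";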

The InternalError exception occurs at a reasonably high rate for PUT requests! 
I have finished the patch for Jets3tFileSystem.java, which increases the 
waiting time with each retry attempt. But I've been dealing with a very strange 
problem: when I get the InternalError (500) response and retry the request, I 
keep getting a RequestTimeout response from S3. This is a client exception, and 
it suggests that jets3t closes the connection after the InternalError 
exception! Even when I try to re-establish the connection I still get the same 
RequestTimeout response. Below is the changed put() method in 
Jets3tFileSystem.java; I have made similar changes to the other methods as 
well. Let me know if you see something wrong:


  private void put(String key, InputStream in, long length) throws IOException {
    int attempts = 0;
    while (true) {
      try {
        S3Object object = new S3Object(key);
        object.setDataInputStream(in);
        object.setContentType("binary/octet-stream");
        object.setContentLength(length);
        s3Service.putObject(bucket, object);
        break;
      } catch (S3ServiceException e) {
        // Retry on recoverable S3 errors; otherwise rethrow.
        if (!retry(e, ++attempts)) {
          if (e.getCause() instanceof IOException) {
            throw (IOException) e.getCause();
          }
          throw new S3Exception(e);
        }
      }
    }
  }

  private boolean retry(S3ServiceException e, int attempts) {

    if (attempts > maxRetry) return false;

    // For internal errors (500) and aborted operations, retrying is allowed.
    if (e.getErrorCode().equals(S3_INTERNAL_ERROR_CODE) ||
        e.getErrorCode().equals(S3_OPERATION_ABORTED_CODE)) {

      LOG.info("retrying failed s3Service [" + e.getErrorCode() + "]. Delay: " +
          retryDelay * attempts + " msec. Attempts: " + attempts);
      try {
        // Wait a little longer on each successive attempt.
        Thread.sleep(retryDelay * attempts);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
      return true;
    }

    // Retrying is also allowed for the request timeout exception,
    // but the connection needs to be re-established first.
    if (e.getErrorCode().equals(S3_REQUEST_TIMEOUT_CODE)) {
      try {
        AWSCredentials awsCredentials =
            new AWSCredentials(accessKey, secretAccessKey);
        this.s3Service = new RestS3Service(awsCredentials);
      } catch (S3ServiceException ee) {
        // This exception will be dealt with on the next attempt.
      }
      LOG.info("retrying failed s3Service [" + e.getErrorCode() +
          "]. Attempts: " + attempts);
      return true;
    }

    // For all other exceptions retrying is not allowed.
    // Maybe it would be better to keep retrying for all sorts of exceptions!?
    return false;
  }
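
One possible explanation for the RequestTimeout responses on retry (this is 
only a guess, I haven't verified it): the data InputStream is partially 
consumed by the failed attempt, so the retried PUT sends fewer bytes than the 
declared Content-Length and S3 waits for the rest until the request times out. 
A minimal sketch of one way to make the body repeatable by buffering it up 
front (only sensible for small objects; bufferFully is a hypothetical helper, 
not part of the patch):

  import java.io.ByteArrayInputStream;
  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.InputStream;

  // Hypothetical helper: read the stream fully into memory so every retry
  // can resend the complete body.
  private static byte[] bufferFully(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  // In put(), each attempt would then use a fresh stream over the buffered bytes:
  //   byte[] data = bufferFully(in);
  //   ...
  //   object.setDataInputStream(new ByteArrayInputStream(data));
  //   object.setContentLength(data.length);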



> S3FileSystem should retry if there is a communication problem with S3
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-882
>                 URL: https://issues.apache.org/jira/browse/HADOOP-882
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.10.1
>            Reporter: Tom White
>         Assigned To: Tom White
>
> File system operations currently fail if there is a communication problem 
> (IOException) with S3. All operations that communicate with S3 should retry a 
> fixed number of times before failing.
