Aaron Fabbri created HADOOP-13761:
-------------------------------------

             Summary: S3Guard: implement retries 
                 Key: HADOOP-13761
                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
            Reporter: Aaron Fabbri


Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
retry logic.

In HADOOP-13651, I added TODO comments in most of the places where retry loops 
are needed, including the cases below (a sketch of the common pattern follows 
the list):

- open(path).  If the MetadataStore reflects a recent create/move of the file 
path, but we fail to read it from S3, retry.
- delete(path).  If deleteObject() against S3 fails, but the MetadataStore 
shows the file exists, retry.
- rename(src,dest).  If the source path is not yet visible in S3, retry.
- listFiles(). Skip for now; this is not currently implemented in S3Guard. I 
will create a separate JIRA for it, as it will likely require interface 
changes (e.g. a prefix or subtree scan).
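
As a rough sketch of the common pattern for the open() case (the 
MetadataStore concept is from HADOOP-13651; the stand-in interfaces, helper 
names, and parameters below are illustrative, not the final API):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

/** Sketch only: retry a read that the MetadataStore claims should succeed. */
class S3GuardRetrySketch {
  /** Stand-in for the real MetadataStore interface from HADOOP-13651. */
  interface MetadataStore { boolean exists(String path); }
  /** Stand-in for a raw S3 read. */
  interface S3Reader { byte[] read(String path) throws IOException; }

  static byte[] openWithRetry(MetadataStore ms, S3Reader s3, String path,
      int maxRetries, long retryIntervalMs) throws IOException {
    int attempts = 0;
    while (true) {
      try {
        return s3.read(path);                      // plain S3 GET
      } catch (FileNotFoundException e) {
        // If the MetadataStore has no record of the path either, the file
        // really is missing: surface the error immediately.
        if (!ms.exists(path) || ++attempts > maxRetries) {
          throw e;
        }
        try {                                      // fixed-interval backoff
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while retrying " + path, ie);
        }
      }
    }
  }
}
{code}

The delete() and rename() cases would follow the same shape, differing only 
in which S3 call is retried and which MetadataStore check gates the loop.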

We may miss some cases initially, so we should do failure injection testing 
to make sure we're covered.  The failure injection tests can be a separate 
JIRA to keep this one easier to review.
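
One possible injection approach, as a sketch (reusing the stand-in S3Reader 
interface from the sketch above; nothing here is an existing class): wrap the 
S3 client and pretend that freshly written keys are not yet visible.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: fake read-after-create visibility lag for testing. */
class InconsistentS3ReaderSketch implements S3GuardRetrySketch.S3Reader {
  private final S3GuardRetrySketch.S3Reader wrapped;
  private final long visibilityLagMs;          // how long new keys stay "hidden"
  private final Map<String, Long> createTimes = new ConcurrentHashMap<>();

  InconsistentS3ReaderSketch(S3GuardRetrySketch.S3Reader wrapped,
      long visibilityLagMs) {
    this.wrapped = wrapped;
    this.visibilityLagMs = visibilityLagMs;
  }

  /** Test hook: record that a key was just written. */
  void recordCreate(String path) {
    createTimes.put(path, System.currentTimeMillis());
  }

  @Override
  public byte[] read(String path) throws IOException {
    Long created = createTimes.get(path);
    if (created != null
        && System.currentTimeMillis() - created < visibilityLagMs) {
      // Pretend S3 has not yet made the freshly written object visible.
      throw new FileNotFoundException(path + " (injected inconsistency)");
    }
    return wrapped.read(path);
  }
}
{code}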

We also need basic configuration parameters around the retry policy.  There 
should be a way to specify a maximum retry duration, as some applications 
would prefer to receive an error eventually rather than wait indefinitely.  
We should also keep statistics on how often inconsistency is detected and we 
enter a retry loop.
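
A sketch of the sort of knobs I have in mind (all key names and defaults 
below are placeholders, to be decided in the patch, not committed names):

{code}
import org.apache.hadoop.conf.Configuration;

/** Sketch only: key names and defaults are placeholders. */
class S3GuardRetryConf {
  // Hypothetical configuration keys; final names TBD in the patch.
  static final String RETRY_INTERVAL_KEY =
      "fs.s3a.s3guard.retry.interval.ms";
  static final String RETRY_MAX_DURATION_KEY =
      "fs.s3a.s3guard.retry.max.duration.ms";

  final long intervalMs;
  final long deadlineMs;   // absolute wall-clock deadline across all retries

  S3GuardRetryConf(Configuration conf) {
    intervalMs = conf.getLong(RETRY_INTERVAL_KEY, 1000L);
    // Bound total retry time so callers get an error eventually rather
    // than blocking forever on a persistently inconsistent object.
    deadlineMs = System.currentTimeMillis()
        + conf.getLong(RETRY_MAX_DURATION_KEY, 60000L);
  }

  boolean mayRetry() {
    return System.currentTimeMillis() < deadlineMs;
  }
}
{code}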


