[ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775662#comment-16775662
 ] 

Ben Roling edited comment on HADOOP-15625 at 2/22/19 10:42 PM:
---------------------------------------------------------------

[~ste...@apache.org] I've uploaded a new patch.  The patch implements two 
configurations:
* fs.s3.change.detection.source - (eTag or versionId)
* fs.s3.change.detection.mode - (server, client, warn, none)

The defaults are eTag for the source and server for the mode.
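
To illustrate how the two settings and their defaults fit together, here is a minimal sketch. It is not code from the patch: the ChangeDetectionSettings class, its enums, and the fromProperties() helper are hypothetical names, and a plain Map stands in for the Hadoop Configuration object. Only the two property names, their allowed values, and the defaults come from the description above.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the two settings; not the class used in the patch. */
final class ChangeDetectionSettings {
    enum Source { ETAG, VERSION_ID }
    enum Mode { SERVER, CLIENT, WARN, NONE }

    // Property names and defaults as described in the comment above.
    static final String SOURCE_KEY = "fs.s3.change.detection.source";
    static final String MODE_KEY = "fs.s3.change.detection.mode";
    static final String DEFAULT_SOURCE = "eTag";
    static final String DEFAULT_MODE = "server";

    final Source source;
    final Mode mode;

    ChangeDetectionSettings(Source source, Mode mode) {
        this.source = source;
        this.mode = mode;
    }

    /** Reads both settings from a plain map, falling back to the defaults. */
    static ChangeDetectionSettings fromProperties(Map<String, String> props) {
        String rawSource = normalize(props.getOrDefault(SOURCE_KEY, DEFAULT_SOURCE));
        String rawMode = normalize(props.getOrDefault(MODE_KEY, DEFAULT_MODE));

        Source source;
        switch (rawSource) {
            case "etag":
                source = Source.ETAG;
                break;
            case "versionid":
                source = Source.VERSION_ID;
                break;
            default:
                throw new IllegalArgumentException(
                    "Unsupported value for " + SOURCE_KEY + ": " + rawSource);
        }

        Mode mode;
        try {
            mode = Mode.valueOf(rawMode.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unsupported value for " + MODE_KEY + ": " + rawMode);
        }
        return new ChangeDetectionSettings(source, mode);
    }

    // Case-insensitive matching is an assumption of this sketch.
    private static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }
}
```

With an empty map this yields ETAG and SERVER; an unrecognized value is rejected up front rather than silently ignored, which is a design choice of the sketch and not necessarily what the patch does.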

As you'll be able to see, I went ahead and implemented support for either eTag 
or versionId.  The S3AChangeDetectionPolicy class contains subclasses for each.

ITestS3ARemoteFileChanged is a parameterized test that covers all the 
permutations of the configurations.

I have run the tests against an S3 bucket with versioning both disabled and 
enabled.  The versionId mode tests are skipped if the test detects that the 
configured bucket doesn't have versioning enabled.  Also, the code will emit a 
warning message if it detects use of the versionId config on a bucket that 
doesn't have versioning enabled.

I have run all the "regular" tests on hadoop-aws to check for regressions.  
To be transparent, I haven't run any of the scale tests or any other tests 
that require special configuration.  My failsafe report shows 783 tests with 
0 errors, 1 failure, and 211 skipped.  The 1 failure is 
ITestS3AConfiguration.testAutomaticProxyPortSelection(), which looks 
unrelated, but I will take a closer look.



> S3A input stream to use etags to detect changed source files
> ------------------------------------------------------------
>
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch, HADOOP-15625-004.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't notice that the file 
> has changed, it caches the length from startup, and whenever a seek triggers 
> a new GET, you may get old data, new data, or perhaps even go from new data 
> back to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
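
The three steps in the issue description can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: RevisionTracker and its methods are hypothetical names, the revision id is whatever identifier (eTag or versionId) the caller extracts from each response, and only client-side checking is modelled. The sketch assumes that "warn" logs instead of failing and that "none" disables the check; the "server" mode is left out.

```java
import java.io.IOException;

/** Hypothetical tracker for one open file; names are illustrative only. */
final class RevisionTracker {
    enum Mode { CLIENT, WARN, NONE }

    private final String path;
    private final Mode mode;
    private String expected;   // revision id cached from the first response
    private int warnings;      // number of mismatches reported in WARN mode

    RevisionTracker(String path, Mode mode) {
        this.path = path;
        this.mode = mode;
    }

    /**
     * Call with the eTag (or versionId) of every HEAD/GET response.
     * The first non-null id is cached; later ids are compared against it.
     */
    void onResponse(String revisionId) throws IOException {
        if (mode == Mode.NONE || revisionId == null) {
            return; // checking disabled, or the store returned no id
        }
        if (expected == null) {
            expected = revisionId; // step 1: cache the first id seen
            return;
        }
        if (expected.equals(revisionId)) {
            return; // step 2: id verified, file unchanged
        }
        String message = "Remote file changed during read: " + path
            + " (expected " + expected + ", got " + revisionId + ")";
        if (mode == Mode.WARN) {
            warnings++;
            System.err.println("WARN: " + message);
            return;
        }
        throw new IOException(message); // step 3: fail the read
    }

    int warningCount() {
        return warnings;
    }
}
```

A stream would call onResponse() after each request it issues, so a seek that triggers a new GET against a replaced object fails with an IOException in CLIENT mode instead of returning mixed data.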


