[ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778676#comment-16778676
 ] 

Ben Roling edited comment on HADOOP-15625 at 2/26/19 11:13 PM:
---------------------------------------------------------------

I got wrapped up in some other things so didn't make quite as much progress on 
the TODO list as I would have liked, but I did upload a new patch with some 
progress:

* added fs.s3.change.detection.versionrequired config
* fixed failure to throw RemoteFileChangedException on multiple reads
* added config property documentation to index.md

The fix to throw RemoteFileChangedException on multiple reads currently means 
the 'warn' setting could produce a lot of warnings.  An S3AInputStream that 
detects a change would log a warning on every subsequent read() call, which 
would be noisy, at least within the job or task reading that file.  That 
behavior probably needs to be revisited.
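
One way the repeated warnings might be avoided is a per-stream "warn once" 
latch.  This is only a sketch under assumptions: ChangeWarner and 
onChangeDetected are hypothetical names, not part of S3AInputStream or the 
patch, and real code would log through the stream's logger rather than stderr.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WarnOnceSketch {

    /** Hypothetical helper: one instance per input stream. */
    static final class ChangeWarner {
        // Flips to true the first time a change is reported on this stream.
        private final AtomicBoolean warned = new AtomicBoolean(false);

        /**
         * Report a detected change. Only the first call on this instance
         * emits a warning; later calls are silent.
         *
         * @return true if this call emitted the warning
         */
        boolean onChangeDetected(String path) {
            if (warned.compareAndSet(false, true)) {
                // Assumption: stand-in for a LOG.warn(...) call.
                System.err.println("WARN: " + path + " changed while being read");
                return true;
            }
            return false;
        }
    }
}
```

With this shape, each stream that sees a change still reports it, but a long 
sequence of read() calls on the same stream yields a single log line.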

The documentation needs more work and I didn't get all the line length and 
javadoc style issues sorted.

I also didn't address core-site.xml.  To be clear there, you're talking about 
the src/test/resources/core-site.xml, right?

I'll probably have to get back to more of this tomorrow.

My branch is here if you're interested:
https://github.com/ben-roling/hadoop/tree/HADOOP-15625-stevel



> S3A input stream to use etags to detect changed source files
> ------------------------------------------------------------
>
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't notice that the file 
> has changed, it caches the length from startup, and whenever a seek triggers 
> a new GET, you may get old data, new data, or perhaps even go from new data 
> back to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
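
The three steps in the description above can be sketched roughly as follows. 
This is a minimal illustration, not the code in the patch: EtagTracker and its 
check method are hypothetical names, the etag is passed in as a plain string 
instead of being read from an S3 response, and a plain IOException stands in 
for RemoteFileChangedException.

```java
import java.io.IOException;

public class EtagChangeSketch {

    /** Hypothetical tracker: one instance per open input stream. */
    static final class EtagTracker {
        // Etag seen on the first HEAD/GET; null until a response supplies one.
        private String expectedEtag;

        /**
         * Call with the etag of every HEAD/GET response for the object.
         *
         * @throws IOException if the etag differs from the first one seen
         */
        void check(String responseEtag, String path) throws IOException {
            if (expectedEtag == null) {
                // Step 1: cache the etag of the first response.
                expectedEtag = responseEtag;
            } else if (!expectedEtag.equals(responseEtag)) {
                // Steps 2 and 3: verify later responses and fail on a mismatch.
                throw new IOException("Remote file changed during read: " + path
                    + " (expected etag " + expectedEtag
                    + ", got " + responseEtag + ")");
            }
        }
    }
}
```

A stream would call check() after each request it issues, so a seek that 
triggers a new GET against a rewritten object fails instead of silently mixing 
old and new data.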


