[ https://issues.apache.org/jira/browse/HADOOP-16184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831574#comment-16831574 ]
Gabor Bota edited comment on HADOOP-16184 at 5/2/19 12:11 PM:
--------------------------------------------------------------
Created a gist test for tombstone expiry: [https://gist.github.com/bgaborg/8fbb8daa4d28cdc0ad86f377a3007e4b]
{noformat}
Seq: create guarded; delete guarded; create raw (same path); open and read guarded;
{noformat}
This sequence won't be included in this issue. It will be included in HADOOP-16279, which aims to implement expiry for all metadata items.

> S3Guard: Handle OOB deletions and creation of a file which has a tombstone
> marker
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-16184
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16184
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Gabor Bota
>            Assignee: Gabor Bota
>            Priority: Major
>
> When a file is deleted in S3 using S3Guard, a tombstone marker will be added
> for that file in the MetadataStore. If another process creates the file
> without using S3Guard (as an out-of-band operation - OOB), the file will
> still not be visible to the client using S3Guard because of the deletion
> tombstone.
> ----
> The whole of S3Guard is potentially brittle to:
> * OOB deletions: we skip these in HADOOP-15999, so no worse, but because the
> S3AInputStream retries on FNFE so as to "debounce" cached 404s, it's
> potentially going to retry forever.
> * OOB creation of a file which has a deletion tombstone marker.
> The things this issue covers:
> * Write a test to simulate that deletion problem, to see what happens.
> We ought to have the S3AInputStream retry briefly on that initial GET
> failing, but only on that initial one (after setting "fs.s3a.retry.limit" to
> something low and the interval down to 10ms or so, to fail fast).
> * Sequences
> {noformat}
> 1. create; delete; open; read -> fail after retry
> 2. create; open; read; delete; read -> fail fast on the second read
> {noformat}
> The StoreStatistics of the filesystem's IGNORED_ERRORS stat will be increased
> on the ignored error, so on sequence 1 it will have increased, whereas on
> sequence 2 it will not have. If either of these tests doesn't quite fail as
> expected, we can disable the tests and continue, at least now with some tests
> to simulate a condition we don't have a fix for.
> * For both, we just need to have some model of how long it takes for
> debouncing to stabilize. Then in this new check, if an FNFE is raised and the
> check is happening > (modtime + debounce-delay), then it's a real FNFE.
> This issue is created based on [~ste...@apache.org] remarks and comments on
> HADOOP-15999.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
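The debounce check proposed in the last bullet can be sketched as a small, self-contained predicate. This is only an illustration of the time comparison described above; the class and method names (`DebounceCheck`, `isRealFileNotFound`) and the millisecond-based parameters are assumptions for the sketch, not the actual S3A implementation:

```java
// Sketch of the proposed debounce check: a FileNotFoundException (FNFE)
// observed after the debounce window has elapsed, measured from the file's
// last known modification time, is treated as a genuine missing file rather
// than a stale cached 404 worth retrying.
public class DebounceCheck {

    /**
     * @param modTimeMs       last known modification time of the file (epoch ms)
     * @param debounceDelayMs assumed time for debouncing to stabilize (ms)
     * @param nowMs           time at which the FNFE was raised (epoch ms)
     * @return true if the FNFE should be treated as real; false if it may
     *         still be a cached 404 and a brief retry is worthwhile
     */
    public static boolean isRealFileNotFound(long modTimeMs,
                                             long debounceDelayMs,
                                             long nowMs) {
        return nowMs > modTimeMs + debounceDelayMs;
    }

    public static void main(String[] args) {
        long modTime = 1_000L;
        long delay = 500L;
        // Still inside the debounce window: could be a cached 404, so retry.
        System.out.println(isRealFileNotFound(modTime, delay, 1_200L)); // false
        // Past the window: treat it as a real FileNotFoundException.
        System.out.println(isRealFileNotFound(modTime, delay, 2_000L)); // true
    }
}
```

Under this model, sequence 1 above fails only after the window has passed, while a second read in sequence 2 can fail fast because the deletion is already known to the client.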