Aaron Fabbri created HADOOP-15780:
-------------------------------------

             Summary: S3Guard: document how to deal with non-S3Guard processes writing data to S3Guarded buckets
                 Key: HADOOP-15780
                 URL: https://issues.apache.org/jira/browse/HADOOP-15780
             Project: Hadoop Common
          Issue Type: Sub-task
    Affects Versions: 3.2.0
            Reporter: Aaron Fabbri


Our general policy for S3Guard is this: all modifiers of a bucket that is 
configured for use with S3Guard must use S3Guard. Otherwise, the MetadataStore 
will not be properly updated as the S3 bucket changes, and S3Guard will return 
results that disagree with what is actually in S3.

There are limited circumstances in which it may be safe to have an external 
(non-S3Guard) process writing data. There are also scenarios where it 
definitely breaks things.

I think we should start by documenting the cases in which this works and those 
in which it does not. Once we have enumerated them, we can suggest enhancements 
as needed to make this sort of configuration easier to use.

To get the ball rolling, some things that do not work:
- Deleting a path *p* with S3Guard, then writing a new file at path *p* without 
S3Guard. The MetadataStore still holds a delete marker (tombstone) for *p*, so 
S3Guard reports the file as deleted even though it is visible in S3, falsely 
treating the mismatch as "eventual consistency" (as [~ste...@apache.org] and I 
have discussed).
- When fs.s3a.metadatastore.authoritative is true, files added to a directory 
without S3Guard may be excluded from subsequent listings made through S3Guard.
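To make the two failure modes above concrete, here is a deliberately simplified toy model. It is a sketch under stated assumptions, not S3Guard's actual code: ToyGuardedBucket and its methods are hypothetical names, paths are flat strings, directory objects and S3's own eventual consistency are ignored, and the tombstone and authoritative-listing rules are only approximations of S3Guard's behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGuardedBucket:
    """Hypothetical, heavily simplified model of an S3Guarded bucket.

    Assumptions: flat file paths only, no directory objects, and S3 itself is
    treated as immediately consistent. This is NOT the real S3A/S3Guard code.
    """
    s3: set = field(default_factory=set)          # objects that really exist in S3
    store: set = field(default_factory=set)       # live entries in the MetadataStore
    tombstones: set = field(default_factory=set)  # paths S3Guard recorded as deleted
    auth_dirs: set = field(default_factory=set)   # dirs whose stored listing is trusted as complete

    def guarded_put(self, path):
        # A write through S3Guard updates both S3 and the MetadataStore.
        self.s3.add(path)
        self.store.add(path)
        self.tombstones.discard(path)

    def guarded_delete(self, path):
        # A delete through S3Guard removes the object and records a tombstone.
        self.s3.discard(path)
        self.store.discard(path)
        self.tombstones.add(path)

    def external_put(self, path):
        # A non-S3Guard writer changes S3 only; the MetadataStore never sees it.
        self.s3.add(path)

    def guarded_exists(self, path):
        # A tombstone wins over whatever S3 currently contains.
        if path in self.tombstones:
            return False
        return path in self.store or path in self.s3

    def guarded_list(self, directory, authoritative):
        prefix = directory.rstrip("/") + "/"
        stored = {p for p in self.store if p.startswith(prefix)}
        if authoritative and directory in self.auth_dirs:
            # Authoritative directory: S3 is not consulted at all.
            return sorted(stored)
        in_s3 = {p for p in self.s3 if p.startswith(prefix)}
        # Merge both sources, hiding anything the MetadataStore says was deleted.
        listing = (stored | in_s3) - self.tombstones
        if authoritative:
            # Save the merged listing and mark the directory as complete.
            self.store |= listing
            self.auth_dirs.add(directory)
        return sorted(listing)


# Case 1: delete /data/p through S3Guard, then recreate it without S3Guard.
b = ToyGuardedBucket()
b.guarded_put("/data/p")
b.guarded_delete("/data/p")
b.external_put("/data/p")
print("/data/p" in b.s3, b.guarded_exists("/data/p"))  # True False
print(b.guarded_list("/data", authoritative=False))    # []

# Case 2: with authoritative mode on, an externally written file is not listed.
b = ToyGuardedBucket()
b.guarded_put("/data/a")
b.guarded_list("/data", authoritative=True)  # first listing marks /data authoritative
b.external_put("/data/b")
print(b.guarded_list("/data", authoritative=True))   # ['/data/a']
print(b.guarded_list("/data", authoritative=False))  # ['/data/a', '/data/b']
```

What the model captures is that the external writer only ever touches S3, so any S3Guard decision that trusts the MetadataStore over S3 (a tombstone, or a directory treated as authoritative) hides that writer's data.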

(Note that there are also interoperability issues between S3A and non-S3A 
clients even without S3Guard, due to the unique way S3A interprets empty 
directory markers.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)