Aaron Fabbri created HADOOP-15780:
-------------------------------------

Summary: S3Guard: document how to deal with non-S3Guard processes writing data to S3Guarded buckets
Key: HADOOP-15780
URL: https://issues.apache.org/jira/browse/HADOOP-15780
Project: Hadoop Common
Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aaron Fabbri
Our general policy for S3Guard is this: all modifiers of a bucket that is configured for use with S3Guard must use S3Guard. Otherwise, the MetadataStore will not be updated as the S3 bucket changes, and problems will arise.

There are limited circumstances in which it may be safe to have an external (non-S3Guard) process writing data. There are also scenarios where it definitely breaks things. I think we should start by documenting the cases in which this does and does not work. Once we have enumerated them, we can suggest enhancements, as needed, to make this sort of configuration easier to use.

To get the ball rolling, here are some things that do not work:

- Deleting a path *p* with S3Guard, then writing a new file at path *p* without S3Guard. S3Guard still holds a delete marker for *p*, so the file appears to be deleted even though it is visible in S3; S3Guard wrongly treats the object in S3 as an "eventual consistency" leftover. (As [~ste...@apache.org] and I have discussed.)
- When fs.s3a.metadatastore.authoritative is true, adding files to a directory without S3Guard and then listing that directory with S3Guard may omit the externally written files from the listing.

(Note: there are also S3A interoperability issues with other non-S3A clients even without S3Guard, due to the unique way S3A interprets empty directory markers.)
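To make the two failure modes concrete, here is a minimal toy model, not S3A code. It assumes a deliberately simplified S3Guard: a metadata-store entry (live or delete marker) always takes precedence over what S3 reports, and in authoritative mode a directory listing comes from the metadata store alone. The class `ToyS3Guard` and its methods (`guardedWrite`, `guardedDelete`, `externalWrite`, `guardedExists`, `guardedList`, `rawS3Exists`) are hypothetical names for illustration; the real MetadataStore implementations and the S3A listing logic are more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical toy model of an S3 bucket plus an S3Guard-style metadata
 * store, used only to show how out-of-band writes go wrong.
 */
public class ToyS3Guard {
    private final Set<String> bucket = new HashSet<>();          // objects really in S3
    private final Map<String, Boolean> store = new HashMap<>();  // path -> is delete marker?
    private final boolean authoritative;                         // stands in for fs.s3a.metadatastore.authoritative

    public ToyS3Guard(boolean authoritative) {
        this.authoritative = authoritative;
    }

    /** Write via S3Guard: both S3 and the metadata store are updated. */
    public void guardedWrite(String path) {
        bucket.add(path);
        store.put(path, false);
    }

    /** Delete via S3Guard: the object is removed and a delete marker is recorded. */
    public void guardedDelete(String path) {
        bucket.remove(path);
        store.put(path, true);
    }

    /** Write by a non-S3Guard client: only S3 changes. */
    public void externalWrite(String path) {
        bucket.add(path);
    }

    /** What S3 itself reports, ignoring the metadata store. */
    public boolean rawS3Exists(String path) {
        return bucket.contains(path);
    }

    /**
     * Guarded existence check. In this model any metadata-store entry wins
     * over S3, because S3 is assumed to possibly show a stale view.
     */
    public boolean guardedExists(String path) {
        Boolean deleted = store.get(path);
        return deleted != null ? !deleted : bucket.contains(path);
    }

    /**
     * Guarded listing of paths under dirPrefix. Non-authoritative mode merges
     * in the S3 listing (minus delete markers); authoritative mode does not.
     */
    public Set<String> guardedList(String dirPrefix) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Boolean> e : store.entrySet()) {
            if (e.getKey().startsWith(dirPrefix) && !e.getValue()) {
                result.add(e.getKey());
            }
        }
        if (!authoritative) {
            for (String p : bucket) {
                if (p.startsWith(dirPrefix) && !Boolean.TRUE.equals(store.get(p))) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Case 1: delete with S3Guard, then recreate the same path without it.
        ToyS3Guard fs = new ToyS3Guard(false);
        fs.guardedWrite("dir/p");
        fs.guardedDelete("dir/p");
        fs.externalWrite("dir/p");
        System.out.println("in S3: " + fs.rawS3Exists("dir/p")
                + ", visible via S3Guard: " + fs.guardedExists("dir/p"));

        // Case 2: an authoritative listing misses an externally written file.
        ToyS3Guard auth = new ToyS3Guard(true);
        auth.guardedWrite("dir/a");
        auth.externalWrite("dir/b");
        System.out.println("authoritative listing: " + auth.guardedList("dir/"));
    }
}
```

Running `main` prints `in S3: true, visible via S3Guard: false` followed by `authoritative listing: [dir/a]`, matching the two cases described above. Constructing the model with `authoritative` set to false makes the second listing include `dir/b` as well, but under this model's assumptions it does not help the first case, because the delete marker still filters the path out.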