Oh, but S3Guard will not solve the atomicity problem, right?

Reference: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/bk_cloud-data-access/content/ch03s07s01.html

*What S3Guard Cannot Do*

*...*

*Mimic the "directory rename is a single atomic transaction" behavior of a filesystem like HDFS. Directory renames are still slow and visible while in progress. This means that if the operations fail partway through, the source and destination paths may contain a mix (including some duplicate) copies of data files.*

So that means the directory will be "visible while in progress", and a reader might pick up the compacted directory before all of its files have been copied.

Thanks,
Somani

On Fri, Nov 9, 2018 at 10:25 PM Gopal Vijayaraghavan <gop...@apache.org> wrote:

>
> > To me it looks like this problem will be solved by
> > https://issues.apache.org/jira/browse/HIVE-20823, but until then, is this
> > broken or I have missed a crucial detail?
>
> Yes, S3Guard.
>
> https://www.slideshare.net/hortonworks/s3guard-whats-in-your-consistency-model
>
> However, that's another daemon you need to run (+ provision DynamoDB etc).
>
> It is not the most convenient of setups to run on S3.
>
> Cheers,
> Gopal
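
To make the "visible while in progress" concern concrete, here is a toy sketch in Python. It is not S3, S3Guard, or Hive code: the object store is a plain dict, and the paths (tmp/base_5, tbl/base_5, bucket_N) and the helpers rename_dir and list_dir are made up for illustration. It only assumes what the quoted documentation says, namely that a directory rename proceeds object by object rather than as one atomic step.

```python
def rename_dir(store, src, dst):
    """Toy non-atomic 'rename': copy each object to dst, then delete the source.

    Yields after every copied object so the caller can inspect the
    intermediate state, as a concurrent reader could.
    """
    for key in sorted(k for k in store if k.startswith(src + "/")):
        new_key = dst + key[len(src):]
        store[new_key] = store[key]
        yield new_key
    # Source objects are removed only after all copies are done.
    for key in [k for k in store if k.startswith(src + "/")]:
        del store[key]


def list_dir(store, prefix):
    """List all object keys under a 'directory' prefix."""
    return sorted(k for k in store if k.startswith(prefix + "/"))


# Hypothetical compaction output: three files waiting to be moved into place.
store = {f"tmp/base_5/bucket_{i}": f"data{i}" for i in range(3)}

steps = rename_dir(store, "tmp/base_5", "tbl/base_5")
next(steps)                                # only the first object is copied
partial = list_dir(store, "tbl/base_5")    # what a reader sees mid-rename

for _ in steps:                            # let the rename finish
    pass
complete = list_dir(store, "tbl/base_5")

print("mid-rename:", partial)
print("after rename:", complete)
```

A reader that lists tbl/base_5 between the first and last copy sees a directory that exists but is incomplete, which is the case I am worried about for the compacted directory.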