Thanks for your reply. To clarify, do you mean that the problem is already solved because Hive ACID looks for the '_orc_acid_version' file in the directory before assuming the directory is ready to read? Or do you mean that it *could have* looked for the file before deciding to read the directory, and that this would have been one way to solve it? I ask because I couldn't find such a check, but I may well have missed it.
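To make the question concrete, the kind of reader-side check I was looking for might look roughly like the sketch below. It is only an illustration: it uses java.nio on a local temp directory as a stand-in for the Hadoop FileSystem API, and the helper name isCompleteCompactedDir and the directory names are hypothetical, not something taken from the Hive code base.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CompactedDirCheck {
    // Marker file name as given in the quoted message.
    static final String ACID_VERSION_FILE = "_orc_acid_version";

    // Hypothetical helper: treat a directory as a complete compacted dir
    // only if the marker file is present inside it.
    static boolean isCompleteCompactedDir(Path dir) {
        return Files.isDirectory(dir)
                && Files.isRegularFile(dir.resolve(ACID_VERSION_FILE));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("acid-demo");

        // Simulate the compactor: rename first, write the marker afterwards.
        Path tmp = Files.createDirectory(root.resolve("_tmp_base_0000005"));
        Path base = root.resolve("base_0000005");
        Files.move(tmp, base);

        // Between the rename and the marker write, the dir is visible but incomplete.
        System.out.println(isCompleteCompactedDir(base)); // false

        Files.createFile(base.resolve(ACID_VERSION_FILE));
        System.out.println(isCompleteCompactedDir(base)); // true
    }
}
```

A reader applying such a check would skip the directory in the window between the rename and the marker write, which only works if the listing is consistent.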
However, the "open a txn for compact & commit it" is definitely neater. I agree, and it seems to have been done this way in HIVE-20823 <https://issues.apache.org/jira/browse/HIVE-20823>.

Thanks,
Somani

On Mon, Nov 26, 2018 at 1:57 PM Gopal Vijayaraghavan <gop...@apache.org> wrote:

> > Oh but s3Guard will not solve the atomicity problem, right?
>
> S3Guard does solve the atomicity problem, because compactors don't just
> rename directories.
>
> The basic consistency needed for ACID is - list after delete and list
> after create (which S3 does not have).
>
> They also place a file named '_orc_acid_version' in the directory.
>
> This happens after rename() returns.
>
>     fs.rename(fileStatus.getPath(), newPath);
>     AcidUtils.OrcAcidVersion.writeVersionFile(newPath, fs);
>
> With S3Guard, all that is needed is to check for that file (& if it is
> missing it is not a complete compacted dir yet).
>
> However, the "open a txn for compact & commit it" is definitely neater.
>
> > So that means that the directory will be "visible while in progress", and
> > the reader might pick up the compacted directory even when all files
> > haven't been copied.
>
> In another thread today, I mentioned how ACID is built on top of ignoring
> directories, it can do that easily.
>
> The Parquet or Avro transactional system in Hive boils down to a
> PathFilter with some numbers in the path.
>
> Cheers,
> Gopal