Thanks for your reply.

To clarify, do you mean that the problem is already solved because Hive
ACID checks for the '_orc_acid_version' file in the directory before
assuming the directory is ready to read? Or do you mean that it *could
have* looked for the file before deciding to read the directory, and that
would have been one way to solve it? I'm asking because I couldn't find
such a check, but I may well have missed it.
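
To make the question concrete, the kind of check I was looking for would
be roughly the sketch below. It uses java.nio.file on a local filesystem
rather than Hadoop's FileSystem API, and the class and method names are
made up for illustration; only the '_orc_acid_version' file name comes
from this thread. It is not code I found in Hive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class VersionMarkerCheck {
    // Marker file name from the thread; the compactor writes it after rename() returns.
    static final String ACID_VERSION_FILE = "_orc_acid_version";

    // Hypothetical helper: treat a compacted directory as readable
    // only once the marker file is present inside it.
    public static boolean isCompleteCompactedDir(Path dir) {
        return Files.isDirectory(dir)
                && Files.isRegularFile(dir.resolve(ACID_VERSION_FILE));
    }

    public static void main(String[] args) throws IOException {
        // Simulate a freshly renamed directory that has no marker yet.
        Path dir = Files.createTempDirectory("compacted");
        System.out.println("before marker: " + isCompleteCompactedDir(dir));

        // Simulate the compactor writing the version file after the rename.
        Files.createFile(dir.resolve(ACID_VERSION_FILE));
        System.out.println("after marker: " + isCompleteCompactedDir(dir));
    }
}
```

A reader applying this test would skip the directory until the marker
shows up, which only works if listings are consistent (hence S3Guard).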

> However, the "open a txn for compact & commit it" is definitely neater.

I agree, and it seems to have been done this way in HIVE-20823
<https://issues.apache.org/jira/browse/HIVE-20823>.

Thanks,
Somani

On Mon, Nov 26, 2018 at 1:57 PM Gopal Vijayaraghavan <gop...@apache.org>
wrote:

>
> >    Oh but s3Guard will not solve the atomicity problem, right?
>
> S3Guard does solve the atomicity problem, because compactors don't just
> rename directories.
>
> The basic consistency needed for ACID is - list after delete and list
> after create (which S3 does not have).
>
> They also place a file named '_orc_acid_version' in the directory.
>
> This happens after rename() returns.
>
>         fs.rename(fileStatus.getPath(), newPath);
>         AcidUtils.OrcAcidVersion.writeVersionFile(newPath, fs);
>
> With S3Guard, all that is needed is to check for that file (& if it is
> missing it is not a complete compacted dir yet).
>
> However, the "open a txn for compact & commit it" is definitely neater.
>
> > So that means that the directory will be "visible while in progress", and
> >  the reader might pick up the compacted directory even when all files
> > haven't been copied.
>
> In another thread today, I mentioned how ACID is built on top of ignoring
> directories, it can do that easily.
>
> The Parquet or Avro transactional system in Hive boils down to a
> PathFilter with some numbers in the path.
>
> Cheers,
> Gopal
>
>
>
