Hocheol Park created HIVE-22413:
-----------------------------------
Summary: Avoid dirty read when reading the ACID table while
compaction is running
Key: HIVE-22413
URL: https://issues.apache.org/jira/browse/HIVE-22413
Project: Hive
Issue Type: Bug
Components: Transactions
Reporter: Hocheol Park
A dirty read can occur when an ACID table is read while the compactor is still
creating base or delta directories. It is especially likely on S3 storage,
because a “move” on S3 is implemented as “copy and delete”, and the copy takes
a long time when the files are large or the bucket count is high.
The proposed logic to avoid this problem is as follows. When the child
directories of a partition are listed while reading the ACID table, check
whether any “_tmp”-prefixed directories exist in the partition directory. If
so, compare the names of the directories inside each “_tmp” directory with the
directories being listed, and skip any directory whose name matches. The reader
then reads the files as they were before merging, so the results do not change.
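
The filtering step described above can be sketched roughly as follows. This is
only an illustration of the idea, not the actual patch: the class name
TmpAwareListing, the method visibleDirs, and the in-memory map standing in for
a file system listing are all hypothetical, and the exact name and layout of
the compactor's “_tmp” directory are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a partition listing is modelled as a map from each
// child directory name to the names of that directory's own children.
class TmpAwareListing {
    // Assumed prefix of the compactor's temporary directory.
    static final String TMP_PREFIX = "_tmp";

    static List<String> visibleDirs(Map<String, List<String>> partitionChildren) {
        // Collect the names of directories the compactor is still writing,
        // i.e. the entries found inside any "_tmp"-prefixed directory.
        Set<String> inFlight = new HashSet<>();
        for (Map.Entry<String, List<String>> e : partitionChildren.entrySet()) {
            if (e.getKey().startsWith(TMP_PREFIX)) {
                inFlight.addAll(e.getValue());
            }
        }
        // Keep only directories that are neither temporary themselves nor
        // named the same as an in-flight compaction output.
        List<String> visible = new ArrayList<>();
        for (String name : partitionChildren.keySet()) {
            if (name.startsWith(TMP_PREFIX) || inFlight.contains(name)) {
                continue;
            }
            visible.add(name);
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("delta_0000001_0000001", Arrays.asList("bucket_00000"));
        children.put("delta_0000002_0000002", Arrays.asList("bucket_00000"));
        // Partially copied compaction output, still being moved into place.
        children.put("base_0000002", Arrays.asList("bucket_00000"));
        children.put("_tmp_example", Arrays.asList("base_0000002"));
        System.out.println(visibleDirs(children));
    }
}
```

In this example the half-written base_0000002 is skipped, so the reader falls
back to the two original delta directories, which hold the same data as the
compacted base would.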