[ 
https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962817#comment-16962817
 ] 

Abhishek Somani commented on HIVE-22413:
----------------------------------------

[~pvary] one issue with HIVE-20823 is that it is available only in 4.0.0 
(master). Backporting it to Hive 2 or Hive 3 is not feasible because it is a 
major design change. I think we need an interim solution for S3 and other 
blobstores in older Hive versions. 

We solved this in a different way ourselves. At the end of compaction, we 
write a _compaction_done file into the compacted directory, and the readers 
have been modified (in getAcidState()) to ignore base/delta directories until 
this file is visible. 
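
To make the marker-file idea concrete, here is a minimal standalone sketch in Python. It is not our actual Hive change and does not touch getAcidState(); the function names and the is_compactor_output predicate are hypothetical, and how a reader recognizes compactor output is an assumption left to the caller. Only the _compaction_done file name comes from the comment above.

```python
import os
import tempfile

MARKER = "_compaction_done"  # file name taken from the comment above


def mark_compaction_done(compacted_dir):
    """Compactor side (hypothetical helper): create the marker as the last step."""
    with open(os.path.join(compacted_dir, MARKER), "w"):
        pass


def visible_dirs(partition_dir, is_compactor_output):
    """Reader side (hypothetical helper): list child directories, hiding
    compactor output that does not yet contain the marker file.

    is_compactor_output is an assumed predicate on the directory name.
    """
    result = []
    for name in sorted(os.listdir(partition_dir)):
        path = os.path.join(partition_dir, name)
        if not os.path.isdir(path):
            continue
        if is_compactor_output(name) and not os.path.exists(os.path.join(path, MARKER)):
            continue  # compaction output still being written or copied
        result.append(name)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as partition:
        for d in ("delta_1_1", "delta_2_2", "base_2"):
            os.mkdir(os.path.join(partition, d))
        is_base = lambda name: name.startswith("base_")  # assumption for the demo
        print(visible_dirs(partition, is_base))   # base_2 hidden
        mark_compaction_done(os.path.join(partition, "base_2"))
        print(visible_dirs(partition, is_base))   # base_2 now visible
```

Because the marker is written last, a reader that sees it can assume every data file in the directory has already landed, which is the property a slow copy on a blobstore otherwise breaks.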

> Avoid dirty read when reading the ACID table while compaction is running
> ------------------------------------------------------------------------
>
>                 Key: HIVE-22413
>                 URL: https://issues.apache.org/jira/browse/HIVE-22413
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Hocheol Park
>            Priority: Major
>         Attachments: HIVE-22413.1.patch
>
>
> A dirty read can occur when an ACID table is read while the compactor is 
> still creating base or delta directories. It is especially likely on S3 
> storage, because the “move” operation on S3 is implemented as “copy and 
> delete”, and the copy takes a long time when the files are large or the 
> bucket count is high.
> The proposed logic to avoid this problem: if “_tmp”-prefixed directories 
> exist in the partition directory when its child directories are listed for 
> an ACID read, compare the directory names inside the “_tmp” directory with 
> those in the partition directory and skip the ones that match. The reader 
> then reads the files from before the merge, so the results are unchanged.
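
One possible reading of the logic in the description quoted above, sketched in Python over an in-memory directory listing. This is an illustration only, not the attached patch: the readable_dirs function, the listing shape, and the example directory name _tmp_abc are all hypothetical.

```python
TMP_PREFIX = "_tmp"  # prefix named in the issue description


def readable_dirs(listing):
    """Return the partition's child directories that are safe to read.

    listing maps each child directory name of the partition to the names of
    that directory's own children (an assumed, simplified representation).
    Any directory whose name also appears inside a "_tmp"-prefixed directory
    is treated as still being produced by the compactor and is skipped.
    """
    in_progress = set()
    for name, children in listing.items():
        if name.startswith(TMP_PREFIX):
            in_progress.update(children)
    return sorted(
        name
        for name in listing
        if not name.startswith(TMP_PREFIX) and name not in in_progress
    )


if __name__ == "__main__":
    during_compaction = {
        "delta_1_1": ["bucket_00000"],
        "delta_2_2": ["bucket_00000"],
        "_tmp_abc": ["base_2"],      # compactor staging area (hypothetical name)
        "base_2": ["bucket_00000"],  # partially copied target
    }
    print(readable_dirs(during_compaction))
```

While the staging directory exists, the reader falls back to the pre-compaction deltas; once it is gone, base_2 is no longer skipped.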



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
