[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240004#comment-14240004 ]

Jihong Liu commented on HIVE-8966:
----------------------------------

I see. Basically there are two solutions. One is that when we build the delta 
list, we don't include the current delta if it has an open transaction, i.e. we 
update AcidUtils.getAcidState() directly. The other is what I posted here: we 
first get the delta list, then at compaction time we don't compact the last 
delta if there is an open transaction. The first solution is better as long as 
changing getAcidState() doesn't affect other existing code, since it is a 
public static method. 
By the way, we should only do that for the current delta (the delta with the 
largest transaction id), not for all deltas which have open transactions. If I 
am correct, the base file will be named after the largest transaction id among 
the deltas. So if the latest delta is closed but an earlier delta has an open 
transaction, we should not do anything and should simply let the compaction 
fail. Otherwise the base will be named by the last transaction id and all 
earlier deltas will be removed, which would cause data loss. This is my 
understanding; please correct me if it is not correct. Thanks
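To make the intended check concrete, here is a rough sketch. The ParsedDelta 
and OpenTxnView types and their method names are illustrative stand-ins, not 
the real Hive APIs:

    import java.util.List;

    final class CompactionDeltaCheck {

      // Illustrative stand-in for a parsed delta directory (delta_min_max).
      interface ParsedDelta {
        long minTxnId();
        long maxTxnId();
      }

      // Illustrative stand-in for a view of the open-transaction list.
      interface OpenTxnView {
        // True if any transaction in [minTxnId, maxTxnId] is still open.
        boolean anyOpen(long minTxnId, long maxTxnId);
      }

      // Input must be sorted by transaction id. Only the last (current)
      // delta may be skipped; an open transaction in any earlier delta must
      // abort the compaction, because the new base would be named after the
      // largest txn id and the earlier deltas would then be deleted, losing
      // the open transaction's data.
      static List<ParsedDelta> safeToCompact(List<ParsedDelta> sorted,
                                             OpenTxnView txns) {
        for (int i = 0; i < sorted.size() - 1; i++) {
          ParsedDelta d = sorted.get(i);
          if (txns.anyOpen(d.minTxnId(), d.maxTxnId())) {
            throw new IllegalStateException(
                "open transaction in a non-current delta; do not compact");
          }
        }
        if (!sorted.isEmpty()) {
          ParsedDelta last = sorted.get(sorted.size() - 1);
          if (txns.anyOpen(last.minTxnId(), last.maxTxnId())) {
            return sorted.subList(0, sorted.size() - 1); // skip current delta
          }
        }
        return sorted;
      }
    }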

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.1
>
>         Attachments: HIVE-8966.patch
>
>
> hive hcatalog streaming will also create a file like bucket_n_flush_length in 
> each delta directory, where "n" is the bucket number. But compactor.CompactorMR 
> thinks this file also needs to be compacted. Of course this file cannot be 
> compacted, so compactor.CompactorMR will not continue with the compaction. 
> In a test, after removing the bucket_n_flush_length file, the "alter table 
> partition compact" statement finished successfully. If that file is not 
> deleted, nothing is compacted. 
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue
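As an illustration of the shape of the fix, a filter along these lines could 
keep the side files out of the compactor's input. This sketch uses Hadoop's 
PathFilter interface and is not the actual HIVE-8966.patch:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // Illustrative only: accept real bucket files and skip the streaming
    // side files named bucket_N_flush_length.
    public class BucketOnlyFilter implements PathFilter {
      @Override
      public boolean accept(Path path) {
        String name = path.getName();
        return name.startsWith("bucket_") && !name.endsWith("_flush_length");
      }
    }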



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
