[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240004#comment-14240004 ]
Jihong Liu commented on HIVE-8966:
----------------------------------

I see. Basically there are two solutions. One is that when we get the delta list, we don't include the current delta if it has an open transaction, i.e. update AcidUtils.getAcidState() directly. The other is what I posted here: first get the delta list, then when doing the compaction, skip the last delta if there is an open transaction. The first solution is better, as long as changing getAcidState() doesn't affect other existing code, since it is a public static method.

By the way, we should only do that for the current delta (the delta with the largest transaction id), not for all deltas that have open transactions. If I am correct, the base file will be named after the largest transaction id in the deltas. So if the latest delta is closed, but an earlier delta has an open transaction, we should not do anything; simply let the compaction fail. Otherwise the base will be named by the last transaction id and all earlier deltas will be removed. That would cause data loss. This is my understanding; please correct me if it is not correct. Thanks

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>       Environment: hive
>          Reporter: Jihong Liu
>          Assignee: Alan Gates
>          Priority: Critical
>           Fix For: 0.14.1
>
>       Attachments: HIVE-8966.patch
>
>
> hive hcatalog streaming will also create a file like bucket_n_flush_length in
> each delta directory, where "n" is the bucket number. But
> compactor.CompactorMR thinks this file also needs to be compacted. Of course
> this file cannot be compacted, so compactor.CompactorMR will not continue
> with the compaction.
> Did a test: after removing the bucket_n_flush_length file, the "alter
> table partition compact" finished successfully. If that file is not deleted,
> nothing will be compacted.
> This is probably a very high-severity bug. Both 0.13 and 0.14 have this issue

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
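The delta-selection rule described in the comment above can be sketched as follows. This is a minimal illustration, not actual Hive code: the Delta record and the set of open transaction ids are simplified stand-ins for Hive's parsed delta directory names and its ValidTxnList, and the class and method names are invented for the sketch.

```java
import java.util.*;

// Hedged sketch (hypothetical names, not Hive's API): decide which delta
// directories are safe to compact when some transactions are still open.
public class DeltaSelection {
    // A delta directory covering the transaction range [minTxn, maxTxn].
    record Delta(long minTxn, long maxTxn) {}

    /**
     * Per the comment on HIVE-8966:
     *  - if only the LATEST delta (largest transaction id) contains an open
     *    transaction, drop it and compact the rest;
     *  - if an EARLIER delta contains an open transaction, abort (return an
     *    empty list, i.e. let the compaction fail), because the new base
     *    would be named after the last transaction id and removing the
     *    earlier deltas would lose data.
     */
    static List<Delta> compactableDeltas(List<Delta> deltas, Set<Long> openTxns) {
        List<Delta> sorted = new ArrayList<>(deltas);
        sorted.sort(Comparator.comparingLong(Delta::maxTxn));
        // Any open transaction in a non-latest delta means we must not compact.
        for (int i = 0; i < sorted.size() - 1; i++) {
            if (hasOpenTxn(sorted.get(i), openTxns)) {
                return Collections.emptyList(); // let the compaction fail
            }
        }
        // Only the current (latest) delta may be silently skipped.
        if (!sorted.isEmpty() && hasOpenTxn(sorted.get(sorted.size() - 1), openTxns)) {
            sorted.remove(sorted.size() - 1);
        }
        return sorted;
    }

    static boolean hasOpenTxn(Delta d, Set<Long> openTxns) {
        for (long t : openTxns) {
            if (t >= d.minTxn() && t <= d.maxTxn()) return true;
        }
        return false;
    }
}
```

The design choice encoded here is asymmetric on purpose: skipping the newest delta only delays its compaction, while skipping an older one would let the resulting base's name cover transactions that were never compacted.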