[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240004#comment-14240004 ]
Jihong Liu commented on HIVE-8966:
----------------------------------
I see. Basically there are two solutions. One is that when we get the delta
list, we don't include the current delta if it has an open transaction, i.e.
update AcidUtils.getAcidState() directly. The other is what I posted here: we
first get the delta list, then at compaction time we skip the last delta if it
has an open transaction. The first solution is better as long as changing
getAcidState() doesn't affect other existing code, since it is a public static
method.
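
A minimal sketch of the second approach, assuming the delta list is sorted by
transaction id. Delta and OpenTxnChecker are hypothetical stand-ins for
illustration, not Hive's real API:

{code:java}
import java.util.List;

public class CurrentDeltaFilter {

  /** Hypothetical stand-in for a parsed delta directory (delta_min_max). */
  static final class Delta {
    final long minTxnId;
    final long maxTxnId;
    Delta(long minTxnId, long maxTxnId) {
      this.minTxnId = minTxnId;
      this.maxTxnId = maxTxnId;
    }
  }

  /** Hypothetical check against the metastore's open-transaction list. */
  interface OpenTxnChecker {
    boolean hasOpenTxn(long minTxnId, long maxTxnId);
  }

  /**
   * Second approach: take the full delta list, then drop the current delta
   * (the one with the largest transaction id) if it still has an open
   * transaction, so only closed deltas are handed to the compactor.
   */
  static List<Delta> dropOpenCurrentDelta(List<Delta> deltas,
                                          OpenTxnChecker txns) {
    if (!deltas.isEmpty()) {
      Delta current = deltas.get(deltas.size() - 1); // sorted by txn id
      if (txns.hasOpenTxn(current.minTxnId, current.maxTxnId)) {
        return deltas.subList(0, deltas.size() - 1); // compact the rest
      }
    }
    return deltas;
  }
}
{code}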
By the way, we should only do that for the current delta (the delta with the
largest transaction id), not for every delta that has an open transaction. If
I am correct, the base file is named after the largest transaction id among
the compacted deltas. So if the latest delta is closed but an earlier delta
has an open transaction, we should not compact anything; simply let the
compaction fail. Otherwise the base would be named after the last transaction
id and all earlier deltas would be removed, which would cause data loss. This
is my understanding; please correct me if it is not correct. Thanks
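
A matching sketch of that safety check, reusing the hypothetical Delta and
OpenTxnChecker stand-ins from the sketch above; only the current delta may be
skipped, and an open transaction anywhere earlier has to abort the run:

{code:java}
  /**
   * Every delta except the current (last) one must be fully committed: the
   * new base is named after the largest compacted transaction id and the
   * earlier deltas are deleted afterwards, so compacting past an open
   * transaction would silently drop its data.
   */
  static void assertEarlierDeltasClosed(List<Delta> deltas,
                                        OpenTxnChecker txns) {
    for (int i = 0; i < deltas.size() - 1; i++) {
      Delta d = deltas.get(i);
      if (txns.hasOpenTxn(d.minTxnId, d.maxTxnId)) {
        throw new IllegalStateException("open transaction in delta "
            + d.minTxnId + "_" + d.maxTxnId
            + "; failing compaction to avoid data loss");
      }
    }
  }
{code}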
> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
> Key: HIVE-8966
> URL: https://issues.apache.org/jira/browse/HIVE-8966
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.14.0
> Environment: hive
> Reporter: Jihong Liu
> Assignee: Alan Gates
> Priority: Critical
> Fix For: 0.14.1
>
> Attachments: HIVE-8966.patch
>
>
> Hive hcatalog streaming also creates a file like bucket_n_flush_length in
> each delta directory, where "n" is the bucket number. compactor.CompactorMR
> thinks this file also needs to be compacted, but of course it cannot be, so
> compactor.CompactorMR does not continue with the compaction.
> In a test, after the bucket_n_flush_length file was removed, the "alter
> table partition compact" finished successfully. If that file is not deleted,
> nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
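
For the quoted report, a sketch of the kind of fix it implies: skip the
streaming side files when the compactor lists bucket files in a delta
directory. The "_flush_length" suffix is inferred from the
bucket_n_flush_length names above; treating it as a filterable constant is an
assumption, not Hive's actual code:

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class SideFileFilter implements PathFilter {
  // Suffix of the side files written by hcatalog streaming (assumed here).
  private static final String FLUSH_LENGTH_SUFFIX = "_flush_length";

  @Override
  public boolean accept(Path path) {
    // bucket_n_flush_length only records how far the streaming writer has
    // flushed; it is not ORC data and must not go to the compactor.
    return !path.getName().endsWith(FLUSH_LENGTH_SUFFIX);
  }
}
{code}

Such a filter could then be passed to FileSystem.listStatus(deltaDir, new
SideFileFilter()) wherever CompactorMR enumerates the bucket files.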