[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237047#comment-14237047 ]
Jihong Liu commented on HIVE-8966:
----------------------------------

Solution: if the last delta contains any file that matches the bucket file naming pattern but is not actually a bucket file, do not compact that delta. While a transaction batch is still open, its delta contains a side file named bucket_n_flush_length, which is not a bucket file. More generally, whenever the last delta holds a file with the bucket naming pattern that cannot be compacted, we should ignore that whole delta: compaction removes the delta afterwards, so if the delta cannot be compacted in its entirety it must be left as it is. In the scenario above, the second delta is therefore not compacted, and the cleaner does not remove it because its transaction id is higher than that of the newly created compaction output (base or delta).

We apply this check only to the last delta to cover the case where two or more transaction batches are open and the last one is closed first. If the last delta were compacted in that situation, the resulting base would carry a high transaction id, so the cleaner would remove all deltas and data could be lost. In that case at least one delta in the compaction list still contains a bucket_n_flush_length file; since we do not ignore it, the compaction fails automatically, nothing happens, and no data is lost. Compaction can then only succeed after all transaction batches are closed. That is not ideal, but at least no data is lost.

The patch is attached. It adds one method that tests whether the last delta needs to be removed from the delta list, and that method runs before the delta list is processed (see the sketch after the quoted issue below). After applying this patch no data is lost, and either major or minor compaction can run while data loading continues at the same time.

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.1
>
>         Attachments: HIVE-8966.patch
>
>
> hive hcatalog streaming also creates a file named bucket_n_flush_length in
> each delta directory, where "n" is the bucket number. compactor.CompactorMR
> thinks this file also needs to be compacted, but of course it cannot be, so
> compactor.CompactorMR does not continue with the compaction.
> In a test, after the bucket_n_flush_length file was removed, "alter table
> ... partition ... compact" finished successfully. If that file is not
> deleted, nothing is compacted.
> This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.
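For illustration only, here is a minimal sketch of the kind of check described in the comment above, built on the standard Hadoop FileSystem API. The class name, method names, and file-name regex below are assumptions made up for this sketch; they are not the names used in HIVE-8966.patch.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LastDeltaCheck {
  // Matches side files such as bucket_00000_flush_length that hcatalog
  // streaming leaves in a delta while its transaction batch is still open.
  // (Hypothetical pattern for this sketch.)
  private static final Pattern FLUSH_LENGTH =
      Pattern.compile("bucket_[0-9]+_flush_length");

  // True if the last delta directory still holds a flush-length side file,
  // i.e. it belongs to an open transaction batch and cannot be compacted.
  static boolean lastDeltaIsOpen(FileSystem fs, Path lastDelta)
      throws IOException {
    for (FileStatus stat : fs.listStatus(lastDelta)) {
      if (FLUSH_LENGTH.matcher(stat.getPath().getName()).matches()) {
        return true;
      }
    }
    return false;
  }

  // Drop the last delta from the compaction list when it is still open,
  // so the remaining deltas can be compacted while the open delta is left
  // as it is for the cleaner to keep.
  static void pruneOpenLastDelta(FileSystem fs, List<Path> deltas)
      throws IOException {
    if (!deltas.isEmpty()
        && lastDeltaIsOpen(fs, deltas.get(deltas.size() - 1))) {
      deltas.remove(deltas.size() - 1);
    }
  }
}
{code}

In this sketch, pruneOpenLastDelta would run on the sorted delta list before it is handed to the compactor; since only the last (highest transaction id) delta can ever be dropped, the cleaner never removes a delta that was excluded from compaction.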