[ 
https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24481:
----------------------------------
    Labels: Compaction pull-request-available  (was: Compaction)

> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
>                 Key: HIVE-24481
>                 URL: https://issues.apache.org/jira/browse/HIVE-24481
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: Compaction, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create a streaming connection with a transaction batch size of 3 and 
> withStaticPartitionValues set to the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run a manual minor compaction on the partition. The compaction is skipped 
> because the delta count is 1, but cleaning is still triggered because of the 
> aborted transaction 1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
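
The timeline above can be walked through with a small toy model. This is only
a sketch and not Hive code: the class, fields, and methods below are
hypothetical stand-ins for the transaction list, txn_components, and the rows
on disk. It assumes that the three streaming transactions share one delta (so
the cleaner cannot drop the aborted row on its own) and that a reader treats
any transaction id it no longer sees as aborted as committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SkippedCompactionModel {
    // One row on disk: the txn that wrote it, and whether its delta is shared with other txns.
    static final class Row {
        final int txnId;
        final boolean sharedDelta;
        Row(int txnId, boolean sharedDelta) {
            this.txnId = txnId;
            this.sharedDelta = sharedDelta;
        }
    }

    // Hypothetical stand-ins for metastore state (assumption, not the real schema).
    final Set<Integer> abortedTxns = new HashSet<>();   // aborted txns still known to readers
    final Set<Integer> txnComponents = new HashSet<>(); // aborted txns that still have component entries
    final List<Row> rows = new ArrayList<>();

    void write(int txnId, boolean sharedDelta, boolean abort) {
        rows.add(new Row(txnId, sharedDelta));
        if (abort) {
            abortedTxns.add(txnId);
            txnComponents.add(txnId);
        }
    }

    // Cleaner: deletes deltas written solely by an aborted txn, then clears
    // the component entries of every aborted txn (steps 8 and 9).
    void runCleaner() {
        rows.removeIf(r -> !r.sharedDelta && abortedTxns.contains(r.txnId));
        txnComponents.clear();
    }

    // Housekeeper: forgets aborted txns that have no component entries left (step 10).
    void runHousekeeper() {
        abortedTxns.removeIf(t -> !txnComponents.contains(t));
    }

    // Reader: skips only rows whose txn is still known to be aborted.
    long count() {
        return rows.stream().filter(r -> !abortedTxns.contains(r.txnId)).count();
    }

    // Replays the timeline and returns {count before cleaning, count after housekeeping}.
    static long[] reproduce() {
        SkippedCompactionModel m = new SkippedCompactionModel();
        m.write(1, false, true);  // step 2: txn 1 writes delta_1 and aborts
        m.write(2, true, false);  // step 4: streaming commit
        m.write(3, true, true);   // step 5: streaming abort
        m.write(4, true, false);  // step 6: streaming commit
        long before = m.count();  // step 7
        m.runCleaner();           // steps 8 and 9
        m.runHousekeeper();       // step 10
        long after = m.count();   // step 11
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        long[] counts = reproduce();
        System.out.println("before=" + counts[0] + " after=" + counts[1]);
    }
}
```

Under these assumptions the model prints before=2 after=3, matching steps 7
and 11: once the aborted streaming transaction is forgotten, its row in the
shared delta becomes visible.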



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
