[ 
https://issues.apache.org/jira/browse/HIVE-29567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-29567:
----------------------------------
    Description: 
*Root Cause*

  HIVE-28238 introduced a data loss bug in query-based major compaction. The 
compaction INSERT query reads 0 rows because it treats the empty base directory 
created by the preceding CREATE TABLE step as a valid (committed) base, 
ignoring all delta directories.

Before HIVE-28238, the *buggy* VALID_TXNS_KEY set by AcidCompactionService was 
never actually used for data reads. Each compaction subquery *opened its own 
implicit transaction* through the Driver, and the compaction chain of events 
masked the bug.
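
The failure mode described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Hive's actual directory-selection code: the Dir type, the select and rowsRead helpers, the write IDs, the row counts and transaction id 10 are all hypothetical. It only shows that once a reader's snapshot treats the transaction that created an empty base as committed, that base wins and every delta at or below its write ID is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model only: not Hive code, and every name here is hypothetical. */
final class CompactionReadSketch {

    /** A base or delta directory; visibilityTxn is the transaction that created it. */
    static final class Dir {
        final boolean isBase;
        final long maxWriteId;
        final long visibilityTxn;
        final int rows;

        Dir(boolean isBase, long maxWriteId, long visibilityTxn, int rows) {
            this.isBase = isBase;
            this.maxWriteId = maxWriteId;
            this.visibilityTxn = visibilityTxn;
            this.rows = rows;
        }
    }

    /**
     * Picks the directories a reader would scan: the newest base whose creating
     * transaction looks committed in the reader's snapshot, plus every delta
     * above that base. Deltas are not filtered by transaction in this sketch.
     */
    static List<Dir> select(List<Dir> dirs, LongPredicate looksCommitted) {
        Dir bestBase = null;
        for (Dir d : dirs) {
            if (d.isBase && looksCommitted.test(d.visibilityTxn)
                    && (bestBase == null || d.maxWriteId > bestBase.maxWriteId)) {
                bestBase = d;
            }
        }
        List<Dir> chosen = new ArrayList<>();
        long floor = 0; // write IDs are assumed to start at 1
        if (bestBase != null) {
            chosen.add(bestBase);
            floor = bestBase.maxWriteId;
        }
        for (Dir d : dirs) {
            if (!d.isBase && d.maxWriteId > floor) {
                chosen.add(d);
            }
        }
        return chosen;
    }

    /** Total rows the reader would see under the given snapshot. */
    static int rowsRead(List<Dir> dirs, LongPredicate looksCommitted) {
        int total = 0;
        for (Dir d : select(dirs, looksCommitted)) {
            total += d.rows;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Dir> dirs = List.of(
                new Dir(false, 1, 1, 3),  // delta with 3 rows
                new Dir(false, 2, 2, 2),  // delta with 2 rows
                new Dir(true, 2, 10, 0)); // empty base created by in-flight txn 10

        // Snapshot that wrongly treats txn 10 as committed: the empty base hides the deltas.
        System.out.println(rowsRead(dirs, txn -> true));      // prints 0
        // Snapshot that still sees txn 10 as open: the base is skipped, deltas are read.
        System.out.println(rowsRead(dirs, txn -> txn != 10)); // prints 5
    }
}
```

In this model the only difference between losing all rows and reading all rows is which transactions the reader's snapshot considers committed.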


  was:
Root Cause

  HIVE-28238 introduced a data loss bug in query-based major compaction. The 
compaction INSERT query reads 0 rows because it treats the empty base 
directory created by the preceding CREATE TABLE step as a valid (committed) 
base, ignoring all delta directories.

Before HIVE-28238, the buggy VALID_TXNS_KEY set by AcidCompactionService was 
never actually used for data reads. Each compaction subquery opened its own 
implicit transaction through the Driver, and the compaction chain of events 
masked the bug



> Data loss bug in query-based major compaction
> ---------------------------------------------
>
>                 Key: HIVE-29567
>                 URL: https://issues.apache.org/jira/browse/HIVE-29567
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.3.0
>            Reporter: Denys Kuzmenko
>            Priority: Major
>
> *Root Cause*
>
>   HIVE-28238 introduced a data loss bug in query-based major compaction. The 
> compaction INSERT query reads 0 rows because it treats the empty base 
> directory created by the preceding CREATE TABLE step as a valid (committed) 
> base, ignoring all delta directories.
>
> Before HIVE-28238, the *buggy* VALID_TXNS_KEY set by AcidCompactionService 
> was never actually used for data reads. Each compaction subquery *opened its 
> own implicit transaction* through the Driver, and the compaction chain of 
> events masked the bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
