Denys Kuzmenko created HIVE-29567:
-------------------------------------

             Summary: Data loss bug in query-based major compaction
                 Key: HIVE-29567
                 URL: https://issues.apache.org/jira/browse/HIVE-29567
             Project: Hive
          Issue Type: Bug
            Reporter: Denys Kuzmenko


Root Cause

HIVE-28238 introduced a data loss bug in query-based major compaction. The
compaction INSERT query reads 0 rows because it treats the empty base
directory created by the preceding CREATE TABLE step as a valid (committed)
base, ignoring all delta directories.
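
The effect described above can be illustrated with a small, self-contained
model. This is only a sketch, not Hive's actual code: the directory names
("base_3", "delta_1_1") are simplified, and select_dirs, count_rows and the
base_is_valid predicate are hypothetical names introduced for illustration.
It assumes the usual ACID rule that a reader picks the newest valid base and
skips every delta that the base already covers.

```python
def select_dirs(dirs, base_is_valid):
    """Return the directories a reader would scan, in a simplified ACID model.

    dirs: names such as "base_3" or "delta_2_2" (simplified, hypothetical naming).
    base_is_valid: predicate deciding whether a base counts as committed.
    """
    bases, deltas = [], []
    for name in dirs:
        parts = name.split("_")
        if parts[0] == "base":
            bases.append((int(parts[1]), name))
        elif parts[0] == "delta":
            deltas.append((int(parts[1]), int(parts[2]), name))

    valid_bases = [b for b in bases if base_is_valid(b[1])]
    if not valid_bases:
        # No usable base: every delta has to be read.
        return [d[2] for d in sorted(deltas)]

    best_id, best_name = max(valid_bases)
    # Deltas at or below the base's write id are treated as already compacted.
    newer = [d for d in sorted(deltas) if d[0] > best_id]
    return [best_name] + [d[2] for d in newer]


def count_rows(rows_by_dir, base_is_valid):
    """Sum the rows in the directories chosen by select_dirs."""
    chosen = select_dirs(list(rows_by_dir), base_is_valid)
    return sum(rows_by_dir[name] for name in chosen)


# Three committed deltas plus the empty base created before the INSERT runs.
ROWS = {"delta_1_1": 3, "delta_2_2": 4, "delta_3_3": 5, "base_3": 0}

# Buggy view: the empty base is accepted as committed, so the deltas are skipped.
buggy_count = count_rows(ROWS, lambda name: True)

# Expected view: the in-progress base is not valid yet, so the deltas are read.
expected_count = count_rows(ROWS, lambda name: False)

print(buggy_count, expected_count)  # 0 12
```

In this model the only difference between the two runs is whether the empty
base is considered committed, which is enough to turn 12 readable rows into 0.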

Before HIVE-28238, the buggy VALID_TXNS_KEY value set by AcidCompactionService
was never actually used for data reads. Each compaction subquery opened its
own implicit transaction through the Driver, and this chain of events in the
compaction masked the bug.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
