[jira] [Created] (HIVE-25915) Query based MINOR compaction fails with NPE if the data is loaded into the ACID table

Jira Mon, 31 Jan 2022 03:44:06 -0800

László Végh created HIVE-25915:
----------------------------------

             Summary: Query based MINOR compaction fails with NPE if the data 
is loaded into the ACID table
                 Key: HIVE-25915
                 URL: https://issues.apache.org/jira/browse/HIVE-25915
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: László Végh



Steps to reproduce:

Create an ACID table, export it, and import the data into a new table:

{{CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 
BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');}}

{{insert into temp_acid values 
('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');}}

{{export table temp_acid to '/tmp/temp_acid';}}

{{import table imported from '/tmp/temp_acid';}}

If the data is loaded or imported into the table the way it is described 
above, the rows in the ORC file don't contain the ACID metadata. The 
query-based MINOR compaction fails on this kind of table because the 
FileSinkOperator throws an NPE when it tries to read the bucket metadata 
from the rows. Deleting from and updating a table like this is possible, 
however, so the bucketId can evidently be calculated for such rows.
The non-query-based MINOR compaction works fine on a table like this.
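
The steps above stop at the import and do not show the statement that requests the compaction. A minimal sketch of that last step follows. It assumes the query-based compactor is switched on through the hive.compactor.crud.query.based property, and it reuses the table name imported from the import statement. Whether further writes to the table are needed before a MINOR compaction has anything to merge is not stated in this report.

```sql
-- Assumption: this property selects the query-based compactor for full ACID
-- tables. On a real cluster it may have to be set in the configuration of the
-- service that runs the compactor Worker, not only in the client session.
SET hive.compactor.crud.query.based=true;

-- Request a MINOR compaction of the table created by the import step.
ALTER TABLE imported COMPACT 'minor';

-- Inspect the state of the compaction request (for example, whether it failed).
SHOW COMPACTIONS;
```

With the property left at false, the same ALTER TABLE statement would go through the non-query-based compactor, which the report says works on this kind of table.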



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
