Alessandro Solimando created HIVE-26150:
-------------------------------------------

             Summary: OrcRawRecordMerger reads each row twice
                 Key: HIVE-26150
                 URL: https://issues.apache.org/jira/browse/HIVE-26150
             Project: Hive
          Issue Type: Bug
          Components: ORC, Transactions
    Affects Versions: 4.0.0-alpha-2
            Reporter: Alessandro Solimando


OrcRawRecordMerger reads each row twice. The issue does not surface because 
the merger is only ever used with the "collapseEvents" parameter set to true, 
which filters out one of the two copies.

Setting collapseEvents to true or to false should produce the same result: in 
the current ACID implementation each event has a distinct rowid, so two 
identical rows cannot legitimately occur. They appear only because of this bug.
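
The masking effect described above can be illustrated with a minimal sketch. 
This is not Hive code: CollapseSketch, RowKey and merge are hypothetical names, 
and the collapse rule used here (skip an event whose key equals the previous 
event's key) is a simplified assumption about what collapseEvents does. It only 
shows why a reader that emits every row twice would go unnoticed while 
collapsing is on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class CollapseSketch {

    // Hypothetical stand-in for an ACID row identifier (not Hive's own class).
    static final class RowKey {
        final long writeId;
        final int bucket;
        final long rowId;

        RowKey(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false; // also covers o == null
            }
            RowKey other = (RowKey) o;
            return writeId == other.writeId
                && bucket == other.bucket
                && rowId == other.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(writeId, bucket, rowId);
        }
    }

    // Simplified model (assumption): when collapsing, drop an event whose key
    // equals the key of the event immediately before it in sorted order.
    static List<RowKey> merge(List<RowKey> sortedEvents, boolean collapseEvents) {
        List<RowKey> out = new ArrayList<>();
        RowKey previous = null;
        for (RowKey key : sortedEvents) {
            boolean duplicateOfPrevious = key.equals(previous);
            if (!(collapseEvents && duplicateOfPrevious)) {
                out.add(key);
            }
            previous = key;
        }
        return out;
    }

    public static void main(String[] args) {
        RowKey a = new RowKey(1, 0, 0);
        RowKey b = new RowKey(1, 0, 1);
        // Simulates the reported bug: every row is delivered twice.
        List<RowKey> doubled = List.of(a, a, b, b);

        System.out.println(merge(doubled, true).size());  // prints 2
        System.out.println(merge(doubled, false).size()); // prints 4
    }
}
```

With correct input (each key delivered once) both settings would return the 
same rows, which is the invariant the failing tests are expected to expose.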

To reproduce the issue, set the second parameter to false 
[here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106],
 then run the tests in TestOrcRawRecordMerger: two of them fail.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
