Alessandro Solimando created HIVE-26147:
-------------------------------------------
Summary: OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file
Key: HIVE-26147
URL: https://issues.apache.org/jira/browse/HIVE-26147
Project: Hive
Issue Type: Bug
Components: ORC, Transactions
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando

When _hive.acid.key.index_ is missing for an acid ORC file, _OrcRawRecordMerger_ throws as follows:

{noformat}
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
	... 24 more
{noformat}

For this situation to occur, the ORC file must have more than one stripe, and the offset to seek to must fall beyond the first stripe but before the tail one, as the code suggests:

{code:java}
if (firstStripe != 0) {
  minKey = keyIndex[firstStripe - 1];
}
if (!isTail) {
  maxKey = keyIndex[firstStripe + stripeCount - 1];
}
{code}

However, when the original issue was detected, the NPE was triggered even by a simple "select *" over a table whose ORC files lacked the _hive.acid.key.index_ metadata, while it never occurred for ORC files with a single stripe. The affected file was generated by a major compaction of acid and non-acid data.

To force an offset located in a middle stripe, one can use the following query, picking a value known to be stored in such a stripe:

{code:sql}
select * from $table where c = $value
{code}

_OrcRawRecordMerger_ should simply leave the min and max keys as null when the _hive.acid.key.index_ metadata is missing.

-- 
This message was sent by Atlassian Jira
(v8.20.1#820001)
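The proposed behaviour (leave the min and max keys as null when the key index is missing) can be sketched as a null-safe variant of the bound lookup quoted in the description. This is a minimal, self-contained illustration, not the actual Hive patch: `KeyBoundsSketch`, `RecordKey` (a simplified stand-in for Hive's `RecordIdentifier`) and `Bounds` are hypothetical names, and the sketch assumes that a missing _hive.acid.key.index_ surfaces as a null `keyIndex` array, with a null bound meaning "no bound on that side".

{code:java}
/**
 * Hypothetical sketch of a null-safe key-bound lookup (requires Java 16+ for records).
 * None of these types are Hive classes; RecordKey only mimics an ACID row key.
 */
public class KeyBoundsSketch {

    /** Simplified stand-in for an ACID row key (writeId, bucket, rowId). */
    record RecordKey(long writeId, int bucket, long rowId) {}

    /** Bounds for a split; a null field means "unbounded on that side". */
    record Bounds(RecordKey minKey, RecordKey maxKey) {}

    /**
     * Same index arithmetic as the snippet in the issue, but every access to
     * keyIndex is guarded, so a file without the key index yields null bounds
     * instead of a NullPointerException.
     */
    static Bounds discoverKeyBounds(RecordKey[] keyIndex, int firstStripe,
                                    int stripeCount, boolean isTail) {
        RecordKey minKey = null; // no lower bound by default
        RecordKey maxKey = null; // no upper bound by default
        if (keyIndex != null) {
            if (firstStripe != 0) {
                // Last key of the stripe preceding the split.
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                // Last key of the final stripe covered by the split.
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new Bounds(minKey, maxKey);
    }

    public static void main(String[] args) {
        // Hypothetical key index for a 3-stripe file: last key of each stripe.
        RecordKey[] index = {
            new RecordKey(1, 0, 9), new RecordKey(1, 0, 19), new RecordKey(1, 0, 29)
        };
        // Split covering only the middle stripe: both bounds come from the index.
        System.out.println(discoverKeyBounds(index, 1, 1, false));
        // Same split on a file missing the key index: no NPE, both bounds null.
        System.out.println(discoverKeyBounds(null, 1, 1, false));
    }
}
{code}

The guard is placed around both accesses rather than only one, because with a null index either branch alone (a split starting after the first stripe, or a non-tail split) would dereference the missing array.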