Alessandro Solimando created HIVE-26147:
-------------------------------------------
Summary: OrcRawRecordMerger throws NPE when hive.acid.key.index is
missing for an acid file
Key: HIVE-26147
URL: https://issues.apache.org/jira/browse/HIVE-26147
Project: Hive
Issue Type: Bug
Components: ORC, Transactions
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
When _hive.acid.key.index_ is missing for an acid ORC file, _OrcRawRecordMerger_
throws a NullPointerException as follows:
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    ... 24 more
{noformat}
For this to happen, the ORC file must have more than one stripe, and the
offset of the element to seek must lie beyond the first stripe but before the
tail one, as the code suggests:
{code:java}
if (firstStripe != 0) {
  minKey = keyIndex[firstStripe - 1];
}
if (!isTail) {
  maxKey = keyIndex[firstStripe + stripeCount - 1];
}
{code}
However, when the issue was originally detected, the NPE was triggered even by
a simple "select *" over a table whose ORC files lacked the
_hive.acid.key.index_ metadata, although it never failed for ORC files with a
single stripe. The file was generated after a major compaction of acid and
non-acid data.
To force an offset that falls in a middle stripe, one can run the following
query, provided one knows which stripe contains a particular value:
{code:sql}
select * from $table where c = $value
{code}
_OrcRawRecordMerger_ should simply leave the min and max keys as null when
the _hive.acid.key.index_ metadata is missing.
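
The proposed behaviour can be illustrated with a minimal, self-contained sketch. It is not the actual Hive patch: the class _KeyBoundsSketch_, the nested _KeyBounds_ type and the use of plain strings as keys are hypothetical stand-ins, and it assumes that the key index is simply null when the metadata is absent. It only shows the idea of guarding the two array accesses quoted above so that both bounds stay null.

```java
// Hypothetical stand-in for the bounds logic; not Hive's real classes or types.
class KeyBoundsSketch {

    /** Pair of bounds; a null value means "unbounded on that side". */
    static final class KeyBounds {
        final String minKey;
        final String maxKey;

        KeyBounds(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /**
     * Mirrors the two conditions quoted in the issue, with one extra guard:
     * when keyIndex is null (assumed here to model the missing
     * hive.acid.key.index metadata), both keys are left as null instead of
     * dereferencing the array.
     */
    static KeyBounds discoverKeyBounds(String[] keyIndex, int firstStripe,
                                       int stripeCount, boolean isTail) {
        String minKey = null;
        String maxKey = null;
        if (keyIndex != null) {
            if (firstStripe != 0) {
                // Last key of the stripe preceding the range being read.
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                // Last key of the final stripe in the range being read.
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new KeyBounds(minKey, maxKey);
    }

    public static void main(String[] args) {
        // Middle stripe of a multi-stripe file, index missing: no exception.
        KeyBounds missing = discoverKeyBounds(null, 1, 1, false);
        System.out.println("min=" + missing.minKey + " max=" + missing.maxKey);

        // Same range with an index present: bounds come from the index.
        KeyBounds present =
            discoverKeyBounds(new String[] {"k0", "k1", "k2"}, 1, 1, false);
        System.out.println("min=" + present.minKey + " max=" + present.maxKey);
    }
}
```

With null bounds, a reader in this sketch would have no key range to prune by, which trades some efficiency for not failing on files that lack the index.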