Dmitriy Fingerman created ORC-1393:
--------------------------------------
Summary: Too short incorrect length of uncompressed stream causes
Key: ORC-1393
URL: https://issues.apache.org/jira/browse/ORC-1393
Project: ORC
Issue Type: Bug
Reporter: Dmitriy Fingerman
This issue is the root cause of the issue reported in HIVE-27128.
Before 'ORC-516 - Update InStream for column compression', the
InStream.UncompressedStream class had a 'length' field, and the length could be
modified in the reset() method.
The reset() method was called from the setBuffers() method of the
SettableUncompressedStream class:
{code:java}
// Pre-ORC-516: the new total length is passed to reset() together with the ranges.
public void setBuffers(DiskRangeInfo diskRangeInfo) {
  reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
  setOffset(diskRangeInfo.getDiskRanges());
}
{code}
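A minimal sketch of the pre-ORC-516 pattern, assuming a heavily simplified model: OldResetSketch, Range, RangeInfo and UncompressedStreamModel are hypothetical stand-ins, not the real ORC or Hive classes, and the total length is assumed to be the sum of the range lengths. It shows setBuffers() handing the new total length to reset(), so the stream length follows each new set of buffers:

{code:java}
import java.util.List;

// Hypothetical, simplified model of the pre-ORC-516 pattern.
// These are NOT the real ORC/Hive classes.
class OldResetSketch {

    // Stand-in for a disk range: a byte offset and a length.
    record Range(long offset, long length) {}

    // Stand-in for DiskRangeInfo (assumption: total length = sum of range lengths).
    record RangeInfo(List<Range> ranges) {
        long totalLength() {
            return ranges.stream().mapToLong(Range::length).sum();
        }
    }

    // Stand-in for the old UncompressedStream: 'length' can be changed by reset().
    static class UncompressedStreamModel {
        long length;
        List<Range> ranges = List.of();

        UncompressedStreamModel(long initialLength) {
            this.length = initialLength;
        }

        void reset(List<Range> newRanges, long newLength) {
            this.ranges = newRanges;
            this.length = newLength; // the length follows the new buffers
        }

        // Mirrors setBuffers(): the total length is passed along with the ranges.
        void setBuffers(RangeInfo info) {
            reset(info.ranges(), info.totalLength());
        }
    }

    public static void main(String[] args) {
        UncompressedStreamModel s = new UncompressedStreamModel(100);
        s.setBuffers(new RangeInfo(List.of(new Range(0, 100), new Range(100, 60))));
        System.out.println(s.length); // prints 160
    }
}
{code}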
After Hive was upgraded to ORC 1.6.7, and since the SettableUncompressedStream
class was removed from ORC, Hive maintains its own version of
SettableUncompressedStream, which does not pass the new length to
UncompressedStream when calling reset():
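{code:java}
// Hive's own copy after the upgrade: reset() receives only the ranges,
// so the stream length is not updated.
public void setBuffers(DiskRangeInfo diskRangeList) {
  reset(diskRangeList.getDiskRanges());
  setOffset(diskRangeList.getDiskRanges());
}
{code}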
While investigating HIVE-27128, I compared the lengths of
InStream.UncompressedStream before and after Hive's upgrade to ORC 1.6.7 (which
included ORC-516). The issue occurs with the ORC-516 changes because the length
of InStream.UncompressedStream is set once for all row groups, whereas without
those changes the length is updated dynamically and is sometimes set to a larger
value than the initial one.
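The effect described above can be illustrated with a hypothetical model (FixedLengthSketch and StreamModel are illustrative names, not ORC or Hive API). It assumes a reader never consumes more than 'length' bytes, even when more data is buffered. Under that assumption, a length fixed at the first row group's size cuts off a later, larger row group, while passing the length on every reset keeps the whole buffer readable:

{code:java}
import java.util.List;

// Hypothetical model; names are illustrative only, not the ORC/Hive API.
class FixedLengthSketch {

    static class StreamModel {
        private long length;    // upper bound on readable bytes (assumed semantics)
        private long available; // bytes actually present in the current buffers

        StreamModel(long initialLength) {
            this.length = initialLength;
        }

        // Post-ORC-516 style: only the buffers change; 'length' stays fixed.
        void reset(List<Long> bufferSizes) {
            available = bufferSizes.stream().mapToLong(Long::longValue).sum();
        }

        // Pre-ORC-516 style: the caller also supplies the new length.
        void reset(List<Long> bufferSizes, long newLength) {
            reset(bufferSizes);
            length = newLength;
        }

        // Bytes a reader can consume: capped by 'length' even if more data is buffered.
        long readableBytes() {
            return Math.min(length, available);
        }
    }

    public static void main(String[] args) {
        List<Long> firstRowGroup = List.of(100L);
        List<Long> laterRowGroup = List.of(100L, 60L); // larger than the first one

        StreamModel fixed = new StreamModel(100);
        fixed.reset(firstRowGroup);
        fixed.reset(laterRowGroup);
        System.out.println(fixed.readableBytes()); // prints 100: the last 60 bytes are unreachable

        StreamModel dynamic = new StreamModel(100);
        dynamic.reset(firstRowGroup, 100);
        dynamic.reset(laterRowGroup, 160);
        System.out.println(dynamic.readableBytes()); // prints 160
    }
}
{code}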