Dmitriy Fingerman created ORC-1393:
--------------------------------------

             Summary: Too short incorrect length of uncompressed stream causes 
                 Key: ORC-1393
                 URL: https://issues.apache.org/jira/browse/ORC-1393
             Project: ORC
          Issue Type: Bug
            Reporter: Dmitriy Fingerman


This issue is the root cause of the issue reported in HIVE-27128.

Before 'ORC-516 - Update InStream for column compression', the 
InStream.UncompressedStream class had a 'length' field, and the length could 
be modified in the reset() method. 

The reset() method was used in the setBuffers() method of the 
SettableUncompressedStream class:

{code:java}
public void setBuffers(DiskRangeInfo diskRangeInfo) {
  reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
  setOffset(diskRangeInfo.getDiskRanges());
}{code}
After the ORC version in Hive was upgraded to 1.6.7, and since the 
SettableUncompressedStream class was removed from Hive, Hive manages its own 
version of SettableUncompressedStream, which doesn't pass the new length to 
UncompressedStream when calling reset():

{code:java}
public void setBuffers(DiskRangeInfo diskRangeList) {
  reset(diskRangeList.getDiskRanges());
  setOffset(diskRangeList.getDiskRanges());
} {code}
While investigating the issue reported in HIVE-27128, I compared the lengths 
of the InStream.UncompressedStream before and after the upgrade of the ORC 
version in Hive to 1.6.7 (which included ORC-516). The issue happens with the 
ORC-516 changes because the length of the InStream.UncompressedStream is set 
once for all row groups, whereas without those changes the length is dynamic 
and is sometimes set to a bigger value than the initial value.
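
To make the difference concrete, here is a minimal sketch of the two reset 
behaviours. It is not the ORC or Hive code: ToyStream, readablePerGroup and 
the byte counts are hypothetical names and numbers chosen only for 
illustration, and the model assumes a stream can never serve more bytes than 
its 'length' field allows.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in ORC or Hive.
class StreamLengthSketch {

    /** Toy stream that can serve at most `length` bytes of the current buffers. */
    static class ToyStream {
        private long length;

        ToyStream(long initialLength) {
            this.length = initialLength;
        }

        // Older behaviour as described above: reset also replaces the length.
        void reset(long newLength) {
            this.length = newLength;
        }

        // Newer behaviour as described above: the length stays as first set.
        void reset() {
            // buffers would be swapped here; length is deliberately untouched
        }

        // Bytes the stream will actually hand out for a row group of this size.
        long readable(long bufferedBytes) {
            return Math.min(length, bufferedBytes);
        }
    }

    /** Returns how many bytes are readable for each row group. */
    static long[] readablePerGroup(long[] groupSizes, boolean updateLength) {
        ToyStream stream = new ToyStream(groupSizes[0]);
        long[] readable = new long[groupSizes.length];
        for (int i = 0; i < groupSizes.length; i++) {
            if (updateLength) {
                stream.reset(groupSizes[i]);
            } else {
                stream.reset();
            }
            readable[i] = stream.readable(groupSizes[i]);
        }
        return readable;
    }

    public static void main(String[] args) {
        long[] groups = {100, 150, 80};
        // Length follows each row group: everything is readable.
        System.out.println(Arrays.toString(readablePerGroup(groups, true)));
        // Length fixed at the first group's size: the second group is cut short.
        System.out.println(Arrays.toString(readablePerGroup(groups, false)));
    }
}
```

In this toy model the second row group (150 bytes) is larger than the initial 
length (100 bytes), so with a fixed length only 100 of its bytes are 
readable, which corresponds to the "too short" length described above.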

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
