[ https://issues.apache.org/jira/browse/ORC-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned ORC-1393:
----------------------------------

    Assignee: Dmitriy Fingerman

> Wrong length of uncompressed stream causes EOFException when reading
> --------------------------------------------------------------------
>
>                 Key: ORC-1393
>                 URL: https://issues.apache.org/jira/browse/ORC-1393
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>
> This issue is the root cause of the issue reported in HIVE-27128.
> Before 'ORC-516 - Update InStream for column compression', the
> InStream.UncompressedStream class had a 'length' field, and the length could
> be modified in its reset() method.
> The reset() method was called from the setBuffers() method of the
> SettableUncompressedStream class:
>  
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeInfo) {
>   reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
>   setOffset(diskRangeInfo.getDiskRanges());
> }{code}
> After the ORC version in Hive was upgraded to 1.6.7, and since the
> SettableUncompressedStream class was removed from Hive, Hive manages its own
> version of SettableUncompressedStream, which does not pass the new length to
> UncompressedStream when calling reset() (because UncompressedStream's reset()
> method no longer accepts a new length):
>  
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeList) {
>   reset(diskRangeList.getDiskRanges());
>   setOffset(diskRangeList.getDiskRanges());
> } {code}
> While investigating the issue reported in HIVE-27128, I compared the lengths
> of InStream.UncompressedStream before and after the upgrade of ORC in Hive to
> 1.6.7 (which included ORC-516). I noticed that the issue happens with the
> ORC-516 changes because the length of InStream.UncompressedStream is set once
> for all row groups, whereas without those changes the length is dynamic and is
> sometimes set to a bigger value than the initial value.
>  
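To make the failure mode above concrete, here is a minimal sketch using a deliberately simplified toy model, not the real ORC or Hive code. All names (ToyStreamDemo, ToyUncompressedStream, readFully, readSecondRowGroup) are hypothetical, and the byte counts (8 bytes for the first row group, 12 for the second) are illustrative assumptions. The toy stream reports end-of-stream once its logical length is reached, so if reset() keeps the old length while the next row group's buffers are larger, a reader expecting the full row group hits an EOFException.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behavior described in ORC-1393.
// None of these classes are the real ORC or Hive classes.
class ToyStreamDemo {

    /** Toy stand-in for InStream.UncompressedStream: concatenated buffers plus a logical length. */
    static final class ToyUncompressedStream {
        private final List<byte[]> buffers = new ArrayList<>();
        private long length;   // logical end of the stream
        private long position; // bytes consumed so far

        ToyUncompressedStream(List<byte[]> initial, long length) {
            reset(initial, length);
        }

        /** Pre-ORC-516 style: reset replaces the buffers AND the length. */
        void reset(List<byte[]> newBuffers, long newLength) {
            reset(newBuffers);
            this.length = newLength;
        }

        /** Post-ORC-516 style: reset replaces the buffers but keeps the old length. */
        void reset(List<byte[]> newBuffers) {
            buffers.clear();
            buffers.addAll(newBuffers);
            position = 0;
        }

        /** Returns the next byte (0-255), or -1 once the logical length is reached. */
        int read() {
            if (position >= length) {
                return -1;
            }
            long offset = position;
            for (byte[] buf : buffers) {
                if (offset < buf.length) {
                    position++;
                    return buf[(int) offset] & 0xff;
                }
                offset -= buf.length;
            }
            return -1; // ran out of buffered data
        }
    }

    /** Reads exactly n bytes, like a decoder that expects a whole row group. */
    static int readFully(ToyUncompressedStream in, int n) throws EOFException {
        for (int i = 0; i < n; i++) {
            if (in.read() < 0) {
                throw new EOFException("stream ended after " + i + " of " + n + " bytes");
            }
        }
        return n;
    }

    /** Builds zero-filled buffers standing in for disk ranges of the given sizes. */
    static List<byte[]> ranges(int... sizes) {
        List<byte[]> out = new ArrayList<>();
        for (int size : sizes) {
            out.add(new byte[size]);
        }
        return out;
    }

    /** Simulates two row groups; the second one (12 bytes) is larger than the first (8 bytes). */
    static int readSecondRowGroup(boolean passNewLength) throws EOFException {
        ToyUncompressedStream stream = new ToyUncompressedStream(ranges(4, 4), 8);
        readFully(stream, 8);               // first row group fits in the initial length
        List<byte[]> next = ranges(6, 6);   // second row group: 12 bytes in total
        if (passNewLength) {
            stream.reset(next, 12);         // like the old setBuffers(): reset(ranges, totalLength)
        } else {
            stream.reset(next);             // like Hive's current setBuffers(): reset(ranges)
        }
        return readFully(stream, 12);
    }

    public static void main(String[] args) throws EOFException {
        System.out.println("with length update: " + readSecondRowGroup(true) + " bytes");
        try {
            readSecondRowGroup(false);
        } catch (EOFException e) {
            System.out.println("fixed length: " + e.getMessage());
        }
    }
}
```

Running main() should print "with length update: 12 bytes" followed by "fixed length: stream ended after 8 of 12 bytes". In this toy model, passing the total length on every reset is what keeps the larger second row group readable; the real fix in ORC/Hive may take a different form.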



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
