[
https://issues.apache.org/jira/browse/ORC-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun reassigned ORC-1393:
----------------------------------
Assignee: Dmitriy Fingerman
> Wrong length of uncompressed stream causes EOFException when reading
> --------------------------------------------------------------------
>
> Key: ORC-1393
> URL: https://issues.apache.org/jira/browse/ORC-1393
> Project: ORC
> Issue Type: Bug
> Reporter: Dmitriy Fingerman
> Assignee: Dmitriy Fingerman
> Priority: Major
>
> This issue is the root cause of the issue reported in HIVE-27128.
> Before 'ORC-516 - Update InStream for column compression', the
> InStream.UncompressedStream class had a 'length' field, and the length
> could be modified in the reset() method.
> The reset() method was used by the setBuffers() method of the
> SettableUncompressedStream class:
>
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeInfo) {
>   reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
>   setOffset(diskRangeInfo.getDiskRanges());
> }
> {code}
> After the ORC version upgrade in Hive to 1.6.7, and since the
> SettableUncompressedStream class was removed from Hive, Hive manages its own
> version of SettableUncompressedStream, which doesn't pass the new length to
> UncompressedStream when calling reset() (because UncompressedStream no longer
> accepts a new length in the reset() method):
>
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeList) {
>   reset(diskRangeList.getDiskRanges());
>   setOffset(diskRangeList.getDiskRanges());
> }
> {code}
> While investigating the issue reported in HIVE-27128, I compared the lengths
> of the InStream.UncompressedStream before and after the upgrade of the ORC
> version in Hive to 1.6.7 (which included ORC-516). I noticed that the issue
> happens with the ORC-516 changes because the length of the
> InStream.UncompressedStream is set once for all row groups, whereas without
> those changes the length is dynamic and is sometimes set to a larger value
> than the initial value.
>
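For illustration only, the failure mode described above can be modelled with a small, self-contained Java sketch. The names ModelStream, canReadAll and LengthResetSketch are hypothetical and are not the ORC or Hive classes; the sketch only assumes that a read is rejected with an EOFException once the position reaches the stream's recorded length.

```java
import java.io.EOFException;

// Hypothetical, simplified model of the problem; NOT the actual ORC/Hive code.
public class LengthResetSketch {

    /** A stream whose recorded length bounds how many bytes may be read. */
    static class ModelStream {
        private byte[] data;
        private long length;   // logical length used for the end-of-stream check
        private int position;

        ModelStream(byte[] data, long length) {
            this.data = data;
            this.length = length;
        }

        /** Old-style reset: replaces the buffer and the length. */
        void reset(byte[] newData, long newLength) {
            this.data = newData;
            this.length = newLength;
            this.position = 0;
        }

        /** New-style reset: replaces the buffer only; the length keeps its initial value. */
        void reset(byte[] newData) {
            this.data = newData;
            this.position = 0;
        }

        int read() throws EOFException {
            if (position >= length) {
                throw new EOFException(
                    "Read past end of stream: position " + position + ", length " + length);
            }
            return data[position++] & 0xff;
        }
    }

    /** Returns true if 'count' bytes can be read, false if an EOFException is hit. */
    static boolean canReadAll(ModelStream stream, int count) {
        try {
            for (int i = 0; i < count; i++) {
                stream.read();
            }
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] first = new byte[4];
        byte[] second = new byte[8]; // a later row group that needs a larger length

        ModelStream lengthUpdated = new ModelStream(first, first.length);
        lengthUpdated.reset(second, second.length);
        System.out.println(canReadAll(lengthUpdated, second.length)); // true

        ModelStream lengthStale = new ModelStream(first, first.length);
        lengthStale.reset(second);
        System.out.println(canReadAll(lengthStale, second.length));   // false
    }
}
```

In this model, the second stream keeps the length of the first buffer, so reading the larger second buffer stops early with an EOFException. This mirrors the observation that the length is set once for all row groups.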