[
https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=841157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841157
]
ASF GitHub Bot logged work on HIVE-26809:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Jan/23 15:32
Start Date: 23/Jan/23 15:32
Worklog Time Spent: 10m
Work Description: abstractdog commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246
##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean
isCompressed, PositionProvider i
index.getNext();
}
- protected static class StringStreamReader extends StringTreeReader
+ public static class StringDictionaryTreeReaderHive extends TreeReader {
Review Comment:
I was trying to understand the scenario here and the way I see this: the
current PR code is not the proper one as we end up Hive on ORC 1.8.x but
without an important optimization introduced in ORC-1060, so if we have to copy
some ORC code anyway, let's have ORC-1060 at least here (sometimes I feel we
need to port changes on separate jiras, but here we can merge them together)
I see that the basic confusion comes from the fact that in ORC we have a
common StringTreeReader which encapsulates different kinds of string readers
like StringDirectTreeReader, StringDictionaryTreeReader, but in hive's
StringStreamReader we have dictionary-related properties like
_dictionaryStream, _lengthStream, which is confusing...if we're already
subclassing ORC tree readers, we should follow it like:
```
HIVE -> ORC
StringStreamReader -> StringTreeReader (as it is now)
StringDictionaryStreamReader -> StringDictionaryTreeReader
StringDirectStreamReader -> StringDirectTreeReader
```
this is a change that should be done regardless of ORC 1.8 upgrade in my
opinion, and prior to ORC 1.8 upgrade
once we follow ORC tree class hierarchy, we have a better chance to adapt
changes like ORC-1060, where e.g. only the dictionary reader has been changed
guys, if you agree with this, let's address the above problem in a separate
hive ticket first, let's take the time, it's worth the time, especially if
turns out that the ORC 1.8 upgrade becomes a clearer thing
Issue Time Tracking
-------------------
Worklog Id: (was: 841157)
Time Spent: 5h 50m (was: 5h 40m)
> Upgrade ORC to 1.8.1
> --------------------
>
> Key: HIVE-26809
> URL: https://issues.apache.org/jira/browse/HIVE-26809
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Dmitriy Fingerman
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)