abstractdog commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246
##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean
isCompressed, PositionProvider i
index.getNext();
}
- protected static class StringStreamReader extends StringTreeReader
+ public static class StringDictionaryTreeReaderHive extends TreeReader {
Review Comment:
I was trying to understand the scenario here and the way I see this: the
current PR code is not the proper one as we end up Hive on ORC 1.8.x but
without an important optimization introduced in ORC-1060, so if we have to copy
some ORC code anyway, let's have ORC-1060 at least here (sometimes I feel we
need to port changes on separate jiras, but here we can merge them together)
I see that the basic confusion comes from the fact that in ORC we have a
common StringTreeReader which encapsulates different kinds of string readers
like StringDirectTreeReader, StringDictionaryTreeReader, but in hive's
StringStreamReader we have dictionary-related properties like
_dictionaryStream, _lengthStream, which is confusing...if we're already
subclassing ORC tree readers, we should follow it like:
```
HIVE -> ORC
StringStreamReader -> StringTreeReader (as it is now)
StringDictionaryStreamReader -> StringDictionaryTreeReader
StringDirectStreamReader -> StringDirectTreeReader
```
this is a change that should be done regardless of ORC 1.8 upgrade in my
opinion, and prior to ORC 1.8 upgrade
once we follow ORC tree class hierarchy, we have a better chance to adapt
changes like ORC-1060, where e.g. only the dictionary reader has been changed
guys, if you agree with this, let's address the above problem in a separate
hive ticket first, it's worth spending the time on it, especially if turns out
that the ORC 1.8 upgrade becomes a clearer thing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]