[GitHub] [hive] difin commented on a diff in pull request #3833: HIVE-26809: Upgrade ORC to 1.8.1.

GitBox Mon, 16 Jan 2023 08:17:08 -0800


difin commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1071406567



##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean isCompressed, PositionProvider i
     index.getNext();
   }
 
-  protected static class StringStreamReader extends StringTreeReader
+  public static class StringDictionaryTreeReaderHive extends TreeReader {

Review Comment:
   This class was added to fix the many CI tests that failed without it.
   In more detail: Hive implements its own TreeReaderFactory. In the ORC 
project, ORC-1060 ("Reduce memory usage when vectorized reading dictionary 
string encoding columns") changed StringDictionaryTreeReader, and those changes 
caused exceptions in Hive's EncodedTreeReaderFactory when upgrading to ORC 
1.8.1. To handle that, I changed Hive's EncodedTreeReaderFactory to use the 
version of StringDictionaryTreeReader without the ORC-1060 changes.
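
   As a rough illustration of the approach described above, here is a minimal, self-contained sketch of a factory that builds a locally kept reader class, so that an upstream change to the library's reader cannot alter its behavior. All names in it (`PinnedReaderSketch`, `StringDictionaryTreeReaderLocal`, `createStringDictionaryReader`) are hypothetical stand-ins, not the actual ORC or Hive classes, and the dictionary lookup is simplified to an in-memory list.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the real ORC or Hive classes.
public class PinnedReaderSketch {

    /** Minimal stand-in for a column reader base class. */
    public abstract static class TreeReader {
        public abstract String next();
    }

    /** Reader kept in the local code base, so a library upgrade cannot change how it behaves. */
    public static class StringDictionaryTreeReaderLocal extends TreeReader {
        private final List<String> dictionary;
        private final int[] ids;
        private int pos = 0;

        public StringDictionaryTreeReaderLocal(List<String> dictionary, int[] ids) {
            this.dictionary = dictionary;
            this.ids = ids;
        }

        @Override
        public String next() {
            // Resolve the next dictionary id to its string value.
            return dictionary.get(ids[pos++]);
        }
    }

    /** Factory method: always returns the locally kept reader for dictionary-encoded strings. */
    public static TreeReader createStringDictionaryReader(List<String> dictionary, int[] ids) {
        return new StringDictionaryTreeReaderLocal(dictionary, ids);
    }

    public static void main(String[] args) {
        TreeReader reader = createStringDictionaryReader(List.of("a", "b"), new int[] {1, 0, 1});
        System.out.println(reader.next() + reader.next() + reader.next());
    }
}
```

   The point of the sketch is only the shape of the fix: callers go through the factory, and the factory decides which reader implementation they get.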



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


