Gopal V created HIVE-11665:
------------------------------

             Summary: ORC StringDictionaryReader should not used Chunked buffers
                 Key: HIVE-11665
                 URL: https://issues.apache.org/jira/browse/HIVE-11665
             Project: Hive
          Issue Type: Improvement
          Components: File Formats
    Affects Versions: 1.3.0, 2.0.0
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran

The ORC String Dictionary Reader is slow because of the chunking of the input stream.

{code}
private void readDictionaryStream(InStream in) throws IOException {
  if (in != null) { // Guard against empty dictionary stream.
    if (in.available() > 0) {
      dictionaryBuffer = new DynamicByteArray(64, in.available());
      dictionaryBuffer.readAll(in);
      // Since it's the start of the stripe, invalidate the cache.
      dictionaryBufferInBytesCache = null;
    }
    in.close();
  } else {
    dictionaryBuffer = null;
  }
}
{code}

Chunking the data offers no advantage on the read path: the buffer never goes through a grow() operation there, so the chunked layout saves no memory.
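
To make the trade-off concrete, here is a minimal sketch of the two layouts. It is not Hive's DynamicByteArray and not a proposed patch: the class DictionaryReadSketch and its methods are hypothetical names, and it assumes the total length is known before reading (as in.available() supplies in the snippet above).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; none of these names exist in Hive or ORC.
final class DictionaryReadSketch {

  // Chunked layout: bytes are spread over several arrays. Only the last
  // chunk may be shorter than chunkSize.
  static byte[][] readChunked(InputStream in, int length, int chunkSize)
      throws IOException {
    int chunkCount = (length + chunkSize - 1) / chunkSize;
    byte[][] chunks = new byte[chunkCount][];
    int remaining = length;
    for (int i = 0; i < chunkCount; i++) {
      chunks[i] = new byte[Math.min(chunkSize, remaining)];
      readFully(in, chunks[i]);
      remaining -= chunks[i].length;
    }
    return chunks;
  }

  // Contiguous layout: the size is known up front, so a single exact-size
  // array is allocated once and never has to grow.
  static byte[] readContiguous(InputStream in, int length) throws IOException {
    byte[] data = new byte[length];
    readFully(in, data);
    return data;
  }

  // Every lookup in the chunked layout pays for a divide and a modulo,
  // where the contiguous layout is a plain array index.
  static byte chunkedGet(byte[][] chunks, int chunkSize, int index) {
    return chunks[index / chunkSize][index % chunkSize];
  }

  // Loops until dst is full, since InputStream.read may return fewer bytes
  // than requested.
  private static void readFully(InputStream in, byte[] dst) throws IOException {
    int off = 0;
    while (off < dst.length) {
      int n = in.read(dst, off, dst.length - off);
      if (n < 0) {
        throw new EOFException("stream ended after " + off + " bytes");
      }
      off += n;
    }
  }
}
```

Both methods yield the same bytes. The difference is that a string spanning a chunk boundary has to be stitched together from two arrays in the chunked layout, while the contiguous layout can hand out (offset, length) slices of one array.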