Prasanth J created HIVE-7832:
--------------------------------
Summary: Do ORC dictionary check at a finer level and preserve
encoding across stripes
Key: HIVE-7832
URL: https://issues.apache.org/jira/browse/HIVE-7832
Project: Hive
Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Currently the ORC dictionary check happens while writing the stripe. Just before
writing a stripe, if the ratio of dictionary entries to total non-null rows is
greater than the threshold, the dictionary is discarded. Also, the decision of
whether to use the dictionary is not preserved across stripes. This sometimes
leads to a costly O(log n) insertion cost being paid again for each stripe when
there are too many distinct keys.
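
The check described above can be sketched roughly as follows. This is only an
illustration in Python, not the ORC writer code: the function name
keep_dictionary and the threshold value used in the example are assumptions,
and a plain set stands in for the writer's dictionary structure (so it does not
reproduce the O(log n) insertion cost).

```python
from typing import Iterable, Optional


def keep_dictionary(values: Iterable[Optional[str]], threshold: float) -> bool:
    """Hypothetical per-stripe check: keep the dictionary only if the ratio of
    distinct entries to non-null rows does not exceed the threshold."""
    distinct = set()   # stand-in for the writer's dictionary
    non_null = 0
    for value in values:
        if value is None:
            continue   # null rows are excluded from the ratio
        non_null += 1
        distinct.add(value)
    if non_null == 0:
        # Assumption: with no non-null rows there is nothing to discard.
        return True
    ratio = len(distinct) / non_null
    # The dictionary is discarded when the ratio is greater than the threshold.
    return ratio <= threshold


# Few distinct keys: ratio 2/4 = 0.5, so the dictionary is kept.
low_cardinality = keep_dictionary(["a", "a", "b", "a"], threshold=0.8)
# All keys distinct: ratio 4/4 = 1.0, so the dictionary is discarded.
high_cardinality = keep_dictionary(["a", "b", "c", "d", None], threshold=0.8)
```

Because this check only runs once the stripe is about to be written, every row
has already been inserted into the dictionary by the time the decision is made,
which is the cost the finer-grained check is meant to avoid.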