yongqian created ORC-2131:
-----------------------------
Summary: Set default of orc.stripe.size.check.ratio and
orc.dictionary.max.size.bytes to 0
Key: ORC-2131
URL: https://issues.apache.org/jira/browse/ORC-2131
Project: ORC
Issue Type: Improvement
Reporter: yongqian
Assignee: yongqian
Background
After enabling the optimizations related to {{orc.stripe.size.check.ratio}} and
{{orc.dictionary.max.size.bytes}}, we observed that ORC files written with
the current defaults are about 10%–20% larger than before. For example,
datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current
defaults, causing a noticeable increase in storage and I/O cost.
Current defaults
* {{orc.dictionary.max.size.bytes}}: 16 MB (16 * 1024 * 1024 bytes). Dictionary
encoding is turned off when the dictionary size exceeds this limit.
* {{orc.stripe.size.check.ratio}}: 2.0. A stripe is flushed when the tree writer
size exceeds (ratio × orc.stripe.size).
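To make the two thresholds concrete, here is a minimal sketch of the arithmetic described above. It is not ORC's writer code: the helper names are hypothetical, and the 64 MiB value used for orc.stripe.size is an assumption, since this issue does not state it.

```python
MIB = 1024 * 1024

# Defaults quoted in this issue.
DICTIONARY_MAX_SIZE_BYTES = 16 * MIB
STRIPE_SIZE_CHECK_RATIO = 2.0

# Assumption: orc.stripe.size is 64 MiB (not stated in this issue).
ASSUMED_STRIPE_SIZE_BYTES = 64 * MIB


def stripe_flush_threshold(ratio, stripe_size):
    """Tree writer size above which a stripe is flushed: ratio x stripe size."""
    return int(ratio * stripe_size)


def dictionary_over_limit(dictionary_bytes, max_bytes):
    """True when the dictionary is larger than the configured limit."""
    return dictionary_bytes > max_bytes


threshold = stripe_flush_threshold(STRIPE_SIZE_CHECK_RATIO, ASSUMED_STRIPE_SIZE_BYTES)
print(threshold // MIB)  # 128
print(dictionary_over_limit(20 * MIB, DICTIONARY_MAX_SIZE_BYTES))  # True
```

Under these assumptions a stripe would be flushed once the tree writer size passes 128 MiB, and a 20 MiB dictionary would cause dictionary encoding to be turned off.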
--
This message was sent by Atlassian Jira
(v8.20.10#820010)