[ 
https://issues.apache.org/jira/browse/ORC-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl resolved ORC-2131.
-------------------------
    Fix Version/s: 3.0.0
                   2.3.1
       Resolution: Fixed

Issue resolved by pull request 2580
[https://github.com/apache/orc/pull/2580]

> Set default of orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes 
> to 0 
> ----------------------------------------------------------------------------------
>
>                 Key: ORC-2131
>                 URL: https://issues.apache.org/jira/browse/ORC-2131
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: yongqian
>            Assignee: yongqian
>            Priority: Major
>             Fix For: 3.0.0, 2.3.1
>
>
> Background
> After enabling the optimizations related to {{orc.stripe.size.check.ratio}}
> and {{orc.dictionary.max.size.bytes}}, we observed that ORC files written
> with the current defaults are about 10%–20% larger than before. For example,
> datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current
> defaults, causing a noticeable increase in storage and I/O cost.
> Current defaults
>  * {{orc.dictionary.max.size.bytes}}: 16MB (16 * 1024 * 1024); turns off
> dictionary encoding when dictionary size exceeds this limit.
>  * {{orc.stripe.size.check.ratio}}: 2.0; flushes a stripe when tree
> writer size exceeds (ratio × orc.stripe.size).
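
The two thresholds described above can be modelled in a few lines. This is only an illustrative sketch, not ORC's writer code: the function names are hypothetical, the 64 MiB stripe size in the usage note is just an example value, and treating a setting of 0 as "check disabled" is an assumption drawn from the issue title rather than something the description states.

```python
MIB = 1024 * 1024


def dictionary_too_large(dictionary_bytes, max_size_bytes=16 * MIB):
    """Return True when dictionary encoding would be turned off.

    Mirrors the description of orc.dictionary.max.size.bytes: encoding is
    abandoned once the dictionary size exceeds the limit.
    """
    # Assumption: a limit of 0 (the proposed default) disables the check.
    if max_size_bytes <= 0:
        return False
    return dictionary_bytes > max_size_bytes


def should_flush_stripe(tree_writer_bytes, stripe_size_bytes, check_ratio=2.0):
    """Return True when the ratio check alone would force a stripe flush.

    Mirrors the description of orc.stripe.size.check.ratio: flush when the
    tree writer size exceeds (ratio x orc.stripe.size). Any other flush
    conditions in a real writer are deliberately not modelled here.
    """
    # Assumption: a ratio of 0 (the proposed default) disables the check.
    if check_ratio <= 0:
        return False
    return tree_writer_bytes > check_ratio * stripe_size_bytes
```

With an example stripe size of 64 MiB and the current ratio of 2.0, `should_flush_stripe(129 * MIB, 64 * MIB)` is true because 129 MiB exceeds the 128 MiB threshold. Under the assumption above, passing `check_ratio=0` or `max_size_bytes=0` makes both checks return false.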



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
