[ https://issues.apache.org/jira/browse/ORC-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved ORC-1986.
--------------------------------
Fix Version/s: 2.3.0
Resolution: Fixed
Issue resolved by pull request 2371
[https://github.com/apache/orc/pull/2371]
> Trigger flush stripe for large input rows
> -----------------------------------------
>
> Key: ORC-1986
> URL: https://issues.apache.org/jira/browse/ORC-1986
> Project: ORC
> Issue Type: Improvement
> Reporter: Wan Kun
> Assignee: Wan Kun
> Priority: Major
> Fix For: 2.3.0
>
>
> For large input rows, the stripe may become excessively large, requiring more
> memory for both reading and writing a single stripe.
> We can check the tree writer's size in bytes and flush the stripe even when
> fewer than 5000 input rows (the default `orc.rows.between.memory.checks`)
> have been written.
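One way to picture the change is a small simulation that adds fixed-size rows in batches and compares two flush checks. The first consults the buffered size only every 5000 rows. The second also consults it after every batch. This is a hypothetical sketch, not ORC's WriterImpl or memory manager. The class `StripeFlushSketch`, the 1024-row batch, the 64 MiB target and the constant row size are all assumptions made for illustration.

{code:java}
public final class StripeFlushSketch {
    // Assumptions for this sketch, not values read from ORC's writer:
    static final int BATCH_SIZE = 1024;                      // rows added per batch
    static final long ROWS_BETWEEN_CHECKS = 5000;            // row interval between size checks
    static final long STRIPE_SIZE_BYTES = 64L * 1024 * 1024; // target stripe size (64 MiB)

    /**
     * Simulates adding fixed-size rows in batches and returns how many rows
     * end up in the first stripe.
     *
     * @param bytesPerRow         assumed constant size of one row in bytes
     * @param checkSizeEveryBatch false: size is checked only after ROWS_BETWEEN_CHECKS rows;
     *                            true: size is also checked after every batch (the proposal)
     */
    static long rowsInFirstStripe(long bytesPerRow, boolean checkSizeEveryBatch) {
        if (bytesPerRow <= 0) {
            throw new IllegalArgumentException("bytesPerRow must be positive");
        }
        long rows = 0;
        long rowsSinceCheck = 0;
        while (true) {
            rows += BATCH_SIZE;
            rowsSinceCheck += BATCH_SIZE;
            boolean checkNow = checkSizeEveryBatch || rowsSinceCheck >= ROWS_BETWEEN_CHECKS;
            if (checkNow) {
                rowsSinceCheck = 0;
                long bufferedBytes = rows * bytesPerRow;
                if (bufferedBytes >= STRIPE_SIZE_BYTES) {
                    return rows; // the stripe would be flushed at this point
                }
            }
        }
    }

    public static void main(String[] args) {
        long wideRow = 67_870; // roughly 347,494,188 data bytes / 5,120 rows in the dump
        System.out.println("row-count check only: " + rowsInFirstStripe(wideRow, false) + " rows");
        System.out.println("size check per batch: " + rowsInFirstStripe(wideRow, true) + " rows");
    }
}
{code}

With a constant row of about 67,870 bytes (the dump's on-disk average, used only as a rough stand-in for the in-memory size), the row-count-only check first fires at 5,120 rows, the same row count as the stripe below. The per-batch size check flushes after the first 1,024 rows. For small rows (100 bytes), the two variants differ by less than one check interval (671,744 vs 675,840 rows), so the extra check mainly matters when rows are wide.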
> {code:java}
> Stripes:
> Stripe: offset: 3 data: 347494188 rows: 5120 tail: 244 index: 2304
> Stream: column 0 section ROW_INDEX start: 3 length 12
> Stream: column 1 section ROW_INDEX start: 15 length 110
> Stream: column 2 section ROW_INDEX start: 125 length 893
> Stream: column 3 section ROW_INDEX start: 1018 length 31
> Stream: column 4 section ROW_INDEX start: 1049 length 65
> Stream: column 5 section ROW_INDEX start: 1114 length 923
> Stream: column 6 section ROW_INDEX start: 2037 length 25
> Stream: column 7 section ROW_INDEX start: 2062 length 155
> Stream: column 8 section ROW_INDEX start: 2217 length 28
> Stream: column 9 section ROW_INDEX start: 2245 length 31
> Stream: column 10 section ROW_INDEX start: 2276 length 31
> Stream: column 1 section DATA start: 2307 length 81853
> Stream: column 1 section LENGTH start: 84160 length 2191
> Stream: column 2 section DATA start: 86351 length 345862763
> Stream: column 2 section LENGTH start: 345949114 length 13736
> Stream: column 3 section DATA start: 345962850 length 22
> Stream: column 3 section LENGTH start: 345962872 length 6
> Stream: column 3 section DICTIONARY_DATA start: 345962878 length 5
> Stream: column 4 section PRESENT start: 345962883 length 200
> Stream: column 4 section DATA start: 345963083 length 6322
> Stream: column 4 section LENGTH start: 345969405 length 495
> Stream: column 4 section DICTIONARY_DATA start: 345969900 length 2919
> Stream: column 5 section DATA start: 345972819 length 1507883
> Stream: column 5 section LENGTH start: 347480702 length 7346
> Stream: column 6 section DATA start: 347488048 length 22
> Stream: column 6 section LENGTH start: 347488070 length 6
> Stream: column 6 section DICTIONARY_DATA start: 347488076 length 0
> Stream: column 7 section DATA start: 347488076 length 5795
> Stream: column 7 section LENGTH start: 347493871 length 301
> Stream: column 7 section DICTIONARY_DATA start: 347494172 length 2187
> Stream: column 8 section DATA start: 347496359 length 22
> Stream: column 8 section LENGTH start: 347496381 length 6
> Stream: column 8 section DICTIONARY_DATA start: 347496387 length 4
> Stream: column 9 section DATA start: 347496391 length 58
> Stream: column 9 section LENGTH start: 347496449 length 6
> Stream: column 9 section DICTIONARY_DATA start: 347496455 length 7
> Stream: column 10 section DATA start: 347496462 length 22
> Stream: column 10 section LENGTH start: 347496484 length 6
> Stream: column 10 section DICTIONARY_DATA start: 347496490 length 5
> Encoding column 0: DIRECT
> Encoding column 1: DIRECT_V2
> Encoding column 2: DIRECT_V2
> Encoding column 3: DICTIONARY_V2[1]
> Encoding column 4: DICTIONARY_V2[661]
> Encoding column 5: DIRECT_V2
> Encoding column 6: DICTIONARY_V2[1]
> Encoding column 7: DICTIONARY_V2[682]
> Encoding column 8: DICTIONARY_V2[1]
> Encoding column 9: DICTIONARY_V2[2]
> Encoding column 10: DICTIONARY_V2[1]
> {code}
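To see which column makes the stripe so large, the Stream lengths in the dump can be totaled per column. A minimal sketch, assuming the text format shown above (one `Stream: column N section KIND start: S length L` line per stream). `StreamSizeSummary` is a hypothetical helper, not part of orc-tools. ROW_INDEX streams are skipped because in this dump their lengths add up to the stripe's `index` figure rather than its `data` figure.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class StreamSizeSummary {
    // Matches lines such as "Stream: column 2 section DATA start: 86351 length 345862763".
    private static final Pattern STREAM =
        Pattern.compile("Stream: column (\\d+) section (\\w+) start: (\\d+) length (\\d+)");

    /** Sums the lengths of all non-ROW_INDEX streams per column, ordered by column id. */
    static Map<Integer, Long> dataBytesPerColumn(List<String> lines) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher m = STREAM.matcher(line); // find() also tolerates a leading "> " quote prefix
            if (!m.find() || m.group(2).equals("ROW_INDEX")) {
                continue;
            }
            totals.merge(Integer.parseInt(m.group(1)), Long.parseLong(m.group(4)), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A few lines copied from the dump; feed every Stream line for the full picture.
        List<String> sample = List.of(
            "Stream: column 0 section ROW_INDEX start: 3 length 12",
            "Stream: column 1 section DATA start: 2307 length 81853",
            "Stream: column 1 section LENGTH start: 84160 length 2191",
            "Stream: column 2 section DATA start: 86351 length 345862763",
            "Stream: column 2 section LENGTH start: 345949114 length 13736",
            "Stream: column 5 section DATA start: 345972819 length 1507883",
            "Stream: column 5 section LENGTH start: 347480702 length 7346");
        dataBytesPerColumn(sample).forEach((col, bytes) ->
            System.out.printf("column %d: %,d bytes%n", col, bytes));
    }
}
{code}

Applied to every Stream line in the dump, the non-index totals add up to the stripe's `data: 347494188` and the ROW_INDEX lengths add up to `index: 2304`. Column 2 alone accounts for 345,876,499 bytes, about 99.5% of the stripe's data.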