Wan Kun created ORC-1986:
----------------------------
Summary: Trigger flush stripe for large input rows
Key: ORC-1986
URL: https://issues.apache.org/jira/browse/ORC-1986
Project: ORC
Issue Type: Improvement
Reporter: Wan Kun
For large input rows, a stripe may become very large and need a lot of memory to
read and write. We could check the tree writer's estimated size in bytes and flush
the stripe even when fewer than 5000 input rows have been written. For example, the
stripe below holds only 5120 rows but about 347 MB of data:
{code:java}
Stripes:
Stripe: offset: 3 data: 347494188 rows: 5120 tail: 244 index: 2304
Stream: column 0 section ROW_INDEX start: 3 length 12
Stream: column 1 section ROW_INDEX start: 15 length 110
Stream: column 2 section ROW_INDEX start: 125 length 893
Stream: column 3 section ROW_INDEX start: 1018 length 31
Stream: column 4 section ROW_INDEX start: 1049 length 65
Stream: column 5 section ROW_INDEX start: 1114 length 923
Stream: column 6 section ROW_INDEX start: 2037 length 25
Stream: column 7 section ROW_INDEX start: 2062 length 155
Stream: column 8 section ROW_INDEX start: 2217 length 28
Stream: column 9 section ROW_INDEX start: 2245 length 31
Stream: column 10 section ROW_INDEX start: 2276 length 31
Stream: column 1 section DATA start: 2307 length 81853
Stream: column 1 section LENGTH start: 84160 length 2191
Stream: column 2 section DATA start: 86351 length 345862763
Stream: column 2 section LENGTH start: 345949114 length 13736
Stream: column 3 section DATA start: 345962850 length 22
Stream: column 3 section LENGTH start: 345962872 length 6
Stream: column 3 section DICTIONARY_DATA start: 345962878 length 5
Stream: column 4 section PRESENT start: 345962883 length 200
Stream: column 4 section DATA start: 345963083 length 6322
Stream: column 4 section LENGTH start: 345969405 length 495
Stream: column 4 section DICTIONARY_DATA start: 345969900 length 2919
Stream: column 5 section DATA start: 345972819 length 1507883
Stream: column 5 section LENGTH start: 347480702 length 7346
Stream: column 6 section DATA start: 347488048 length 22
Stream: column 6 section LENGTH start: 347488070 length 6
Stream: column 6 section DICTIONARY_DATA start: 347488076 length 0
Stream: column 7 section DATA start: 347488076 length 5795
Stream: column 7 section LENGTH start: 347493871 length 301
Stream: column 7 section DICTIONARY_DATA start: 347494172 length 2187
Stream: column 8 section DATA start: 347496359 length 22
Stream: column 8 section LENGTH start: 347496381 length 6
Stream: column 8 section DICTIONARY_DATA start: 347496387 length 4
Stream: column 9 section DATA start: 347496391 length 58
Stream: column 9 section LENGTH start: 347496449 length 6
Stream: column 9 section DICTIONARY_DATA start: 347496455 length 7
Stream: column 10 section DATA start: 347496462 length 22
Stream: column 10 section LENGTH start: 347496484 length 6
Stream: column 10 section DICTIONARY_DATA start: 347496490 length 5
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2
Encoding column 3: DICTIONARY_V2[1]
Encoding column 4: DICTIONARY_V2[661]
Encoding column 5: DIRECT_V2
Encoding column 6: DICTIONARY_V2[1]
Encoding column 7: DICTIONARY_V2[682]
Encoding column 8: DICTIONARY_V2[1]
Encoding column 9: DICTIONARY_V2[2]
Encoding column 10: DICTIONARY_V2[1]
{code}
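In the dump above, column 2's DATA stream alone accounts for 345,862,763 of the 347,494,188 data bytes (about 99.5%), or roughly 67,869 bytes per row. The following is a hypothetical sketch, not ORC's actual writer code, that contrasts checking sizes only at a fixed row interval with checking the buffered byte size after every batch. Several things are assumptions: the class and method names are invented for illustration, the 5000-row interval is taken to mirror ORC's default `orc.rows.between.memory.checks`, the target stripe size is taken as 64 MiB, batches are taken as 1024 rows, and the on-disk (compressed) bytes per row from the dump serve only as a rough proxy for the in-memory buffered size.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ORC's actual writer code: compares flushing a stripe only
// at a fixed row interval with checking the buffered size in bytes on every batch.
final class StripeFlushSketch {

    // Assumed to mirror ORC's default memory-check interval (orc.rows.between.memory.checks).
    static final long ROWS_BETWEEN_CHECKS = 5000;

    // Writes totalRows in batches of batchSize rows, each row adding bytesPerRow to the
    // buffered size, and returns the row count of every stripe that gets flushed.
    static List<Long> simulate(long totalRows, int batchSize, long bytesPerRow,
                               long stripeSizeBytes, boolean checkBytesEveryBatch) {
        List<Long> stripeRows = new ArrayList<>();
        long written = 0;
        long rowsInStripe = 0;
        long rowsSinceCheck = 0;
        long bufferedBytes = 0; // stand-in for the tree writer's estimated memory size
        while (written < totalRows) {
            long n = Math.min(batchSize, totalRows - written);
            written += n;
            rowsInStripe += n;
            rowsSinceCheck += n;
            bufferedBytes += n * bytesPerRow;
            // Row-interval mode: sizes are only examined every ROWS_BETWEEN_CHECKS rows.
            // Byte-check mode (the proposal): the byte size is examined after every batch.
            boolean check = checkBytesEveryBatch || rowsSinceCheck >= ROWS_BETWEEN_CHECKS;
            if (check) {
                rowsSinceCheck = 0;
                if (bufferedBytes >= stripeSizeBytes) {
                    stripeRows.add(rowsInStripe);
                    rowsInStripe = 0;
                    bufferedBytes = 0;
                }
            }
        }
        if (rowsInStripe > 0) {
            stripeRows.add(rowsInStripe); // final partial stripe written on close
        }
        return stripeRows;
    }

    public static void main(String[] args) {
        long bytesPerRow = 347_494_188L / 5_120; // data bytes / rows from the stripe dump
        long stripeSize = 64L * 1024 * 1024;     // assumed 64 MiB target stripe size
        System.out.println("approx. bytes per row: " + bytesPerRow);
        System.out.println("row-interval checks:   "
                + simulate(10_240, 1024, bytesPerRow, stripeSize, false));
        System.out.println("per-batch byte checks: "
                + simulate(10_240, 1024, bytesPerRow, stripeSize, true));
    }
}
{code}

With about 67,869 bytes per row, five 1024-row batches first cross the 5000-row interval at 5120 rows, so the row-interval mode flushes 5120-row stripes of roughly 347 MB, matching the dump. The per-batch byte check flushes after every 1024-row batch instead. A real implementation would also need to keep the per-batch size estimate cheap, since it runs far more often than the row-interval check.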
--
This message was sent by Atlassian Jira
(v8.20.10#820010)