Gang Wu created ORC-1264:
----------------------------
Summary: [C++] Add a writer option to align compression block with
row group boundary
Key: ORC-1264
URL: https://issues.apache.org/jira/browse/ORC-1264
Project: ORC
Issue Type: Improvement
Components: C++
Reporter: Gang Wu
Assignee: Gang Wu
When predicate pushdown (PPD) is in effect, we can reduce unnecessary I/O and
decompression by forcing each compression block to align with a row group
boundary. Without alignment, reading a surviving row group can require reading
and decompressing the filtered-out row groups that precede it in the same
compression block. This change does not break the format spec and should be
transparent to any downstream implementation. The caveat is that file size may
grow, depending on the data distribution and the compression algorithm used.
Therefore the option should be off unless the user chooses to enable it.
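
The trade-off described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions, not the ORC C++ writer or its API:
the names Block, cutBlocks and bytesToDecompress are hypothetical, sizes are
uncompressed byte counts, and a reader is assumed to decompress every block
that overlaps the row group it wants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model; none of these names come from the ORC library.
// A block covers the uncompressed byte range [begin, end) of one stream.
struct Block {
  uint64_t begin;
  uint64_t end;
};

// Cut a stream made of consecutive row groups into compression blocks.
// A block is closed when it reaches blockSize bytes and, if alignToRowGroup
// is set, also at the end of every row group.
std::vector<Block> cutBlocks(const std::vector<uint64_t>& rowGroupSizes,
                             uint64_t blockSize, bool alignToRowGroup) {
  assert(blockSize > 0 && "blockSize must be positive");
  std::vector<Block> blocks;
  uint64_t blockBegin = 0;
  uint64_t pos = 0;
  for (uint64_t size : rowGroupSizes) {
    const uint64_t rowGroupEnd = pos + size;
    while (pos < rowGroupEnd) {
      // Fill the current block up to its capacity or the row group end.
      const uint64_t room = blockSize - (pos - blockBegin);
      pos += std::min(room, rowGroupEnd - pos);
      if (pos - blockBegin == blockSize) {
        blocks.push_back({blockBegin, pos});
        blockBegin = pos;
      }
    }
    // Aligned mode: flush the partially filled block at the boundary.
    if (alignToRowGroup && pos > blockBegin) {
      blocks.push_back({blockBegin, pos});
      blockBegin = pos;
    }
  }
  if (pos > blockBegin) {
    blocks.push_back({blockBegin, pos});  // trailing partial block
  }
  return blocks;
}

// Total uncompressed bytes of all blocks overlapping [rgBegin, rgEnd),
// i.e. what must be decompressed to read that single row group.
uint64_t bytesToDecompress(const std::vector<Block>& blocks,
                           uint64_t rgBegin, uint64_t rgEnd) {
  uint64_t total = 0;
  for (const Block& b : blocks) {
    if (b.begin < rgEnd && b.end > rgBegin) {
      total += b.end - b.begin;
    }
  }
  return total;
}
```

In this model, with three 100-byte row groups and a 256-byte block size,
reading only the second row group touches 256 bytes without alignment and 100
bytes with it. The aligned layout uses three blocks instead of two, and those
smaller blocks are where the possible file-size cost comes from.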
--
This message was sent by Atlassian Jira
(v8.20.10#820010)