Gang Wu created ORC-1264:
----------------------------

             Summary: [C++] Add a writer option to align compression block with row group boundary
                 Key: ORC-1264
                 URL: https://issues.apache.org/jira/browse/ORC-1264
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Gang Wu
            Assignee: Gang Wu


When predicate pushdown (PPD) is in effect, a reader may still have to read and 
decompress row groups that were filtered out, because they precede a surviving 
row group within the same compression block. To avoid this unnecessary I/O and 
decompression, we can force compression blocks to align with row group 
boundaries. This does not break the format specification and should be 
transparent to any downstream implementation. The caveat is a potentially 
larger file size, depending on the data distribution and the compression 
algorithm applied. Therefore we should make it optional and let the user 
enable it.
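
To illustrate the trade-off, here is a minimal, self-contained sketch. It is 
not the ORC writer code and uses no ORC API: Range, buildBlocks and 
bytesToDecompress are hypothetical names, and the model only tracks 
uncompressed byte offsets of one stream, assuming a block is closed either 
when it reaches blockSize or, with alignment enabled, at every row group 
boundary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open byte range [begin, end) in the uncompressed stream.
struct Range {
  uint64_t begin;
  uint64_t end;
};

// Cut a stream into compression blocks of at most blockSize bytes.
// rowGroupBytes[i] is the number of uncompressed bytes row group i
// contributes to the stream. When alignToRowGroup is true, the current
// block is also closed at the end of every row group (assumed behavior).
std::vector<Range> buildBlocks(const std::vector<uint64_t>& rowGroupBytes,
                               uint64_t blockSize, bool alignToRowGroup) {
  std::vector<Range> blocks;
  uint64_t blockStart = 0;
  uint64_t pos = 0;
  for (uint64_t rg : rowGroupBytes) {
    const uint64_t rgEnd = pos + rg;
    while (pos < rgEnd) {
      const uint64_t room = blockSize - (pos - blockStart);
      pos += std::min(room, rgEnd - pos);
      if (pos - blockStart == blockSize) {  // block is full, close it
        blocks.push_back({blockStart, pos});
        blockStart = pos;
      }
    }
    if (alignToRowGroup && pos > blockStart) {  // close at the boundary
      blocks.push_back({blockStart, pos});
      blockStart = pos;
    }
  }
  if (pos > blockStart) {  // close the trailing partial block
    blocks.push_back({blockStart, pos});
  }
  return blocks;
}

// Uncompressed bytes that must be decompressed to read row group `index`:
// the total size of every block overlapping that row group's byte range.
uint64_t bytesToDecompress(const std::vector<uint64_t>& rowGroupBytes,
                           const std::vector<Range>& blocks,
                           std::size_t index) {
  uint64_t begin = 0;
  for (std::size_t i = 0; i < index; ++i) {
    begin += rowGroupBytes[i];
  }
  const uint64_t end = begin + rowGroupBytes[index];
  uint64_t total = 0;
  for (const Range& b : blocks) {
    if (b.begin < end && b.end > begin) {
      total += b.end - b.begin;
    }
  }
  return total;
}
```

With four row groups of 300 bytes each and a 1000-byte block, reading only the 
third row group touches 1000 bytes in the unaligned layout and 300 bytes in 
the aligned one, while the number of blocks grows from 2 to 4. More and 
smaller blocks are where the file size caveat comes from.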



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
