[ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948064#comment-17948064 ]
hezuojiao commented on ORC-1264:
--------------------------------
[https://orc.apache.org/specification/ORCv2/|https://orc.apache.org/specification/ORCv2/]
The Row Group Index section of the specification says:
{code:java}
Note that for columns with multiple streams, the order of stream positions in
the RowIndex is fixed, which may be different to the actual data stream
placement, and it is the same as Column Encodings section we described
above.
{code}
> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
> Key: ORC-1264
> URL: https://issues.apache.org/jira/browse/ORC-1264
> Project: ORC
> Issue Type: Improvement
> Components: C++
> Reporter: Gang Wu
> Assignee: Hao Zou
> Priority: Major
> Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is
> in effect, we can force compression blocks to align with row group
> boundaries. Without alignment, a reader must fetch and decompress the data of
> filtered-out row groups that precede a surviving row group in the same
> compression block. This change does not break the format specification and
> should be transparent to any downstream implementation. The caveat is a
> possibly larger file size, depending on the data distribution and the
> compression algorithm used. Therefore we should make it optional and let the
> user enable it.
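
The cost described in the issue can be illustrated with a toy model. This is only a sketch and not ORC code: the helper names rowGroupStart and discardedBytes are hypothetical, and it assumes a single stream that is cut into compression blocks of a fixed uncompressed size unless the writer forces a block boundary at every row group start.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: uncompressed offset at which row group `index` starts,
// given the uncompressed size of each row group in one stream.
uint64_t rowGroupStart(const std::vector<uint64_t>& rowGroupBytes,
                       std::size_t index) {
  uint64_t offset = 0;
  for (std::size_t i = 0; i < index && i < rowGroupBytes.size(); ++i) {
    offset += rowGroupBytes[i];
  }
  return offset;
}

// Hypothetical helper: bytes a reader must decompress and throw away before
// it reaches the first byte of a row group that starts at `start`.
// Assumption: without alignment, blocks begin at multiples of `blockSize`
// in the uncompressed stream; with alignment, every row group begins a block.
uint64_t discardedBytes(uint64_t start, uint64_t blockSize, bool aligned) {
  if (aligned || blockSize == 0) {
    return 0;
  }
  return start % blockSize;
}
```

With assumed row group sizes of {40000, 50000, 30000, 45000} bytes and a 64 KiB block, row group 2 starts at offset 90000. An unaligned reader would discard 90000 mod 65536 = 24464 bytes that belong to filtered row groups, while an aligned reader would discard none. The model ignores the size penalty of the extra, smaller blocks, which is the trade-off the issue mentions.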
--
This message was sent by Atlassian Jira
(v8.20.10#820010)