[ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948028#comment-17948028 ]
hezuojiao commented on ORC-1264:
--------------------------------
[~wgtmac] Hi, I noticed that the MR added an interface,
`Reader::getRowGroupIndex`, to get position information for row groups. How can
I use this interface to obtain the specific position of each data stream? I want
to use it for row-group-level `prebuffer` in an execution engine,
but I found that it must be used in conjunction with encoded streams.
Thanks.
> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
> Key: ORC-1264
> URL: https://issues.apache.org/jira/browse/ORC-1264
> Project: ORC
> Issue Type: Improvement
> Components: C++
> Reporter: Gang Wu
> Assignee: Hao Zou
> Priority: Major
> Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is
> in effect, we can force compression blocks to align with row group
> boundaries. This avoids reading and decompressing filtered-out row groups
> that precede a surviving row group within the same compression block. This
> implementation does not break the format spec and should be transparent to
> any downstream implementation. The caveat is a potentially larger file size,
> which depends on the data distribution and the compression algorithm applied.
> Therefore we should make it optional and let users enable it as they choose.
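
The effect described above can be illustrated with a toy model. This is only a sketch and not ORC code: `SeekCost`, `seekUnaligned`, and `seekAligned` are hypothetical names, and the model assumes blocks are cut at a fixed uncompressed size and that a reader must decompress a block from its start to reach a row group inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (not the ORC implementation): a stream is a sequence of row
// groups, each contributing some number of uncompressed bytes.
struct SeekCost {
  uint64_t blockStart;    // uncompressed offset where the containing block begins
  uint64_t skippedBytes;  // bytes decompressed and discarded before the row group
};

// Uncompressed offset at which row group `target` starts.
static uint64_t rowGroupStart(const std::vector<uint64_t>& rowGroupBytes,
                              std::size_t target) {
  assert(target < rowGroupBytes.size());
  uint64_t start = 0;
  for (std::size_t i = 0; i < target; ++i) {
    start += rowGroupBytes[i];
  }
  return start;
}

// Without alignment: blocks are cut every `blockSize` uncompressed bytes, so
// a row group may begin in the middle of a block that also holds the tail of
// earlier (possibly filtered-out) row groups.
SeekCost seekUnaligned(const std::vector<uint64_t>& rowGroupBytes,
                       std::size_t target, uint64_t blockSize) {
  assert(blockSize > 0);
  const uint64_t start = rowGroupStart(rowGroupBytes, target);
  const uint64_t blockStart = (start / blockSize) * blockSize;
  return {blockStart, start - blockStart};
}

// With alignment: a new block is started at every row group boundary, so the
// row group always begins at offset 0 of its block and nothing is discarded.
SeekCost seekAligned(const std::vector<uint64_t>& rowGroupBytes,
                     std::size_t target) {
  const uint64_t start = rowGroupStart(rowGroupBytes, target);
  return {start, 0};
}
```

For example, with three row groups of 1000 bytes each and a 1536-byte block size, seeking to the third row group in the unaligned model lands in the block starting at offset 1536 and discards 464 bytes, while the aligned model discards none. The cost side of the trade-off (more, smaller blocks and therefore a potentially larger file) is not modeled here.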
--
This message was sent by Atlassian Jira
(v8.20.10#820010)