[
https://issues.apache.org/jira/browse/PARQUET-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855019#comment-16855019
]
Wes McKinney commented on PARQUET-458:
--------------------------------------
There are multiple issues here that currently prevent the library from reading
these pages.
In DataPageV2:
* the encoded rep/def levels are not prefixed with their length in the page
data; the lengths are part of the page header instead, so this logic is incorrect:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L53
* the compressed data size in the page header refers only to the portion of the
page after the level bytes, whose lengths are given by
definition_levels_byte_length and repetition_levels_byte_length in the page
header
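
To make the two points above concrete, here is a rough sketch of how a reader
might locate the sections of a page buffer. It is not the code in
column_reader.cc: the names V2LevelLengths, Section, V2Layout, SplitDataPageV2
and ReadV1LevelPrefix are made up for illustration, and only the two
*_byte_length field names come from the DataPageHeaderV2 Thrift struct. It
assumes the buffer starts with the repetition levels, followed by the
definition levels, followed by the values. The total size is taken from the
buffer itself, so the sketch does not depend on how the header's size field is
interpreted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the two DataPageHeaderV2 fields used below.
struct V2LevelLengths {
  int32_t repetition_levels_byte_length;
  int32_t definition_levels_byte_length;
};

// A byte range inside a page buffer.
struct Section {
  std::size_t offset;
  std::size_t length;
};

struct V2Layout {
  Section rep_levels;  // never compressed
  Section def_levels;  // never compressed
  Section values;      // the only part handed to the decompressor
};

// DataPageV2: the level lengths come from the page header, not from the data.
V2Layout SplitDataPageV2(std::size_t page_size, const V2LevelLengths& header) {
  if (header.repetition_levels_byte_length < 0 ||
      header.definition_levels_byte_length < 0) {
    throw std::invalid_argument("negative level length in page header");
  }
  const std::size_t rep =
      static_cast<std::size_t>(header.repetition_levels_byte_length);
  const std::size_t def =
      static_cast<std::size_t>(header.definition_levels_byte_length);
  if (rep > page_size || def > page_size - rep) {
    throw std::invalid_argument("level sections exceed the page buffer");
  }
  return V2Layout{{0, rep}, {rep, def}, {rep + def, page_size - rep - def}};
}

// DataPage (V1), for contrast: each RLE-encoded level block is preceded by a
// 4-byte little-endian length stored in the page data itself.
uint32_t ReadV1LevelPrefix(const std::vector<uint8_t>& data,
                           std::size_t offset) {
  if (data.size() < 4 || offset > data.size() - 4) {
    throw std::out_of_range("not enough bytes for a V1 length prefix");
  }
  return static_cast<uint32_t>(data[offset]) |
         (static_cast<uint32_t>(data[offset + 1]) << 8) |
         (static_cast<uint32_t>(data[offset + 2]) << 16) |
         (static_cast<uint32_t>(data[offset + 3]) << 24);
}
```

With a layout like this, a V2 reader would decode the levels straight from the
first two sections and decompress only the values section, instead of looking
for a V1-style length prefix in the data.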
I have started working on a patch and will see if I can get something up in the
next week or so.
> [C++] Implement support for DataPageV2
> --------------------------------------
>
> Key: PARQUET-458
> URL: https://issues.apache.org/jira/browse/PARQUET-458
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Minor
>