[ 
https://issues.apache.org/jira/browse/PARQUET-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855019#comment-16855019
 ] 

Wes McKinney commented on PARQUET-458:
--------------------------------------

There are multiple issues here that currently prevent the library from reading 
these pages:

In DataPageV2:

* the encoded rep/def levels are not prefixed with their length in the page 
data; the lengths are stored in the page header, so this logic is incorrect: 
https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L53
* the compressed data size in the page header refers only to the portion of the 
page that follows the level bytes, whose sizes are given by 
definition_levels_byte_length and repetition_levels_byte_length in the page 
header
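
As a rough illustration of the layout described above, here is a minimal C++ 
sketch of splitting a DataPageV2 buffer using the two level lengths from the 
page header. It is not the parquet-cpp implementation: LevelLengths, ByteSpan, 
PageSections and SplitDataPageV2 are hypothetical names invented for this 
example, and it assumes the repetition levels come first, then the definition 
levels, then the values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the two level byte lengths carried in the
// DataPageV2 page header (not the real Thrift-generated struct).
struct LevelLengths {
  int32_t rep_levels_len;  // size in bytes of the encoded repetition levels
  int32_t def_levels_len;  // size in bytes of the encoded definition levels
};

// Non-owning view over a contiguous run of bytes.
struct ByteSpan {
  const uint8_t* data;
  size_t size;
};

// The three sections of a DataPageV2 body, in assumed on-disk order.
struct PageSections {
  ByteSpan rep_levels;
  ByteSpan def_levels;
  ByteSpan values;  // the only part that may be compressed
};

// Split a page body using lengths from the header. No length prefix is
// read from the data itself, which is the point of the first bullet above.
PageSections SplitDataPageV2(const uint8_t* page, size_t page_size,
                             const LevelLengths& lengths) {
  if (lengths.rep_levels_len < 0 || lengths.def_levels_len < 0) {
    throw std::invalid_argument("negative level length in page header");
  }
  const size_t rep_len = static_cast<size_t>(lengths.rep_levels_len);
  const size_t def_len = static_cast<size_t>(lengths.def_levels_len);
  if (rep_len > page_size || def_len > page_size - rep_len) {
    throw std::out_of_range("level lengths exceed page size");
  }
  const size_t levels_len = rep_len + def_len;
  assert(levels_len <= page_size);  // guaranteed by the checks above
  return PageSections{{page, rep_len},
                      {page + rep_len, def_len},
                      {page + levels_len, page_size - levels_len}};
}
```

With this split, a reader would pass only the values span to the decompressor 
and decode the two level spans directly.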

I have started working on a patch. I'll see if I can get something up in the 
next week or so.

> [C++] Implement support for DataPageV2
> --------------------------------------
>
>                 Key: PARQUET-458
>                 URL: https://issues.apache.org/jira/browse/PARQUET-458
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Minor
>



