[ https://issues.apache.org/jira/browse/ARROW-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644350#comment-17644350 ]
Antoine Pitrou commented on ARROW-13240:
----------------------------------------

[~jorgecarleitao] Could you try to check if that still happens with the latest PyArrow?

> [C++][Parquet] Page statistics not written in v2?
> -------------------------------------------------
>
>                 Key: ARROW-13240
>                 URL: https://issues.apache.org/jira/browse/ARROW-13240
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Jorge Leitão
>            Priority: Major
>
> While working in integration tests of parquet2 against pyarrow, I noticed
> that page statistics are only written by pyarrow when using version 1.
> I do not have an easy way to reproduce this within pyarrow as I am not sure
> how to access individual pages from a column chunk, but it is something that
> I observe when trying to integrate.
> The row group stats are still written, this only affects page statistics.
> pyarrow call:
> ```
> pa.parquet.write_table(
>     t,
>     path,
>     version="2.0",
>     data_page_version="2.0",
>     write_statistics=True,
> )
> ```
> changing version to "1.0" does not impact this behavior, suggesting that the
> specific option causing this behavior is the data_page_version="2.0".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)