alamb opened a new pull request, #6257: URL: https://github.com/apache/arrow-rs/pull/6257
# Which issue does this PR close?

Closes https://github.com/apache/arrow-rs/issues/6256

# Rationale for this change

See https://github.com/apache/arrow-rs/issues/6256.

Current behavior:
* The parquet-rs writer always has the null count when writing statistics, but writes `None` to thrift when the null count is zero
* The parquet-rs reader treats a missing null count (`None`) as `Some(0)` (that is, as if it were known that there are no nulls)

This is inconsistent with the parquet spec as well as with what parquet-java and parquet-cpp do.

# What changes are included in this PR?

* Update the parquet reader/writer to follow the spec
* Add error checking for values that are too large to fit into `i64`
* Document that older versions of parquet-rs wrote `None` for a null count of zero
* Add tests

# Are there any user-facing changes?

Yes

Changes
* The parquet-rs writer always writes `Some(..)` to thrift
* The parquet-rs reader correctly returns `None` when the null count is missing (that is, it is unknown whether there are nulls)
* Documented that older versions of parquet-rs wrote `None` for a null count of zero

This change means the generated parquet files are slightly larger (as they now encode `Some(0)` for null counts), but the behavior is more correct and consistent.
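
To make the behavior change concrete, here is a minimal sketch of the old and new null-count handling. It is not the parquet-rs code: the type alias `ThriftNullCount` and the functions `old_write`, `old_read`, `new_write`, and `new_read` are hypothetical names invented for illustration, and rejecting negative values on read is an assumption rather than something this PR states.

```rust
// Hypothetical stand-in for the optional `null_count` field in the thrift
// statistics struct. This models the semantics only; it is not the parquet-rs API.
type ThriftNullCount = Option<i64>;

/// Old writer behavior: a null count of zero was written as `None`.
fn old_write(null_count: u64) -> ThriftNullCount {
    if null_count == 0 {
        None
    } else {
        Some(null_count as i64)
    }
}

/// Old reader behavior: a missing null count was reported as `Some(0)`,
/// i.e. "known to contain no nulls".
fn old_read(field: ThriftNullCount) -> Option<u64> {
    Some(field.unwrap_or(0) as u64)
}

/// New writer behavior: always write `Some(..)`, and return an error if the
/// count does not fit into `i64`.
fn new_write(null_count: u64) -> Result<ThriftNullCount, String> {
    i64::try_from(null_count)
        .map(Some)
        .map_err(|_| format!("null count {null_count} does not fit into i64"))
}

/// New reader behavior: a missing null count stays `None` (unknown).
/// Assumption: negative values are treated as an error in this sketch.
fn new_read(field: ThriftNullCount) -> Result<Option<u64>, String> {
    field
        .map(|v| u64::try_from(v).map_err(|_| format!("invalid null count {v}")))
        .transpose()
}
```

Under this model, a file written by the old writer with zero nulls round-trips through `new_read` as `Ok(None)` rather than `Ok(Some(0))`, which is why documenting the older writer behavior matters for consumers of existing files.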