alamb opened a new pull request, #6257:
URL: https://github.com/apache/arrow-rs/pull/6257

   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-rs/issues/6256
   
   
   # Rationale for this change
    
   
   See https://github.com/apache/arrow-rs/issues/6256.
   
   Current behavior:
   * parquet-rs writer always has the null count when writing statistics, but 
writes `None` to thrift when the null count is zero
   * parquet-rs reader treats a missing null count (`None`) as `Some(0)` (i.e. 
it reports that there are definitely no nulls, even when that is unknown)
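
To make the information loss concrete, here is a minimal sketch of the round trip described above. The function names are hypothetical and are not the parquet-rs API. The thrift field is modeled as `Option<i64>` and the in-memory count as `u64`, which are assumptions made for illustration.

```rust
/// Hypothetical model of the old writer: the null count is always known
/// in memory, but a count of zero is dropped when encoding to thrift.
fn old_write_null_count(null_count: u64) -> Option<i64> {
    if null_count == 0 {
        None
    } else {
        // Simplified: the sketch ignores counts that overflow i64.
        Some(null_count as i64)
    }
}

/// Hypothetical model of the old reader: a missing thrift field is
/// reported as a known count of zero, so "unknown" cannot be expressed.
fn old_read_null_count(thrift: Option<i64>) -> Option<u64> {
    // Simplified: the sketch ignores validation of negative values.
    Some(thrift.unwrap_or(0) as u64)
}
```

Under this model, a file from another writer that omits the null count because it is unknown reads back as `Some(0)`, which is indistinguishable from a file that truly contains no nulls.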
   
   This is inconsistent with the parquet spec, as well as with what 
parquet-java and parquet-cpp do.
   
   
   
   # What changes are included in this PR?
   
   * Update the parquet reader/writer to follow the spec
   * Add error checking for values that are too large to fit into `i64`
   * Document that older versions of parquet-rs wrote `None`
   * Add tests
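
The spec-following behavior and the new range check could be sketched as follows. This is an illustrative model under stated assumptions, not the code in this PR: the function names, the `String` error type, and the rejection of negative values on read are all hypothetical.

```rust
/// Hypothetical writer: always encodes the known null count, including
/// zero, and fails if the count does not fit into thrift's i64.
fn write_null_count(null_count: u64) -> Result<Option<i64>, String> {
    i64::try_from(null_count)
        .map(Some)
        .map_err(|_| format!("null count {} does not fit into i64", null_count))
}

/// Hypothetical reader: a missing field stays `None` (unknown) instead
/// of being turned into `Some(0)`.
fn read_null_count(thrift: Option<i64>) -> Result<Option<u64>, String> {
    thrift
        .map(|n| {
            // Assumption: a negative count is treated as invalid.
            u64::try_from(n).map_err(|_| format!("invalid null count {}", n))
        })
        .transpose()
}
```

With this shape, `Some(0)` and `None` remain distinct across a write and read, at the cost of one extra encoded field when the count is zero.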
   
   
   # Are there any user-facing changes?
   
   Yes
   
   Changes
   * parquet-rs writer always writes `Some(..)` to thrift
   * parquet-rs reader now returns `None` when the null count is missing 
(i.e. it is unknown whether there are nulls)
   * Documentation now notes that older versions of parquet-rs wrote `None` 
when the null count was zero.
   
   This change means the generated parquet files are slightly larger (because 
they now encode `Some(0)` when the null count is zero), but the behavior is 
more correct and consistent with other implementations.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.