alamb commented on code in PR #6216:
URL: https://github.com/apache/arrow-rs/pull/6216#discussion_r1718593954
##########
parquet/src/file/statistics.rs:
##########
@@ -246,11 +245,7 @@ pub fn to_thrift(stats: Option<&Statistics>) ->
Option<TStatistics> {
let mut thrift_stats = TStatistics {
max: None,
min: None,
- null_count: if stats.has_nulls() {
- Some(stats.null_count() as i64)
- } else {
- None
- },
+ null_count: stats.null_count_opt().map(|value| value as i64),
Review Comment:
I agree the new behavior is desired, but I think it changes what values are
written to parquet files (specifically, the parquet metadata will now have the
thrift equivalent of `Some(0)` rather than the equivalent of `None`). I filed
https://github.com/apache/arrow-rs/issues/6256 to track this.
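To make the difference concrete, here is a minimal sketch of the two expressions from the diff above. The `Stats` struct is a hypothetical stand-in rather than the real `Statistics` type, and it assumes `has_nulls()` returns true only when the recorded null count is greater than zero:

```rust
// Hypothetical stand-in for the statistics type; not the parquet crate's API.
struct Stats {
    null_count: Option<u64>,
}

impl Stats {
    // Assumption: true only when a non-zero null count is recorded.
    fn has_nulls(&self) -> bool {
        self.null_count.map_or(false, |n| n > 0)
    }

    fn null_count_opt(&self) -> Option<u64> {
        self.null_count
    }
}

// Mirrors the removed lines: a null count of zero is not written at all.
fn old_null_count(stats: &Stats) -> Option<i64> {
    if stats.has_nulls() {
        stats.null_count_opt().map(|n| n as i64)
    } else {
        None
    }
}

// Mirrors the added line: whatever count is known is written, including zero.
fn new_null_count(stats: &Stats) -> Option<i64> {
    stats.null_count_opt().map(|value| value as i64)
}

fn main() {
    let zero = Stats { null_count: Some(0) };
    println!("old: {:?}", old_null_count(&zero));
    println!("new: {:?}", new_null_count(&zero));
}
```

Under these assumptions the two versions agree for a non-zero count and for an unknown count, and differ only when the count is known to be zero.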
As this PR is already quite large, I think we should split it into two parts:
1. The API changes
2. The change for writing the metadata
I plan to update this PR to revert the changes to the metadata writing, and
will then make a follow-on PR to discuss / propose changing the statistics that
are written to the file.