alamb commented on code in PR #6216: URL: https://github.com/apache/arrow-rs/pull/6216#discussion_r1715903094
########## parquet/tests/arrow_writer_layout.rs: ########## @@ -189,7 +189,7 @@ fn test_primitive() { pages: (0..8) .map(|_| Page { rows: 250, - page_header_size: 36, + page_header_size: 38, Review Comment: why are these changes needed? ########## parquet/src/file/statistics.rs: ########## @@ -385,18 +380,13 @@ impl Statistics { /// Returns number of null values for the column. /// Note that this includes all nulls when column is part of the complex type. Review Comment: ```suggestion /// Returns number of null values for the column, if known. /// Note that this includes all nulls when column is part of the complex type. ``` ########## parquet/src/file/statistics.rs: ########## @@ -538,37 +528,47 @@ impl<T: ParquetValueType> ValueStatistics<T> { /// /// Panics if min value is not set, e.g. all values are `null`. /// Use `has_min_max_set` method to check that. - pub fn min(&self) -> &T { + pub(crate) fn min_unchecked(&self) -> &T { self.min.as_ref().unwrap() } + /// Returns min value of the statistics. Review Comment: ```suggestion /// Returns min value of the statistics, if known. ``` ########## parquet/src/column/writer/mod.rs: ########## @@ -769,8 +769,8 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> { } Some(stat) => { // Check if min/max are still ascending/descending across pages - let new_min = stat.min(); - let new_max = stat.max(); + let new_min = stat.min_unchecked(); Review Comment: I think it is guaranteed to be set at this point (this is the code that is responsible for writing the statistics) ########## parquet/src/file/statistics.rs: ########## @@ -601,6 +601,7 @@ impl<T: ParquetValueType> ValueStatistics<T> { /// using signed comparison. This resulted in an undefined ordering for unsigned /// quantities, such as booleans and unsigned integers. /// + // TODO: I don't see `min_value` or `max_value` in this struct. Should these be something else? Review Comment: I don't think we should worry about making a nicer API for the older versions of parquet. We can do so if/when someone has such a file and a real usecase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org