paleolimbot commented on PR #46649: URL: https://github.com/apache/arrow/pull/46649#issuecomment-2931029109
> Why was it done that way, if emptiness is a useful information to have?

The PR where we discussed this is https://github.com/apache/parquet-format/pull/494. The consensus was that checking the `null_count` for a column chunk against the number of rows in the row group would catch the most common case (the row group is all null). We then discovered that we don't currently write null counts for unsorted logical types, but hopefully we can fix that (https://github.com/apache/arrow/pull/46275).

> And is there a point in exposing emptiness in our geostats APIs?

We use the same API for producing and consuming GeoStatistics (this was modelled after the regular Statistics). We could make the write path use only internals, although I am not sure this would be less confusing.
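
For illustration, the `null_count` check described above could be sketched roughly as follows. This is a minimal sketch, not the Arrow or Parquet API: the function name and its parameters are hypothetical, and it assumes a flat (non-repeated) column so that the value count equals the row count.

```python
from typing import Optional


def column_chunk_is_all_null(null_count: Optional[int], num_rows: int) -> Optional[bool]:
    """Hypothetical helper: decide whether a column chunk holds only nulls.

    null_count: the null count from the chunk's statistics, or None if the
                writer did not record one (as is currently the case for
                unsorted logical types).
    num_rows:   the number of rows in the enclosing row group.

    Returns True/False when the statistics allow a decision, or None when
    the null count is missing and emptiness cannot be inferred.
    """
    if null_count is None:
        # No null count was written, so nothing can be concluded.
        return None
    # Assumes a flat column: one value slot per row.
    return null_count == num_rows
```

A reader would treat a `None` result as "unknown" and fall back to reading the data, which is why writing null counts for unsorted logical types matters for this check.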
