alamb commented on PR #8225: URL: https://github.com/apache/arrow-rs/pull/8225#issuecomment-3304190280
> Thank you all for taking a look!
>
> > I think it would be really nice if users who didn't need to read geometry types could avoid paying the cost of that support (e.g. keep their binary and code size down).
>
> I think that this is a great idea for the Arrow reader and writer (e.g., conversion to/from GeoArrow and writing statistics); however, can we at least provide access to the type annotation and statistics here? It seemed like that's where this PR was headed and I don't think the overhead of that is particularly onerous (I know I'm new here though!).
>
> As a concrete target, I want to use this PR to prune row groups here:
>
> https://github.com/apache/sedona-db/blob/653ab44bdd2923b5c395828f93de7fc3085ff6c2/rust/sedona-geoparquet/src/file_opener.rs#L186-L195
>
> This is a place where I already have access to all the things I need (e.g., ParquetMetadata, file key/value metadata) and I don't really want or need that to be done for me. All I need is for `row_group_metadata.column(j).statistics()` to let me look at GeoStatistics.

For sure. If there are types that make sense to add to the main parquet crate (and always compile), that sounds good to me. Reasonable Rust structures for bounding box statistics sound like they could fit this model.

What I am trying to avoid is adding a lot more code and binary size for users of the parquet crate who aren't going to use geometry types, unless they opt in.
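To make the "small, always-compiled structures" idea concrete, here is a minimal sketch of what bounding-box statistics plus row-group pruning could look like in plain Rust. `BoundingBox`, `intersects`, and `prune_row_groups` are hypothetical names for illustration only, not the parquet crate's API. The sketch assumes one planar box per column chunk, treats a missing statistic as "must read", and ignores the antimeridian-wraparound case (where `xmin > xmax`). Heavier machinery such as GeoArrow conversion would stay behind an opt-in feature, as discussed above.

```rust
/// Hypothetical bounding-box statistics for one column chunk.
/// Illustration only: NOT the parquet crate's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundingBox {
    pub xmin: f64,
    pub ymin: f64,
    pub xmax: f64,
    pub ymax: f64,
}

impl BoundingBox {
    pub fn new(xmin: f64, ymin: f64, xmax: f64, ymax: f64) -> Self {
        Self { xmin, ymin, xmax, ymax }
    }

    /// True if two axis-aligned boxes overlap (touching edges count).
    /// Assumes xmin <= xmax, i.e. no antimeridian wraparound.
    pub fn intersects(&self, other: &BoundingBox) -> bool {
        self.xmin <= other.xmax
            && other.xmin <= self.xmax
            && self.ymin <= other.ymax
            && other.ymin <= self.ymax
    }
}

/// Hypothetical helper: returns the indices of row groups that may contain
/// geometries overlapping `query`. A row group with no statistics (`None`)
/// is kept, because nothing can be proven about its contents.
pub fn prune_row_groups(stats: &[Option<BoundingBox>], query: &BoundingBox) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, bbox)| match bbox {
            Some(b) => b.intersects(query),
            None => true,
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // One (optional) bounding box per row group for the geometry column.
    let stats = vec![
        Some(BoundingBox::new(0.0, 0.0, 10.0, 10.0)),
        Some(BoundingBox::new(20.0, 20.0, 30.0, 30.0)),
        None,
    ];
    let query = BoundingBox::new(5.0, 5.0, 15.0, 15.0);
    let keep = prune_row_groups(&stats, &query);
    println!("{:?}", keep); // prints "[0, 2]": row group 1 is pruned
}
```

Keeping only a plain struct and a comparison in the default build is cheap in code size, which is the trade-off the comment is after.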
