paleolimbot commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1772555343
##########
src/main/thrift/parquet.thrift:
##########
@@ -1084,6 +1290,9 @@ struct ColumnIndex {
* Same as repetition_level_histograms except for definitions levels.
**/
7: optional list<i64> definition_level_histograms;
+
+ /** A list containing statistics of GEOMETRY logical type for each page */
+ 8: optional list<GeometryStatistics> geometry_stats;
Review Comment:
I can't speak to data pages, since I am not familiar with that level of the
specification; however, these statistics are absolutely essential at the
column chunk level. Even for very small objects, knowing the bounding box is
typically worth it (e.g., nearly all spatial formats cache this information for
*every single geometry object*). This is because many geometry operations,
particularly on polygons, are very expensive and can often be skipped
entirely for features whose bounding boxes don't intersect the region of
interest.
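
To make the bounding-box argument concrete, here is a minimal sketch in plain Python. It is not the Parquet API or any spatial library: `bounding_box`, `boxes_intersect`, `filter_candidates`, and the `exact_test` callback are hypothetical names chosen for illustration, and geometries are assumed to be simple lists of (x, y) points. The cheap box comparison runs first, and the expensive exact predicate is only called for features whose cached box overlaps the query box.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(points: Sequence[Point]) -> BBox:
    """Compute the axis-aligned bounding box of a non-empty point sequence."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True if two boxes overlap or touch (four comparisons)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_candidates(
    features: Iterable[Tuple[Sequence[Point], BBox]],
    query_box: BBox,
    exact_test: Callable[[Sequence[Point]], bool],
) -> List[Sequence[Point]]:
    """Run the (assumed expensive) exact_test only when the cached box overlaps."""
    hits = []
    for geometry, cached_box in features:
        if not boxes_intersect(cached_box, query_box):
            continue  # skipped without touching the geometry itself
        if exact_test(geometry):
            hits.append(geometry)
    return hits
```

The same comparison works one level up: if a column chunk carried a bounding box covering all of its geometries, a reader could apply `boxes_intersect` to that box and skip the whole chunk before decoding any values.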
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]