wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1773566114
##########
src/main/thrift/parquet.thrift:
##########
@@ -1084,6 +1290,9 @@ struct ColumnIndex {
* Same as repetition_level_histograms except for definitions levels.
**/
7: optional list<i64> definition_level_histograms;
+
+ /** A list containing statistics of GEOMETRY logical type for each page */
+ 8: optional list<GeometryStatistics> geometry_stats;
Review Comment:
Page-level stats are very useful for needle-in-a-haystack searches.
Computation on geometry types can be very slow because of their mathematical
complexity. Page-level stats such as bounding boxes can help filter out
unnecessary pages, because computation on a bounding box is orders of
magnitude faster than on complex polygons.
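
To illustrate the pruning idea, here is a minimal sketch of page skipping with per-page bounding boxes. The layout of `GeometryStatistics` is not shown in this hunk, so the `BoundingBox` class, its field names, and the `pages_to_read` helper are hypothetical and only stand in for whatever the struct ends up carrying.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical 2D bounding box; not the actual GeometryStatistics layout."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two axis-aligned boxes overlap only if they overlap on both axes.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def pages_to_read(page_boxes: List[BoundingBox], query: BoundingBox) -> List[int]:
    """Return indices of pages whose bounding box overlaps the query box.

    Pages that do not overlap cannot contain a matching geometry, so they
    can be skipped without decoding or running exact geometry predicates.
    """
    return [i for i, box in enumerate(page_boxes) if box.intersects(query)]


# One assumed bounding box per page, as a column index would provide.
page_boxes = [
    BoundingBox(0, 0, 10, 10),
    BoundingBox(20, 20, 30, 30),
    BoundingBox(5, 5, 25, 25),
]
query = BoundingBox(21, 21, 22, 22)
print(pages_to_read(page_boxes, query))  # [1, 2]
```

Each check is four comparisons per page, whereas an exact intersection test against a complex polygon depends on its vertex count. Pages that survive the filter would still need the exact predicate applied to their values.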
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]