raunaqmorarka commented on code in PR #216:
URL: https://github.com/apache/parquet-format/pull/216#discussion_r1343751395
##########
src/main/thrift/parquet.thrift:
##########
@@ -216,13 +216,22 @@ struct Statistics {
/** count of distinct values occurring */
4: optional i64 distinct_count;
/**
- * Min and max values for the column, determined by its ColumnOrder.
+ * lower and upper bound values for the column, determined by its ColumnOrder.
+ * These may be the actual minimum and maximum values found on a column chunk,
+ * but can also be (more compact) values that do not exist on a column chunk.
+ * For example, instead of storing "Blart Versenwald III", a writer may set
+ * min_value="B", max_value="C". Such more compact values must still be valid
+ * values within the column's logical type.
*
* Values are encoded using PLAIN encoding, except that variable-length byte
* arrays do not include a length prefix.
*/
5: optional binary max_value;
6: optional binary min_value;
+ /** If true, max_value is the actual maximum value found on a column chunk **/
+ 7: optional bool is_max_value_exact;
+ /** If true, min_value is the actual minimum value found on a column chunk **/
+ 8: optional bool is_min_value_exact;
Review Comment:
I think these fields should be left unset whenever max_value/min_value are
themselves unset. Some writer implementations may also choose to leave them
unset even after populating min/max; in that case, readers should assume that
the value is not exact, to be safe.
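
A rough sketch of the reader-side rule I have in mind, in Python. The
`Statistics` dataclass is only a hypothetical stand-in for the Thrift-generated
struct, and `is_max_exact` / `is_min_exact` are made-up helper names, not part
of any existing reader:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statistics:
    # Hypothetical stand-in for the Thrift-generated struct; every field is
    # optional, so None models "not set by the writer".
    max_value: Optional[bytes] = None
    min_value: Optional[bytes] = None
    is_max_value_exact: Optional[bool] = None
    is_min_value_exact: Optional[bool] = None


def is_max_exact(stats: Statistics) -> bool:
    # Exact only if the bound is present AND the writer explicitly said so.
    # An unset flag is treated the same as False.
    return stats.max_value is not None and stats.is_max_value_exact is True


def is_min_exact(stats: Statistics) -> bool:
    # Same conservative rule for the lower bound.
    return stats.min_value is not None and stats.is_min_value_exact is True
```

With this rule, a file from a writer that populates min/max but never sets the
new flags is still read safely: the bounds can be used for filtering, but not
as the actual min/max values.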