danielcweeks commented on code in PR #14234:
URL: https://github.com/apache/iceberg/pull/14234#discussion_r2764772400
##########
format/spec.md:
##########
@@ -707,6 +707,91 @@ For `geography` only, xmin (X value of `lower_bounds`) may be greater than xmax
When calculating upper and lower bounds for `geometry` and `geography`, null or NaN values in a coordinate dimension are skipped; for example, POINT (1 NaN) contributes a value to X but no values to Y, Z, or M dimension bounds. If a dimension has only null or NaN values, that dimension is omitted from the bounding box. If either the X or Y dimension is missing then the bounding box itself is not produced.
+#### Content Stats
+
+Content stats have been introduced with v4 and hold stats in a `struct<struct<...>>` where each nested struct holds the stats for an individual field of a table. The different field stats types are defined in the next section.
+
+##### Field Stats Types
+
+The struct that holds individual stats for a particular field of a table consists of the following fields:
+
+| Name             | Type   | Offset from field ID of base struct | required | Description                                                      |
+|------------------|--------|-------------------------------------|----------|------------------------------------------------------------------|
+| value_count      | `long` | 1                                   | false    | Number of values in the column (including null and NaN values)   |
+| null_value_count | `long` | 2                                   | false    | Number of null values in the column                              |
+| nan_value_count  | `long` | 3                                   | false    | Number of NaN values in the column                               |
+| avg_value_count  | `int`  | 4                                   | false    | The avg value count for variable-length types (string/binary)    |
+| max_value_count  | `long` | 5                                   | false    | The max value count for variable-length types (string/binary)    |
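A small sketch of how the "Offset from field ID of base struct" column could be read: each stats field's ID is the ID of the enclosing per-field stats struct plus the listed offset. The helper name `stats_field_ids` and the base ID of 10000 are hypothetical, and the excerpt does not say how the base ID itself is assigned; the field names are kept exactly as proposed in the diff.

```python
# Offsets copied from the proposed table (names as written in the diff).
STATS_FIELD_OFFSETS = {
    "value_count": 1,
    "null_value_count": 2,
    "nan_value_count": 3,
    "avg_value_count": 4,
    "max_value_count": 5,
}


def stats_field_ids(base_field_id: int) -> dict[str, int]:
    """Hypothetical helper: map each stats field name to base_field_id + offset."""
    return {name: base_field_id + offset for name, offset in STATS_FIELD_OFFSETS.items()}


if __name__ == "__main__":
    # Assumed base ID for one field's stats struct (illustrative only).
    for name, field_id in stats_field_ids(10_000).items():
        print(f"{name}: {field_id}")
```

Because every offset is distinct and positive, the child IDs never collide with each other or with the base struct's own ID.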
Review Comment:
Should these be `avg_value_length` and `max_value_length`? I don't think they represent a count.
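To make the distinction in the comment concrete: a count says how many values a column holds, while a length describes the size of each value. The sketch below assumes string lengths are measured in UTF-8 bytes, nulls are skipped, and the average is floored to fit the `int` type in the table; none of these choices is stated in the proposed spec text, and `value_length_stats` is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple, Union


def value_length_stats(
    values: Iterable[Optional[Union[str, bytes]]],
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper returning (avg_length, max_length) for a string/binary column.

    Assumptions (not from the spec text): strings are measured as UTF-8 bytes,
    nulls are skipped, and the average is floored to an int.
    """
    lengths = []
    for v in values:
        if v is None:
            continue  # nulls are tracked by null_value_count, not by lengths
        data = v.encode("utf-8") if isinstance(v, str) else v
        lengths.append(len(data))
    if not lengths:
        return None, None  # no non-null values, so no length stats
    return sum(lengths) // len(lengths), max(lengths)


if __name__ == "__main__":
    column = ["a", "iceberg", None, "é"]
    print(len(column), value_length_stats(column))
```

For this column the value count is 4, while the lengths are 1, 7, and 2 bytes ("é" is two bytes in UTF-8), giving an average length of 3 and a maximum length of 7: clearly length statistics rather than counts.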
--
This is an automated message from the Apache Git Service.