szehon-ho commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1902285780
##########
format/spec.md:
##########
@@ -584,8 +589,8 @@ The schema of a manifest file is a struct called
`manifest_entry` with the follo
| _optional_ | _optional_ | _optional_ | **`110 null_value_counts`** | `map<121: int, 122: long>` | Map from column id to number of null values in the column |
| _optional_ | _optional_ | _optional_ | **`137 nan_value_counts`** | `map<138: int, 139: long>` | Map from column id to number of NaN values in the column |
| _optional_ | _optional_ | _optional_ | **`111 distinct_counts`** | `map<123: int, 124: long>` | Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts |
-| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>` | Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] |
-| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>` | Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] |
+| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>` | Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2]. See [7] for`geometry` and [8] for `geography`. |
+| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>` | Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2]. See [9] for `geometry` and [10] for `geography`. |
Review Comment:
OK, I changed it again to the text below. Let me know if it makes sense, and
thanks again for the patient review.
```
7. `geometry` and `geography`: the bound is a point; its X, Y, Z, and M values
are the lower / upper bounds of all component points of all geometries in the
file. For the X and Y values only, the lower_bound's value (xmin/ymin) may be
greater than the upper_bound's value (xmax/ymax). In this X case, a geometry in
the file may match if it contains an X such that `x >= xmin` OR `x <= xmax`,
and in this Y case if `y >= ymin` OR `y <= ymax`. In geographic terminology,
the concepts of `xmin`, `xmax`, `ymin`, and `ymax` are also known as
`westernmost`, `easternmost`, `southernmost`, and `northernmost`.
8. `geography` further restricts these points to the canonical ranges of
[-180, 180] for X and [-90, 90] for Y.
```
Here I gave preference to the X, Y language, as it's clearer without defining
other concepts, but I do mention the north, east, south, west terminology in
the note.
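
To make the matching rule in the proposed note concrete, here is a minimal
sketch in Python. It is only an illustration, not code from the PR or from any
Iceberg implementation: `coordinate_may_match` is a hypothetical helper, and
the non-inverted branch (a plain `lower <= value <= upper` range check) is an
assumption about the behavior the note implies when the bounds are in the usual
order.

```python
def coordinate_may_match(value: float, lower: float, upper: float) -> bool:
    """Check one coordinate (X or Y) against a lower/upper bound pair.

    Hypothetical helper for illustration only.
    """
    if lower <= upper:
        # Assumed ordinary case: the bounds form a normal closed interval.
        return lower <= value <= upper
    # Inverted case from the note (lower > upper): the value may match
    # if it is on either side, i.e. value >= lower OR value <= upper.
    return value >= lower or value <= upper


# Inverted X bounds, e.g. xmin = 170 and xmax = -170.
print(coordinate_may_match(175.0, 170.0, -170.0))   # x >= xmin
print(coordinate_may_match(-175.0, 170.0, -170.0))  # x <= xmax
print(coordinate_may_match(0.0, 170.0, -170.0))     # neither condition holds
```

With inverted bounds, the first two calls satisfy one side of the OR condition
and the third satisfies neither, which is the pruning behavior the note is
meant to describe.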