rdblue commented on code in PR #11240:
URL: https://github.com/apache/iceberg/pull/11240#discussion_r1819892263
##########
format/spec.md:
##########
@@ -585,13 +589,19 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo
| _optional_ | _optional_ | _optional_ | **`132 split_offsets`** | `list<133: long>` | Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending |
| | _optional_ | _optional_ | **`135 equality_ids`** | `list<136: int>` | Field ids used to determine row equality in equality delete files. Required when `content=2` and should be null otherwise. Fields with ids listed in this column must be present in the delete file |
| _optional_ | _optional_ | _optional_ | **`140 sort_order_id`** | `int` | ID representing sort order for this file [3]. |
-| | | _optional_ | **`142 first_row_id`** | `long` | The `_row_id` for the first row in the data file. See [First Row ID Inheritance](#first-row-id-inheritance) |
+| | | _optional_ | **`142 first_row_id`** | `long` | The `_row_id` for the first row in the data file. See [First Row ID Inheritance](#first-row-id-inheritance) |
+| | _optional_ | _optional_ | **`143 referenced_data_file`** | `string` | Fully qualified location (URI with FS scheme) of a data file that all deletes reference [4] |
+| | | _optional_ | **`144 content_offset`** | `long` | The offset in the file where the content starts [5] |
+| | | _optional_ | **`145 content_size_in_bytes`** | `long` | The length of a referenced content stored in the file; required if `content_offset` is present [5] |
+
Notes:
1. Single-value serialization for lower and upper bounds is detailed in Appendix D.
2. For `float` and `double`, the value `-0.0` must precede `+0.0`, as in the IEEE 754 `totalOrder` predicate. NaNs are not permitted as lower or upper bounds.
3. If sort order ID is missing or unknown, then the order is assumed to be unsorted. Only data files and equality delete files should be written with a non-null order id. [Position deletes](#position-delete-files) are required to be sorted by file and position, not a table order, and should set sort order id to null. Readers must ignore sort order id for position delete files.
-4. The following field ids are reserved on `data_file`: 141.
+4. Position delete metadata can use `referenced_data_file` when all deletes tracked by the entry are in a single data file. Setting the referenced file is required for deletion vectors.
+5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access to a deletion vector. The values must exactly match the `offset` and `length` stored in the Puffin footer for the deletion vector blob.
Review Comment:
Updated to work around the fact that these aren't actually required in the
table.
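
For illustration, the constraints stated in notes 4 and 5 of the hunk can be sketched as a small check. This is only a sketch under assumptions, not Iceberg's implementation: `DataFileEntry` and `check_dv_entry` are hypothetical names, and the blob `offset` and `length` are assumed to have already been read from the Puffin footer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFileEntry:
    # Hypothetical container; field names mirror the new spec columns.
    referenced_data_file: Optional[str] = None
    content_offset: Optional[int] = None
    content_size_in_bytes: Optional[int] = None


def check_dv_entry(entry: DataFileEntry, blob_offset: int, blob_length: int) -> List[str]:
    """Return violations of notes 4 and 5 for an entry that tracks a deletion vector.

    blob_offset and blob_length are assumed to come from the Puffin footer.
    """
    problems = []
    # Note 4: the referenced file is required for deletion vectors.
    if entry.referenced_data_file is None:
        problems.append("referenced_data_file is required for deletion vectors")
    # Table row 145: the size is required whenever an offset is present.
    if entry.content_offset is not None and entry.content_size_in_bytes is None:
        problems.append("content_size_in_bytes is required if content_offset is present")
    # Note 5: values must exactly match the Puffin footer's offset and length.
    if entry.content_offset is not None and entry.content_offset != blob_offset:
        problems.append("content_offset does not match the Puffin footer offset")
    if entry.content_size_in_bytes is not None and entry.content_size_in_bytes != blob_length:
        problems.append("content_size_in_bytes does not match the Puffin footer length")
    return problems
```

The check deliberately does not demand that `content_offset` be set, since the columns themselves are marked optional in the table; it only validates the values that are present.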