danielcweeks commented on code in PR #12580:
URL: https://github.com/apache/iceberg/pull/12580#discussion_r2010581774
##########
format/spec.md:
##########
@@ -367,37 +367,35 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU
The set of metadata columns is:
-| Field id, name | Type | Description |
-|----------------------------------|---------------|--------------------------------------------------------------------------------------------------------|
-| **`2147483646 _file`** | `string` | Path of the file in which a row is stored |
-| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` |
-| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted |
-| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row |
-| **`2147483642 _partition`** | `struct` | Partition to which a row belongs |
-| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files |
-| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files |
-| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files |
-| **`2147483543 _change_type`** | `string` | The record type in the changelog (INSERT, DELETE, UPDATE_BEFORE, or UPDATE_AFTER) |
-| **`2147483542 _change_ordinal`** | `int` | The order of the change |
-| **`2147483541 _commit_snapshot_id`** | `long` | The snapshot ID in which the change occured |
-| **`2147483540 _row_id`** | `long` | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) |
-| **`2147483539 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row when row-lineage is enabled, see [Row Lineage](#row-lineage) |
+| Field id, name | Type | Description |
+|----------------------------------|---------------|-----------------------------------------------------------------------------------|
+| **`2147483646 _file`** | `string` | Path of the file in which a row is stored |
+| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` |
+| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted |
+| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row |
+| **`2147483642 _partition`** | `struct` | Partition to which a row belongs |
+| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files |
+| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files |
+| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files |
+| **`2147483543 _change_type`** | `string` | The record type in the changelog (INSERT, DELETE, UPDATE_BEFORE, or UPDATE_AFTER) |
+| **`2147483542 _change_ordinal`** | `int` | The order of the change |
+| **`2147483541 _commit_snapshot_id`** | `long` | The snapshot ID in which the change occured |
+| **`2147483540 _row_id`** | `long` | A unique long assigned for row lineage, see [Row Lineage](#row-lineage) |
+| **`2147483539 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row, see [Row Lineage](#row-lineage) |
#### Row Lineage
-In v3 and later, an Iceberg table can track row lineage fields for all newly created rows. Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files:
+In v3 and later, an Iceberg table must track row lineage fields for all newly created rows. Engines must maintain the `next-row-id` table field and the following row-level fields when writing data files:
-* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is copied into a new file.
-* `_last_updated_sequence_number` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value is explicitly written when the row is written to a different data file but not modified.
+* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value should be explicitly written when the row is copied into a new file.
+* `_last_updated_sequence_number` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value must be explicitly written when the row is written to a different data file but not modified.
Review Comment:
We may want to add: "If the value is not preserved, the row modification must
include a delete for the existing `_row_id` and an add representing the new
value."
Should we include that additional language to clarify the meaning of
`should`?
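To make the proposed rule concrete, here is a rough Python sketch of the two paths a writer could take when rewriting a row. It is only an illustration of the wording suggested above: `Row`, `Change`, and `rewrite_row` are hypothetical names, not Iceberg APIs, and a `None` value is assumed to stand for "left unset so it is assigned via inheritance".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    data: dict
    row_id: Optional[int]            # None: assumed to be assigned via inheritance
    last_updated_seq: Optional[int]  # None: assumed to be inherited from the commit


@dataclass(frozen=True)
class Change:
    kind: str                        # "delete" or "add"
    row_id: Optional[int]


def rewrite_row(existing: Row, new_data: dict, preserve_row_id: bool):
    """Return (written_row, changes) for one row rewritten into a new file."""
    if preserve_row_id:
        modified = new_data != existing.data
        # Unmodified rows carry their sequence number forward explicitly;
        # modified rows leave it unset so the new commit's value is inherited.
        seq = existing.last_updated_seq if not modified else None
        return Row(new_data, existing.row_id, seq), []
    # Identity not preserved: model the rewrite as a delete of the old
    # _row_id plus an add of a brand-new row with inherited lineage fields.
    written = Row(new_data, None, None)
    return written, [Change("delete", existing.row_id), Change("add", None)]
```

With `preserve_row_id=True` the change list is empty because the row keeps its identity; with `False` the caller gets the delete/add pair that the suggested sentence would require.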
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]