RussellSpitzer commented on code in PR #11130:
URL: https://github.com/apache/iceberg/pull/11130#discussion_r1792211491
##########
format/spec.md:
##########
@@ -684,34 +797,38 @@ The atomic operation used to commit metadata depends on
how tables are tracked a
Table metadata consists of the following fields:
-| v1 | v2 | Field | Description |
-| ---------- | ---------- | ----- | ----------- |
-| _required_ | _required_ | **`format-version`** | An integer version number
for the format. Currently, this can be 1 or 2 based on the spec.
Implementations must throw an exception if a table's version is higher than the
supported version. |
-| _optional_ | _required_ | **`table-uuid`** | A UUID that identifies the
table, generated when the table is created. Implementations must throw an
exception if a table's UUID does not match the expected UUID after refreshing
metadata. |
-| _required_ | _required_ | **`location`**| The table's base location. This is
used by writers to determine where to store data files, manifest files, and
table metadata files. |
-| | _required_ | **`last-sequence-number`**| The table's highest
assigned sequence number, a monotonically increasing long that tracks the order
of snapshots in a table. |
-| _required_ | _required_ | **`last-updated-ms`**| Timestamp in milliseconds
from the unix epoch when the table was last updated. Each table metadata file
should update this field just before writing. |
-| _required_ | _required_ | **`last-column-id`**| An integer; the highest
assigned column ID for the table. This is used to ensure columns are always
assigned an unused ID when evolving schemas. |
-| _required_ | | **`schema`**| The table’s current schema.
(**Deprecated**: use `schemas` and `current-schema-id` instead) |
-| _optional_ | _required_ | **`schemas`**| A list of schemas, stored as
objects with `schema-id`. |
-| _optional_ | _required_ | **`current-schema-id`**| ID of the table's current
schema. |
-| _required_ | | **`partition-spec`**| The table’s current
partition spec, stored as only fields. Note that this is used by writers to
partition data, but is not used when reading because reads use the specs stored
in manifest files. (**Deprecated**: use `partition-specs` and `default-spec-id`
instead) |
-| _optional_ | _required_ | **`partition-specs`**| A list of partition specs,
stored as full partition spec objects. |
-| _optional_ | _required_ | **`default-spec-id`**| ID of the "current" spec
that writers should use by default. |
-| _optional_ | _required_ | **`last-partition-id`**| An integer; the highest
assigned partition field ID across all partition specs for the table. This is
used to ensure partition fields are always assigned an unused ID when evolving
specs. |
-| _optional_ | _optional_ | **`properties`**| A string to string map of table
properties. This is used to control settings that affect reading and writing
and is not intended to be used for arbitrary metadata. For example,
`commit.retry.num-retries` is used to control the number of commit retries. |
-| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the
current table snapshot; must be the same as the current ID of the `main` branch
in `refs`. |
-| _optional_ | _optional_ | **`snapshots`**| A list of valid snapshots. Valid
snapshots are snapshots for which all data files exist in the file system. A
data file must not be deleted from the file system until the last snapshot in
which it was listed is garbage collected. |
-| _optional_ | _optional_ | **`snapshot-log`**| A list (optional) of timestamp
and snapshot ID pairs that encodes changes to the current snapshot for the
table. Each time the current-snapshot-id is changed, a new entry should be
added with the last-updated-ms and the new current-snapshot-id. When snapshots
are expired from the list of valid snapshots, all entries before a snapshot
that has expired should be removed. |
-| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp
and metadata file location pairs that encodes changes to the previous metadata
files for the table. Each time a new metadata file is created, a new entry of
the previous metadata file location should be added to the list. Tables can be
configured to remove oldest metadata log entries and keep a fixed-size log of
the most recent entries after a commit. |
-| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored
as full sort order objects. |
-| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id
of the table. Note that this could be used by writers, but is not used when
reading because reads use the specs stored in manifest files. |
-| | _optional_ | **`refs`** | A map of snapshot references. The map
keys are the unique snapshot reference names in the table, and the map values
are snapshot reference objects. There is always a `main` branch reference
pointing to the `current-snapshot-id` even if the `refs` map is null. |
-| _optional_ | _optional_ | **`statistics`** | A list (optional) of [table
statistics](#table-statistics). |
-| _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of
[partition statistics](#partition-statistics). |
+| v1 | v2 | v3 | Field |
Description
|
+| ---------- | ----------
|------------|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| _required_ | _required_ | _required_ | **`format-version`** | An
integer version number for the format. Currently, this can be 1 or 2 based on
the spec. Implementations must throw an exception if a table's version is
higher than the supported version.
|
+| _optional_ | _required_ | _required_ | **`table-uuid`** | A UUID
that identifies the table, generated when the table is created. Implementations
must throw an exception if a table's UUID does not match the expected UUID
after refreshing metadata.
|
+| _required_ | _required_ | _required_ | **`location`** | The
table's base location. This is used by writers to determine where to store data
files, manifest files, and table metadata files.
|
+| | _required_ | _required_ | **`last-sequence-number`** | The
table's highest assigned sequence number, a monotonically increasing long that
tracks the order of snapshots in a table.
|
+| _required_ | _required_ | _required_ | **`last-updated-ms`** |
Timestamp in milliseconds from the unix epoch when the table was last updated.
Each table metadata file should update this field just before writing.
|
+| _required_ | _required_ | _required_ | **`last-column-id`** | An
integer; the highest assigned column ID for the table. This is used to ensure
columns are always assigned an unused ID when evolving schemas.
|
+| _required_ | | | **`schema`** | The
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id`
instead)
|
+| _optional_ | _required_ | _required_ | **`schemas`** | A list
of schemas, stored as objects with `schema-id`.
|
+| _optional_ | _required_ | _required_ | **`current-schema-id`** | ID of
the table's current schema.
|
+| _required_ | | | **`partition-spec`** | The
table’s current partition spec, stored as only fields. Note that this is used
by writers to partition data, but is not used when reading because reads use
the specs stored in manifest files. (**Deprecated**: use `partition-specs` and
`default-spec-id` instead)
|
+| _optional_ | _required_ | _required_ | **`partition-specs`** | A list
of partition specs, stored as full partition spec objects.
|
+| _optional_ | _required_ | _required_ | **`default-spec-id`** | ID of
the "current" spec that writers should use by default.
|
+| _optional_ | _required_ | _required_ | **`last-partition-id`** | An
integer; the highest assigned partition field ID across all partition specs for
the table. This is used to ensure partition fields are always assigned an
unused ID when evolving specs.
|
+| _optional_ | _optional_ | _optional_ | **`properties`** | A
string to string map of table properties. This is used to control settings that
affect reading and writing and is not intended to be used for arbitrary
metadata. For example, `commit.retry.num-retries` is used to control the number
of commit retries.
|
+| _optional_ | _optional_ | _optional_ | **`current-snapshot-id`** | `long`
ID of the current table snapshot; must be the same as the current ID of the
`main` branch in `refs`.
|
+| _optional_ | _optional_ | _optional_ | **`snapshots`** | A list
of valid snapshots. Valid snapshots are snapshots for which all data files
exist in the file system. A data file must not be deleted from the file system
until the last snapshot in which it was listed is garbage collected.
|
+| _optional_ | _optional_ | _optional_ | **`snapshot-log`** | A list
(optional) of timestamp and snapshot ID pairs that encodes changes to the
current snapshot for the table. Each time the current-snapshot-id is changed, a
new entry should be added with the last-updated-ms and the new
current-snapshot-id. When snapshots are expired from the list of valid
snapshots, all entries before a snapshot that has expired should be removed.
|
+| _optional_ | _optional_ | _optional_ | **`metadata-log`** | A list
(optional) of timestamp and metadata file location pairs that encodes changes
to the previous metadata files for the table. Each time a new metadata file is
created, a new entry of the previous metadata file location should be added to
the list. Tables can be configured to remove oldest metadata log entries and
keep a fixed-size log of the most recent entries after a commit. |
+| _optional_ | _required_ | _required_ | **`sort-orders`** | A list
of sort orders, stored as full sort order objects.
|
+| _optional_ | _required_ | _required_ | **`default-sort-order-id`** | Default
sort order id of the table. Note that this could be used by writers, but is not
used when reading because reads use the specs stored in manifest files.
|
+| | _optional_ | _optional_ | **`refs`** | A map
of snapshot references. The map keys are the unique snapshot reference names in
the table, and the map values are snapshot reference objects. There is always a
`main` branch reference pointing to the `current-snapshot-id` even if the
`refs` map is null.
|
Review Comment:
I don't think there are any negatives to that. What was the reason it wasn't
required in V3?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]