This is an automated email from the ASF dual-hosted git repository.
russellspitzer pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 7738e1d722 Spec: Fix table of content generation (#11067)
7738e1d722 is described below
commit 7738e1d7228474e36f661cfa1a15a2e8f8410bcd
Author: Ajantha Bhat <[email protected]>
AuthorDate: Sat Oct 26 02:07:05 2024 +0530
Spec: Fix table of content generation (#11067)
---
format/spec.md | 94 +++++++++++++++++++++++++++++-----------------------------
1 file changed, 47 insertions(+), 47 deletions(-)
diff --git a/format/spec.md b/format/spec.md
index 601cbcc3bc..6b80e876ed 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -30,13 +30,13 @@ Versions 1 and 2 of the Iceberg spec are complete and adopted by the community.
The format version number is incremented when new features are added that will
break forward-compatibility---that is, when older readers would not read newer
table features correctly. Tables may continue to be written with an older
version of the spec to ensure compatibility by not using features that are not
yet implemented by processing engines.
-#### Version 1: Analytic Data Tables
+### Version 1: Analytic Data Tables
Version 1 of the Iceberg spec defines how to manage large analytic tables
using immutable file formats: Parquet, Avro, and ORC.
All version 1 data and metadata files are valid after upgrading a table to
version 2. [Appendix E](#version-2) documents how to default version 2 fields
when reading version 1 metadata.
-#### Version 2: Row-level Deletes
+### Version 2: Row-level Deletes
Version 2 of the Iceberg spec adds row-level updates and deletes for analytic
tables with immutable files.
@@ -44,7 +44,7 @@ The primary change in version 2 adds delete files to encode rows that are delete
In addition to row-level deletes, version 2 makes some requirements stricter
for writers. The full set of changes are listed in [Appendix E](#version-2).
-#### Version 3: Extended Types and Capabilities
+### Version 3: Extended Types and Capabilities
Version 3 of the Iceberg spec extends data types and existing metadata
structures to add new capabilities:
@@ -75,7 +75,7 @@ Data files in snapshots are tracked by one or more manifest files that contain a
The manifests that make up a snapshot are stored in a manifest list file. Each
manifest list stores metadata about manifests, including partition stats and
data file counts. These stats are used to avoid reading manifests that are not
required for an operation.
-#### Optimistic Concurrency
+### Optimistic Concurrency
An atomic swap of one table metadata file for another provides the basis for
serializable isolation. Readers use the snapshot that was current when they
load the table metadata and are not affected by changes until they refresh and
pick up a new metadata location.
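[Editor's aside] The optimistic concurrency scheme described above amounts to a compare-and-swap loop over the table's metadata pointer. A minimal sketch, with hypothetical names and a toy catalog rather than Iceberg's actual API:

```python
class CommitFailedException(Exception):
    pass

class Catalog:
    """Toy catalog: one pointer per table, the current metadata file location."""
    def __init__(self):
        self.pointers = {}

    def swap(self, table, expected, new):
        # In a real catalog this check-and-put must be atomic.
        if self.pointers.get(table) != expected:
            raise CommitFailedException("base metadata is no longer current")
        self.pointers[table] = new

def commit(catalog, table, write_new_metadata, max_retries=3):
    for _ in range(max_retries):
        base = catalog.pointers.get(table)   # metadata the update is based on
        new = write_new_metadata(base)       # re-apply the update to that base
        try:
            catalog.swap(table, base, new)   # atomic swap; fails if another
            return new                       # writer committed first
        except CommitFailedException:
            continue                         # refresh and retry
    raise CommitFailedException("retries exhausted")
```

Readers are unaffected by this loop: they keep using whichever metadata location they loaded until they refresh.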
@@ -85,7 +85,7 @@ If the snapshot on which an update is based is no longer current, the writer mus
The conditions required for a write to successfully commit determine the
isolation level. Writers can select what to validate and can make different
isolation guarantees.
-#### Sequence Numbers
+### Sequence Numbers
The relative age of data and delete files relies on a sequence number that is
assigned to every successful commit. When a snapshot is created for a commit,
it is optimistically assigned the next sequence number, and it is written into
the snapshot's metadata. If the commit fails and must be retried, the sequence
number is reassigned and written into new snapshot metadata.
@@ -94,7 +94,7 @@ All manifests, data files, and delete files created for a snapshot inherit the s
Inheriting the sequence number from manifest metadata allows writing a new
manifest once and reusing it in commit retries. To change a sequence number for
a retry, only the manifest list must be rewritten -- which would be rewritten
anyway with the latest set of manifests.
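[Editor's aside] The retry optimization above can be sketched as a small resolution step over manifest-list entries. The dict layout below is hypothetical (Iceberg stores manifest lists in Avro); a `None` sequence number stands in for "inherit from the snapshot":

```python
def assign_sequence_numbers(manifests, snapshot_sequence_number):
    """Resolve inherited sequence numbers for one snapshot's manifest list.

    A sequence_number of None means the manifest was written for this commit
    and inherits the snapshot's (possibly reassigned) number, so on a retry
    only the manifest list itself needs to be rewritten.
    """
    return [
        {**m, "sequence_number": snapshot_sequence_number
              if m["sequence_number"] is None else m["sequence_number"]}
        for m in manifests
    ]
```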
-#### Row-level Deletes
+### Row-level Deletes
Row-level deletes are stored in delete files.
@@ -106,7 +106,7 @@ There are two ways to encode a row-level delete:
Like data files, delete files are tracked by partition. In general, a delete
file must be applied to older data files with the same partition; see [Scan
Planning](#scan-planning) for details. Column metrics can be used to determine
whether a delete file's rows overlap the contents of a data file or a scan
range.
-#### File System Operations
+### File System Operations
Iceberg only requires that file systems support the following operations:
@@ -121,9 +121,9 @@ Tables do not require random-access writes. Once written, data and metadata file
Tables do not require rename, except for tables that use atomic rename to
implement the commit operation for new metadata files.
-# Specification
+## Specification
-### Terms
+#### Terms
* **Schema** -- Names and types of fields in a table.
* **Partition spec** -- A definition of how partition values are derived from
data fields.
@@ -133,7 +133,7 @@ Tables do not require rename, except for tables that use atomic rename to implem
* **Data file** -- A file that contains rows of a table.
* **Delete file** -- A file that encodes rows of a table that are deleted by
position or data values.
-### Writer requirements
+#### Writer requirements
Some tables in this spec have columns that specify requirements for tables by
version. These requirements are intended for writers when adding metadata files
(including manifest files and manifest lists) to a table with the given
version.
@@ -158,19 +158,19 @@ Readers should be more permissive because v1 metadata files are allowed in v2 ta
Readers may be more strict for metadata JSON files because the JSON files are
not reused and will always match the table version. Required fields that were
not present in or were optional in prior versions may be handled as required
fields. For example, a v2 table that is missing `last-sequence-number` can
throw an exception.
-### Writing data files
+#### Writing data files
All columns must be written to data files even if they introduce redundancy
with metadata stored in manifest files (e.g. columns with identity partition
transforms). Writing all columns provides a backup in case of corruption or
bugs in the metadata layer.
Writers are not allowed to commit files with a partition spec that contains a
field with an unknown transform.
-## Schemas and Data Types
+### Schemas and Data Types
A table's **schema** is a list of named columns. All data types are either
primitives or nested types, which are maps, lists, or structs. A table schema
is also a struct type.
For the representations of these types in Avro, ORC, and Parquet file formats,
see Appendix A.
-### Nested Types
+#### Nested Types
A **`struct`** is a tuple of typed values. Each field in the tuple is named
and has an integer id that is unique in the table schema. Each field can be
either optional or required, meaning that values can (or cannot) be null.
Fields may be any type. Fields may have an optional comment or doc string.
Fields can have [default values](#default-values).
@@ -178,7 +178,7 @@ A **`list`** is a collection of values with some element type. The element field
A **`map`** is a collection of key-value pairs with a key type and a value
type. Both the key field and value field each have an integer id that is unique
in the table schema. Map keys are required and map values can be either
optional or required. Both map keys and map values may be any type, including
nested types.
-### Primitive Types
+#### Primitive Types
Supported primitive types are defined in the table below. Primitive types
added after v1 have an "added by" version that is the first spec version in
which the type is allowed. For example, nanosecond-precision timestamps are
part of the v3 spec; using v3 types in v1 or v2 tables can break forward
compatibility.
@@ -211,7 +211,7 @@ Notes:
For details on how to serialize a schema to JSON, see Appendix C.
-### Default values
+#### Default values
Default values can be tracked for struct fields (both nested structs and the
top-level schema's struct). There can be two defaults with a field:
@@ -227,7 +227,7 @@ All columns of `unknown` type must default to null. Non-null values for `initial
Default values are attributes of fields in schemas and serialized with fields
in the JSON format. See [Appendix C](#appendix-c-json-serialization).
-### Schema Evolution
+#### Schema Evolution
Schemas may be evolved by type promotion or adding, deleting, renaming, or
reordering fields in structs (both nested structs and the top-level schema’s
struct).
@@ -275,7 +275,7 @@ Struct evolution requires the following rules for default values:
* If a field value is missing from a struct's `write-default`, the field's
`write-default` must be used for the field
-#### Column Projection
+##### Column Projection
Columns in Iceberg data files are selected by field id. The table schema's
column names and order may change after a data file is written, and projection
must be done using field ids.
@@ -307,7 +307,7 @@ Field mapping fields are constrained by the following rules:
For details on serialization, see [Appendix C](#name-mapping-serialization).
-### Identifier Field IDs
+#### Identifier Field IDs
A schema can optionally track the set of primitive fields that identify rows
in a table, using the property `identifier-field-ids` (see JSON encoding in
Appendix C).
@@ -316,7 +316,7 @@ Two rows are the "same"---that is, the rows represent the same entity---if the i
Identifier fields may be nested in structs but cannot be nested within maps or
lists. Float, double, and optional fields cannot be used as identifier fields
and a nested field cannot be used as an identifier field if it is nested in an
optional struct, to avoid null values in identifiers.
-### Reserved Field IDs
+#### Reserved Field IDs
Iceberg tables must not use field ids greater than 2147483447
(`Integer.MAX_VALUE - 200`). This id range is reserved for metadata columns
that can be used in user data schemas, like the `_file` column that holds the
file path in which a row was stored.
@@ -335,7 +335,7 @@ The set of metadata columns is:
| **`2147483543 _row_id`** | `long` | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) |
| **`2147483542 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row when row-lineage is enabled, see [Row Lineage](#row-lineage) |
-### Row Lineage
+#### Row Lineage
In v3 and later, an Iceberg table can track row lineage fields for all newly
created rows. Row lineage is enabled by setting the field `row-lineage` to
true in the table's metadata. When enabled, engines must maintain the
`next-row-id` table field and the following row-level fields when writing data
files:
@@ -347,7 +347,7 @@ These fields are assigned and updated by inheritance because the commit sequence
When row lineage is enabled, new snapshots cannot include [Equality
Deletes](#equality-delete-files). Row lineage is incompatible with equality
deletes because lineage values must be maintained, but equality deletes are
used to avoid reading existing data before writing changes.
-#### Row lineage assignment
+##### Row lineage assignment
Row lineage fields are written when row lineage is enabled. When not enabled,
row lineage fields (`_row_id` and `_last_updated_sequence_number`) must not be
written to data files. The rest of this section applies when row lineage is
enabled.
@@ -368,7 +368,7 @@ When an existing row is moved to a different data file for any reason, writers a
3. If the write has not modified the row, the existing non-null
`_last_updated_sequence_number` value must be copied to the new data file
-#### Row lineage example
+##### Row lineage example
This example demonstrates how `_row_id` and `_last_updated_sequence_number`
are assigned for a snapshot when row lineage is enabled. This starts with a
table with row lineage enabled and a `next-row-id` of 1000.
@@ -409,7 +409,7 @@ Files `data2` and `data3` are written with `null` for `first_row_id` and are ass
When the new snapshot is committed, the table's `next-row-id` must also be
updated (even if the new snapshot is not in the main branch). Because 225 rows
were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is
1,000 + 225 = 1,225:
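[Editor's aside] The `next-row-id` update is plain arithmetic over the added files' record counts; as an illustrative helper:

```python
def advance_next_row_id(next_row_id, added_record_counts):
    # next-row-id advances by the number of rows added in the snapshot,
    # keeping it higher than any row ID assigned so far.
    return next_row_id + sum(added_record_counts)

# advance_next_row_id(1000, [100, 0, 125])  # -> 1225
```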
-### Enabling Row Lineage for Non-empty Tables
+##### Enabling Row Lineage for Non-empty Tables
Any snapshot without the field `first-row-id` does not have any lineage
information and values for `_row_id` and `_last_updated_sequence_number` cannot
be assigned accurately.
@@ -419,7 +419,7 @@ null should be explicitly written. After this point, rows are treated as if they
and assigned `_row_id` and `_last_updated_sequence_number` as if they were new rows.
-## Partitioning
+### Partitioning
Data files are stored in manifests with a tuple of partition values that are
used in scans to filter out files that cannot contain records that match the
scan’s filter predicate. Partition values for a data file must be the same for
all records stored in the data file. (Manifests store data files from any
partition, as long as the partition spec is the same for the data files.)
@@ -440,7 +440,7 @@ Two partition specs are considered equivalent with each other if they have the s
Partition field IDs must be reused if an existing partition spec contains an
equivalent field.
-### Partition Transforms
+#### Partition Transforms
| Transform name | Description | Source types | Result type |
|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
@@ -458,7 +458,7 @@ All transforms must return `null` for a `null` input value.
The `void` transform may be used to replace the transform in an existing
partition field so that the field is effectively dropped in v1 tables. See
partition evolution below.
-### Bucket Transform Details
+#### Bucket Transform Details
Bucket partition transforms use a 32-bit hash of the source value. The 32-bit
hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.
@@ -475,7 +475,7 @@ Notes:
For hash function details by type, see Appendix B.
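[Editor's aside] For concreteness, a sketch of the bucket transform for `long` values: 32-bit Murmur3 (x86 variant, seed 0) over the value's little-endian 8-byte encoding, masked non-negative, then reduced modulo the bucket count. This is a from-scratch illustration written for this note, not Iceberg's implementation:

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 hash, x86 variant (as required by the spec above)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data) // 4 * 4
    for i in range(0, n, 4):            # 4-byte blocks, little-endian
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    k = 0
    for b in reversed(data[n:]):        # trailing 1-3 bytes, if any
        k = (k << 8) | b
    if n != len(data):
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)                      # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def bucket_long(value: int, num_buckets: int) -> int:
    # Longs hash their little-endian 8-byte two's-complement encoding; the
    # hash is masked to a non-negative int before taking the modulus.
    data = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
    return (murmur3_x86_32(data) & 0x7FFFFFFF) % num_buckets
```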
-### Truncate Transform Details
+#### Truncate Transform Details
| **Type** | **Config** | **Truncate specification** | **Examples** |
|---------------|-----------------------|------------------------------------------------------------------|----------------------------------|
@@ -493,7 +493,7 @@ Notes:
4. In contrast to strings, binary values do not have an assumed encoding and
are truncated to `L` bytes.
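[Editor's aside] The truncate rules above are easy to state in code. Illustrative helpers only (`w` is the width for numbers, `length` the length for strings and binary); Python's `%` already yields the non-negative remainder the spec needs:

```python
def truncate_int(w: int, v: int) -> int:
    # v - (v % W) with a non-negative remainder: truncate(10, -1) = -10.
    return v - (v % w)

def truncate_string(length: int, s: str) -> str:
    # Strings truncate to at most `length` code points (always valid UTF-8).
    return s[:length]

def truncate_binary(length: int, b: bytes) -> bytes:
    # Binary has no assumed encoding and truncates to `length` bytes.
    return b[:length]
```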
-### Partition Evolution
+#### Partition Evolution
Table partitioning can be evolved by adding, removing, renaming, or reordering
partition spec fields.
@@ -510,7 +510,7 @@ In v1, partition field IDs were not tracked, but were assigned sequentially star
3. Only add partition fields at the end of the previous partition spec
-## Sorting
+### Sorting
Users can sort their data within partitions by columns to improve performance.
The information on how the data is sorted can be declared per data or delete
file, by a **sort order**.
@@ -530,7 +530,7 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < `
A data or delete file is associated with a sort order by the sort order's id
within [a manifest](#manifests). Therefore, the table must declare all the sort
orders for lookup. A table could also be configured with a default sort order
id, indicating how the new data should be sorted by default. Writers should use
this default sort order to sort the data on write, but are not required to if
the default order is prohibitively expensive, as it would be for streaming
writes.
-## Manifests
+### Manifests
A manifest is an immutable Avro file that lists data files or delete files,
along with each file’s partition data tuple, metrics, and tracking information.
One or more manifest files are used to store a [snapshot](#snapshots), which
tracks all of the files in a table at some point in time. Manifests are tracked
by a [manifest list](#manifest-lists) for each table snapshot.
@@ -598,7 +598,7 @@ The `partition` struct stores the tuple of partition values for each file. Its t
The column metrics maps are used when filtering to select both data and delete
files. For delete files, the metrics must store bounds and counts for all
deleted rows, or must be omitted. Storing metrics for deleted rows ensures that
the values can be used during job planning to find delete files that must be
merged during a scan.
-### Manifest Entry Fields
+#### Manifest Entry Fields
The manifest entry fields are used to keep track of the snapshot in which
files were added or logically deleted. The `data_file` struct is nested inside
of the manifest entry so that it can be easily passed to job planning without
the manifest entry fields.
@@ -616,7 +616,7 @@ Notes:
1. Technically, data files can be deleted when the last snapshot that contains
the file as “live” data is garbage collected. But this is harder to detect and
requires finding the diff of multiple snapshots. It is easier to track what
files are deleted in a snapshot and delete them when that snapshot expires. It
is not recommended to add a deleted file back to a table. Adding a deleted file
can lead to edge cases where incremental deletes can break table snapshots.
2. Manifest list files are required in v2, so that the `sequence_number` and
`snapshot_id` to inherit are always available.
-### Sequence Number Inheritance
+#### Sequence Number Inheritance
Manifests track the sequence number when a data or delete file was added to
the table.
@@ -629,7 +629,7 @@ Inheriting sequence numbers through the metadata tree allows writing a new manif
When reading v1 manifests with no sequence number column, sequence numbers for
all files must default to 0.
-### First Row ID Inheritance
+#### First Row ID Inheritance
Row ID inheritance is used when row lineage is enabled. When not enabled, a
data file's `first_row_id` must always be set to `null`. The rest of this
section applies when row lineage is enabled.
@@ -639,7 +639,7 @@ When reading, the `first_row_id` is assigned by replacing `null` with the manife
The `first_row_id` is only inherited for added data files. The inherited value
must be written into the data file metadata for existing and deleted entries.
The value of `first_row_id` for delete files is always `null`.
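[Editor's aside] A sketch of how a reader might resolve inherited `first_row_id` values for one manifest's entries. The dict field names are hypothetical; the exact metadata layout is defined in the Manifests section:

```python
def resolve_first_row_ids(entries, manifest_first_row_id):
    """Replace null first_row_id for added data files by inheritance.

    Each added data file with a null first_row_id takes the next available ID
    and reserves one ID per record; existing/deleted entries and delete files
    keep their stored value (delete files are always null).
    """
    next_id = manifest_first_row_id
    resolved = []
    for entry in entries:
        first = entry["first_row_id"]
        if entry["content"] == "data" and entry["status"] == "added" and first is None:
            first = next_id
            next_id += entry["record_count"]
        resolved.append({**entry, "first_row_id": first})
    return resolved
```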
-## Snapshots
+### Snapshots
A snapshot consists of the following fields:
@@ -673,7 +673,7 @@ Manifests for a snapshot are tracked by a manifest list.
Valid snapshots are stored as a list in table metadata. For serialization, see
Appendix C.
-### Snapshot Row IDs
+#### Snapshot Row IDs
When row lineage is not enabled, `first-row-id` must be omitted. The rest of
this section applies when row lineage is enabled.
@@ -811,13 +811,13 @@ When expiring snapshots, retention policies in table and snapshot references are
2. The snapshot is not one of the first `min-snapshots-to-keep` in the
branch (including the branch's referenced snapshot)
5. Expire any snapshot not in the set of snapshots to retain.
-## Table Metadata
+### Table Metadata
Table metadata is stored as JSON. Each table metadata change creates a new
table metadata file that is committed by an atomic operation. This operation is
used to ensure that a new version of table metadata replaces the version on
which it was based. This produces a linear history of table versions and
ensures that concurrent writes are not lost.
The atomic operation used to commit metadata depends on how tables are tracked
and is not standardized by this spec. See the sections below for examples.
-### Table Metadata Fields
+#### Table Metadata Fields
Table metadata consists of the following fields:
@@ -853,7 +853,7 @@ For serialization details, see Appendix C.
When a new snapshot is added, the table's `next-row-id` should be updated to
the previous `next-row-id` plus the sum of `record_count` for all data files
added in the snapshot (this is also equal to the sum of `added_rows_count` for
all manifests added in the snapshot). This ensures that `next-row-id` is always
higher than any assigned row ID in the table.
-### Table Statistics
+#### Table Statistics
Table statistics files are valid [Puffin files](puffin-spec.md). Statistics
are informational. A reader can choose to
ignore statistics information. Statistics support is not required to read the
table correctly. A table can contain
@@ -881,7 +881,7 @@ Blob metadata is a struct with the following fields:
| _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
-### Partition Statistics
+#### Partition Statistics
Partition statistics files are based on [partition statistics file
spec](#partition-statistics-file).
Partition statistics are not required for reading or planning and readers may
ignore them.
@@ -897,7 +897,7 @@ Partition statistics file must be registered in the table metadata file to be co
| _required_ | _required_ | **`statistics-path`** | `string` | Path of the partition statistics file. See [Partition statistics file](#partition-statistics-file). |
| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
-#### Partition Statistics File
+##### Partition Statistics File
Statistics information for each unique partition tuple is stored as a row in
any of the table's data file formats (for example, Parquet or ORC).
These rows must be sorted (in ascending order with NULLS FIRST) by the
`partition` field to optimize filtering rows while scanning.
@@ -934,7 +934,7 @@ The unified partition type looks like `Struct<field#1, field#2, field#3>`.
and then the table has evolved into `spec#1` which has just one field
`{field#2}`.
The unified partition type looks like `Struct<field#1, field#2>`.
-## Commit Conflict Resolution and Retry
+### Commit Conflict Resolution and Retry
When two commits happen at the same time and are based on the same version,
only one commit will succeed. In most cases, the failed commit can be applied
to the new current version of table metadata and retried. Updates verify the
conditions under which they can be applied to a new version and retry if those
conditions are met.
@@ -944,7 +944,7 @@ When two commits happen at the same time and are based on the same version, only
* Table schema updates and partition spec changes must validate that the
schema has not changed between the base version and the current version.
-### File System Tables
+#### File System Tables
_Note: This file system based scheme to commit a metadata file is
**deprecated** and will be removed in version 4 of this spec. The scheme is
**unsafe** in object stores and local file systems._
@@ -963,7 +963,7 @@ Notes:
1. The file system table scheme is implemented in
[HadoopTableOperations](../javadoc/{{ icebergVersion
}}/index.html?org/apache/iceberg/hadoop/HadoopTableOperations.html).
-### Metastore Tables
+#### Metastore Tables
The atomic swap needed to commit new versions of table metadata can be
implemented by storing a pointer in a metastore or database that is updated
with a check-and-put operation [1]. The check-and-put validates that the
version of the table that a write is based on is still current and then makes
the new metadata from the write the current version.
@@ -980,7 +980,7 @@ Notes:
1. The metastore table scheme is partly implemented in
[BaseMetastoreTableOperations](../javadoc/{{ icebergVersion
}}/index.html?org/apache/iceberg/BaseMetastoreTableOperations.html).
-## Delete Formats
+### Delete Formats
This section details how to encode row-level deletes in Iceberg delete files.
Row-level deletes are not supported in v1.
@@ -991,7 +991,7 @@ Row-level delete files are tracked by manifests, like data files. A separate set
Both position and equality deletes allow encoding deleted row values with a
delete. This can be used to reconstruct a stream of changes to a table.
-### Position Delete Files
+#### Position Delete Files
Position-based delete files identify deleted rows by file and position in one
or more data files, and may optionally contain the deleted row.
@@ -1016,7 +1016,7 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize
* Sorting by `file_path` allows filter pushdown by file in columnar storage
formats.
* Sorting by `pos` allows filtering rows while scanning, to avoid keeping
deletes in memory.
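[Editor's aside] The required ordering can be shown directly with a hypothetical helper: sorting by `(file_path, pos)` tuples groups deletes by file and keeps positions ascending within each file.

```python
def sort_position_deletes(deletes):
    """deletes: iterable of (file_path, pos) pairs, as in a position delete file."""
    return sorted(deletes, key=lambda d: (d[0], d[1]))
```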
-### Equality Delete Files
+#### Equality Delete Files
Equality delete files identify deleted rows in a collection of data files by
one or more column values, and may optionally contain additional columns of the
deleted row.
@@ -1068,7 +1068,7 @@ equality_ids=[1, 2]
If a delete column in an equality delete file is later dropped from the table,
it must still be used when applying the equality deletes. If a column was added
to a table and later used as a delete column in an equality delete file, the
column value is read for older data files using normal projection rules
(defaults to `null`).
-### Delete File Stats
+#### Delete File Stats
Manifests hold the same statistics for delete files and data files. For delete
files, the metrics describe the values that were deleted.