JanKaul opened a new issue, #6420: URL: https://github.com/apache/iceberg/issues/6420
### Feature Request / Improvement # Iceberg Materialized View Spec ## Background and Motivation A materialized view precomputes results of a query to be used as a logical table. When queried the materialized view serves the precomputed results reducing the query latency. The cost of query execution is pushed to the precomputation step and is amortized over the query executions. The big open-source query engines [Trino](https://trino.io/) and [Spark](https://spark.apache.org/) have either recently added ([link](https://trino.io/docs/current/connector/iceberg.html#materialized-views)) or are in the process of adding materialized views. Currently the materialized views are implemented as an [iceberg view](https://iceberg.apache.org/view-spec/) with an underlying storage table. The metadata required for view maintenance is stored as a property of the underlying storage table. The iceberg table format is becoming an important building block in modern data lakes and lakehouses. In addition to open-source query-engines, support from commercial cloud data warehouses like Snowflake, Bigquery and Dremio is available or underway. Iceberg therefore plays a crucial role in enabling data federation between different data lakes and warehouses. ## Current limitations 1. No formal specification Currently materialized views are lacking an open, accessible definition of the format. This makes it difficult to implement iceberg materialized views for new query-engines and consequently hinders adoption. 2. No process for evolution Without a formal specification it is difficult to manage the evolution of the format accross different query-engines. There is no central place where requests can be brought forward. A specification can help with maintaining backward compatibility. 3. Catalog entries for view and storage table When using a common view and a storage table to implement materialized views, you can either show or hide the storage table in the catalog. If the storage table is visible in the catalog there is a view and a table entry where logically there would be only one entry for the materialized view. If the storage table is not made visible in the catalog, it is difficult to assure atomic commits to the storage table. 4. Limited configuration Generally one can imagine different configurations for the materialized views. These include "freshness" guarantees for serving data, update strategies, the storage table format and partitioning. It would be benefitial to make the configuration explicit by including it in the format specification instead of including it as part of view or table properties. ## Goal A common metadata format for materialized views enabling materialized views to be created, read and updated by different query engines. ## Overview MV (Materialized view) metadata storage mirrors how Iceberg table and view metadata is stored and retrieved. MV metadata is maintained in metadata files. All changes to the MV state create a new MV metadata file and completely replace the old metadata using an atomic swap. Like Iceberg tables and views, this atomic swap is delegated to the metastore that tracks tables, views and/or materialized views by name. The MV metadata file tracks the schema, partinioning config, snapshots, custom properties, current and past versions, as well as other metadata. A snapshot represents the precomputed state of a MV at some time and is used to access the complete set of data files in the MV. Similar to tables, the data files associated with a MV snapshot are tracked by manifest files. ### Metadata Location An atomic swap of one MV (Materialized view) metadata file for another provides the basis for making atomic changes. Readers use the version of the MV that was current when they loaded the MV metadata and are not affected by changes until they refresh and pick up a new metadata location. Writers create MV metadata files optimistically, assuming that the current metadata location will not be changed before the writer’s commit. Once a writer has created an update, it commits by swapping the MV's metadata file pointer from the base location to the new location. ## Specification (DRAFT!) ### Terms - **Schema** -- Names and types of fields in a materiallized view. - **Version** -- The state of a materialized view at some point in time. - **Partition spec** -- A definition of how partition values are derived from data fields. - **Snapshot** -- The state of a materialized view at some point in time, including the set of all data files. - **Manifest list** -- A file that lists manifest files; one per snapshot. - **Manifest** -- A file that lists data or delete files; a subset of a snapshot. - **Data file** -- A file that contains rows of a materialized view. - **Delete file** -- A file that encodes rows of a table that are deleted by position or data values. ### Materialized View Metadata The materialized view metadata fields are a superset of the required fields of the v2 table metadata and the v1 view metadata: | v1 | Field Name | Description | | ---------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | _required_ | **`format-version`** | An integer version number for the materialized view format. Currently, this must be 1. Implementations must throw an exception if the materialized view's version is higher than the supported version. | | _required_ | **`location`** | The materialized view's base location. This is used by writers to determine where to store data files, manifest files, and materialized view metadata files. | | _required_ | **`uuid`** | A UUID that identifies the materialized view, generated when the materialized view is created. Implementations must throw an exception if a materialized view's UUID does not match the expected UUID after refreshing metadata. | | _required_ | **`versions`** | An array of structs describing the known versions of the materialized view. The number of versions to retain is controlled by the table property: “version.history.num-entries”. See section [Versions](#versions). | | _required_ | **`current_version_id`** | Current version of the materialized view. Set to ‘1’ when the view is first created. | | _required_ | **`version_log`** | A list of timestamp and version ID pairs that encodes changes to the current version for the materialized view. Each time the current-version-id is changed, a new entry should be added with the last-updated-ms and the new current-version-id. | | _required_ | **`last-sequence-number`** | The materialized view's highest assigned sequence number, a monotonically increasing long that tracks the order of snapshots in a materialized view. | | _required_ | **`last-updated-ms`** | Timestamp in milliseconds from the unix epoch when the materialized view was last updated. Each materialized view metadata file should update this field just before writing. | | _required_ | **`last-column-id`** | An integer; the highest assigned column ID for the materialized view. This is used to ensure columns are always assigned an unused ID when evolving schemas. | | _required_ | **`schemas`** | A list of schemas, stored as objects with `schema-id`. | | _required_ | **`current-schema-id`** | ID of the materialized view's current schema. | | _required_ | **`partition-specs`** | A list of partition specs, stored as full partition spec objects. | | _required_ | **`default-spec-id`** | ID of the "current" spec that writers should use by default. | | _required_ | **`last-partition-id`** | An integer; the highest assigned partition field ID across all partition specs for the materialized view. This is used to ensure partition fields are always assigned an unused ID when evolving specs. | | _required_ | **`sort-orders`** | A list of sort orders, stored as full sort order objects. | | _required_ | **`default-sort-order-id`** | Default sort order id of the materialized view. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | _required_ | **`refreshes`** | A list of refresh operations. | | _required_ | **`current-refresh-id`** | Id of the last refresh operation that defines the current state of the data files. | | _optional_ | **`snapshots`** | A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. | | _optional_ | **`current-snapshot-id`** | `long` ID of the current materialized view snapshot; must be the same as the current ID of the `main` branch in `refs`. | | _optional_ | **`snapshot-log`** | A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the materialized view. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. | | _optional_ | **`properties`** | A string to string map of materialized view properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, `commit.retry.num-retries` is used to control the number of commit retries. | | _optional_ | **`metadata-log`** | A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the materialized view. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the materialized view, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. | | _optional_ | **`statistics`** | A list (optional) of [materialized view statistics](https://iceberg.apache.org/spec/#table-statistics). | | | | | ### Refreshes Refresh information is stored as a list of `refresh operation` records. Each `refresh operation` has the following structure: | v1 | Field Name | Description | | ---------- | --------------------- | ----------------------------------------------------------------------------- | | _required_ | **`refresh-id`** | ID of the refresh operation when the materialized view is refreshed. | | _required_ | **`version-id`** | Version id of the materialized view when the refresh operation was performed. | | _required_ | **`base-tables`** | A map of strings (table identifiers) to `base-table` records. | | _optional_ | **`sequence-number`** | Sequence number of the snapshot that contains the refreshed data files. | Refreshes could be handled in different ways. For a normal execution the refresh list could consist of only one entry, which gets overwritted on every refresh operation. If "timetravel" is enabled for the materialized view, a new `refresh operation` record can be inserted on every refresh. Together with the `sequence-number` field, this could be used to track the evolution of data files over the refresh history. ### Base table A `base table` record can have different forms based on the common field "type". The other fields don't necessarily have to be the same. #### Iceberg-Metastore | v1 | Field Name | Description | | ---------- | ------------------------ | ------------------------------------------------------------------------------------------------ | | _required_ | **`type`** | type="iceberg-metastore" | | _required_ | **`identifier`** | Identifier of the base table in the metastore. | | _required_ | **`snapshot-reference`** | Snapshot id of the base table when the refresh operation was performed. | | _optional_ | **`properties`** | A string to string map of base table properties. Could be used to specify a different metastore. | #### Iceberg-FileSystem | v1 | Field Name | Description | | ---------- | ------------------------ | ---------------------------------------------------------------------------------------------- | | _required_ | **`type`** | type="iceberg-filesystem" | | _required_ | **`identifier`** | Path to the directory of the base table. | | _required_ | **`snapshot-reference`** | Version of the base table when the refresh operation was performed. | | _optional_ | **`properties`** | A string to string map of base table properties. Could be used for a different storage system. | #### DeltaLake-FileSystem (optional) | v1 | Field Name | Description | | ---------- | ------------------------ | ---------------------------------------------------------------------------------------------- | | _required_ | **`type`** | type="deltalake-filesystem" | | _required_ | **`identifier`** | Path to the directory of the base table. | | _required_ | **`snapshot-reference`** | Delta table version of the base table when the refresh operation was performed. | | _optional_ | **`properties`** | A string to string map of base table properties. Could be used for a different storage system. | ### [Snapshots](https://iceberg.apache.org/spec/#snapshots) ### [Versions](https://iceberg.apache.org/view-spec/#versions) ### [Version Log](https://iceberg.apache.org/view-spec/#version-log) ### [Schemas](https://iceberg.apache.org/spec/#schemas-and-data-types) ### [Partition Spec](https://iceberg.apache.org/spec/#partitioning) ### [Sort Order](https://iceberg.apache.org/spec/#sorting) ### Query engine None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
