[Proposal] REST Spec: Server-side Metadata Tables

Yufei Gu Wed, 03 Jul 2024 13:45:22 -0700

Hi folks,

I'd like to discuss a new proposal to support server-side metadata tables.

One of Iceberg's most advantageous features is the ability to inspect a
table using metadata tables. For instance, we can query snapshots just like
we query data rows:

    SELECT * FROM prod.db.table.snapshots;

With the REST catalog, we can simplify this process further by providing
metadata directly from REST endpoints. Here are several benefits of this
approach:

- Engine Independence: The metadata tables do not rely on a specific
implementation of an engine. The REST server returns the results directly.
For example, the Rust implementation of Iceberg does not need its own logic
to query the snapshots table if it connects to a server with this capability.
This reduces the complexity and development effort required for different
clients and engines.
- New Use Cases: A catalog UI or Lakehouse UI can present a
table's metadata (e.g., snapshot/partition list) without relying on an
engine like Trino. This opens up possibilities for lightweight UIs and
tools that can directly interact with the REST endpoints to retrieve and
display metadata.
- Enhanced Performance: With server-side caching, metadata tables served
by the catalog can perform better. Caching reduces the need to repeatedly
compute or retrieve metadata, leading to faster response times and reduced
load on the underlying storage systems.
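
To make the idea concrete, here is a minimal sketch of the kind of work a
server could do on the client's behalf for the snapshots table. It is not
taken from the proposal doc: the function name snapshots_rows is
hypothetical, the input is assumed to be a table metadata JSON object using
the field names from the Iceberg table spec, and the output columns are
assumed to mirror the existing snapshots metadata table.

```python
from datetime import datetime, timezone


def snapshots_rows(table_metadata: dict) -> list[dict]:
    """Flatten the 'snapshots' list of a table metadata document into rows.

    Hypothetical helper: a real server would also need paging, auth, and
    support for the other metadata tables (partitions, files, and so on).
    """
    # Oldest snapshot first, ordered by commit timestamp.
    snapshots = sorted(
        table_metadata.get("snapshots", []),
        key=lambda snap: snap["timestamp-ms"],
    )
    rows = []
    for snap in snapshots:
        # Copy the summary so the caller's metadata is not mutated.
        summary = dict(snap.get("summary", {}))
        operation = summary.pop("operation", None)
        committed_at = datetime.fromtimestamp(
            snap["timestamp-ms"] / 1000, tz=timezone.utc
        )
        rows.append(
            {
                "committed_at": committed_at.isoformat(),
                "snapshot_id": snap["snapshot-id"],
                "parent_id": snap.get("parent-snapshot-id"),
                "operation": operation,
                "manifest_list": snap.get("manifest-list"),
                "summary": summary,
            }
        )
    return rows
```

A catalog UI or a thin client could then render these rows directly, without
reading the metadata file or running an engine itself.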

Here is the proposal as a Google Doc:
https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit?usp=sharing

Estimated read time: 5 mins

Would really appreciate any feedback on this topic and proposal!

Yufei
