Thanks Ryan! Your point about avoiding first-class metadata requirements is exactly the design principle here. Labels let each catalog surface what it knows without the spec dictating what catalogs must track.
To build on this, I put together a POC showing that the approach works across the ecosystem. Key design principles that held up in practice:
- No new requirements on catalogs. Labels are optional in the response. A catalog that doesn't serve labels returns the same response as today.
- Catalog-scoped, not table state. Every catalog we tried already has internal metadata separate from Iceberg properties: Polaris has internalProperties, UC has uc_properties, and Lakekeeper has namespace properties in PostgreSQL. Labels just give this existing metadata a standard path through the protocol.
- No property overriding. Labels are explicitly separate from table properties. Properties configure behavior; labels describe context. Engines know which is which.
What I built:
- Spec change: https://github.com/apache/iceberg/pull/15750
- PyIceberg client: https://github.com/apache/iceberg-python/pull/3191
Catalog implementations:
- Polaris: https://github.com/apache/polaris/pull/4048 (labels from internalProperties)
- Unity Catalog OSS: https://github.com/unitycatalog/unitycatalog/pull/1417 (labels from uc_properties)
- Lakekeeper: https://github.com/lakekeeper/lakekeeper/pull/1676 (labels from namespace properties)
Full demo: https://github.com/laskoviymishka/irc-labels
That's three catalogs in two languages (Java and Rust), at 40 to 95 lines each. The pattern is the same everywhere: each catalog already has internal metadata that doesn't belong in table properties, and labels give it a standard way out through the protocol.
The Polaris implementation also addresses https://github.com/apache/polaris/issues/3222, where the community has been asking for a way to surface business metadata alongside table loads. Labels solve this without adding any requirements beyond an optional field.
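For anyone who wants the client-side picture without reading the PRs, here is a minimal Python sketch of how a client could consume the optional field. It assumes `labels` is a top-level JSON object of string key-value pairs in the `LoadTableResponse` body (the spec PR is the authoritative shape), and `LoadedTable` and `parse_load_table_response` are hypothetical names, not the PyIceberg API. It shows the two properties above: a response without labels parses exactly as before, and labels live in their own map, so they can never override table properties.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LoadedTable:
    metadata_location: Optional[str]
    # Commit-versioned table state, read from metadata.properties.
    properties: Dict[str, str]
    # Catalog-managed context, generated per request (hypothetical placement).
    labels: Dict[str, str] = field(default_factory=dict)


def parse_load_table_response(body: str) -> LoadedTable:
    """Parse a LoadTableResponse body, treating `labels` as optional.

    Sketch only: assumes a top-level "labels" object of string pairs.
    """
    doc = json.loads(body)
    metadata = doc.get("metadata") or {}
    properties = dict(metadata.get("properties") or {})

    labels: Dict[str, str] = {}
    raw = doc.get("labels")
    if isinstance(raw, dict):
        for key, value in raw.items():
            # Skip malformed entries instead of failing the table load.
            if isinstance(value, str):
                labels[key] = value

    # Labels are kept in a separate map; properties are never touched.
    return LoadedTable(doc.get("metadata-location"), properties, labels)
```

Malformed label entries are dropped rather than failing the load, since engines are free to ignore labels entirely; a stricter client could raise instead. A hypothetical `owner` label and an `owner` table property can coexist, and each keeps its own value.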
Beyond ownership and classification, the demo also shows labels enabling AI agent table selection (agents reason about tables using semantic labels instead of guessing from column names) and governance via a trusted engine (ClickHouse reading sensitivity labels to auto-generate masking policies).
Happy to discuss the spec design or any of the implementation details.
Andrei
On Fri, Mar 6, 2026 at 11:25 PM Ryan Blue wrote:
> I think that this is a reasonable way to solve some persistent issues that we've seen.
>
> Many catalogs track additional metadata that is not part of the table spec (or others) like "owner", and right now there is no way to exchange or share that information. I'm also hesitant to start including it as first-class metadata because that puts additional requirements on catalogs that may not align. For instance, Tabular had no concept of a table "owner" and instead used default grants at the schema level. I like that this solution allows catalogs to provide information in a generic way that doesn't add requirements in the REST spec. And it is an alternative to overriding table properties with catalog-managed information, which I think is an anti-pattern.
>
> Thanks, Andrei! I think this is a good idea.
>
> On Thu, Mar 5, 2026 at 2:04 PM Andrei Tserakhau via dev wrote:
>
>> Hi all,
>>
>> `LoadTableResponse` returns table metadata (schema, snapshots, file locations), but catalogs have operational context about tables that has no standard place to go: cost attribution, ownership, governance hints, semantic metadata. Right now catalogs have two options:
>>
>> 1. Properties: durable, commit-versioned table state. Good for persistent metadata; wrong for ephemeral catalog context.
>> 2. Custom fields: catalog-specific extensions with no interoperability. Each catalog invents its own structure; engines have no basis to read them.
>>
>> The community has already identified this gap. Polaris opened an issue [1] requesting a standard extension point in the IRC protocol for catalog-managed metadata. Two earlier threads [2][3] explored column-level metadata, though in the context of table format changes.
>>
>> We propose adding an optional `labels` field to `LoadTableResponse` for catalog-managed metadata. Labels are string key-value pairs generated per-request from the catalog's internal systems; nothing is written to table files. Engines may use or ignore them entirely. Labels give catalog providers a standard channel to surface context to any client without bilateral custom integrations for every catalog-engine pair.
>>
>> Details:
>> - GitHub Issue: apache/iceberg#15521
>> - Design Document: [4]
>>
>> Please review the proposal and share your feedback.
>>
>> Thanks,
>> Andrei
>>
>> [1]: https://github.com/apache/polaris/issues/3222
>> [2]: https://lists.apache.org/thread/vwrc3m534gfyfjnsfflwtgkg158yzrb4
>> [3]: https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0
>> [4]: https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?usp=sharing
