Hi folks,

I'm new to the Iceberg community and currently contribute to the tagging
design in Polaris OSS. Before going deeper into a design doc, I want to
surface the direction on this list and invite early input from people with
more context on how Iceberg REST Catalog (IRC) concepts get shaped here.

Polaris users are asking for a classification primitive that covers
compliance (PII, sensitivity, data domain), ownership and cost attribution,
and AI or semantic hints on columns. We will likely build this regardless,
but designing it inside Polaris alone reduces its value, because governance
tools would then need a separate adapter per catalog. If the shape is
standardized at the IRC level, the whole ecosystem benefits.

Across catalogs and governance platforms, the tag concept has independently
converged on a similar shape: a first-class Tag entity with identity (name
+ namespace), optional schema (allowed values, inheritability), and
attachments to objects carrying a value. Snowflake tags, Unity Catalog
governed tags, Google Cloud Dataplex tag templates, Apache Atlas
classifications, Apache Gravitino tags, and DataHub tags all expose this
pattern across ownership, FinOps, AI reasoning, and governance use cases.
When independent products converge like this, it suggests the shape is a
natural decomposition rather than a vendor-specific artifact.
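
To make that shape concrete, here is a rough Python sketch of the three
parts: identity, optional schema, and an attachment carrying a value. Every
name in it (Tag, TagAttachment, allowed_values, inheritable, the target
string format) is hypothetical and chosen for illustration only; none of it
comes from an existing catalog API or from the IRC spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tag:
    """First-class tag entity: identity plus optional schema (hypothetical)."""
    namespace: str
    name: str
    allowed_values: Optional[Tuple[str, ...]] = None  # None: any value accepted
    inheritable: bool = False


@dataclass(frozen=True)
class TagAttachment:
    """Binds a tag to one object and carries a value."""
    tag: Tag
    target: str  # illustrative object identifier, e.g. "table:sales.orders"
    value: str

    def __post_init__(self) -> None:
        # Enforce the tag's optional schema at attachment time.
        allowed = self.tag.allowed_values
        if allowed is not None and self.value not in allowed:
            raise ValueError(
                f"{self.value!r} is not allowed for tag {self.tag.name!r}"
            )


sensitivity = Tag("governance", "sensitivity", allowed_values=("public", "pii"))
attachment = TagAttachment(sensitivity, "table:sales.orders", "pii")
```

In this sketch, attaching a value outside allowed_values raises a
ValueError. A real spec could just as well validate on the server side.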

Two adjacent efforts are already in flight. The read-restrictions proposal
(apache/iceberg#13879 <https://github.com/apache/iceberg/issues/13879>)
delivers enforcement to engines. A Tag proposal would complement it as the
classification input side, so catalogs can resolve tag-driven enforcement
internally and deliver the outcome via read-restrictions. The labels
proposal (apache/iceberg#15521
<https://github.com/apache/iceberg/issues/15521>) serves
generic catalog-managed metadata. My read is that a first-class Tag with
identity and lifecycle is distinct from labels; they solve different
problems and can coexist.

At a high level, I think the minimum valuable scope in the IRC spec is: a
Tag entity with CRUD at the namespace level; tag attachments, each with a
target and a value, applied to tables, columns (via field-id), views, and
namespaces; a reverse lookup endpoint for "find objects with tag X"; tag
attachment retrieval via a dedicated endpoint; and a small set of normative
clauses on privilege enforcement, visibility filtering, and rename
atomicity. Resolved tags do not need to live in LoadTableResult.
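
As a sketch of how the attachment, retrieval, and reverse-lookup pieces of
that scope fit together, here is a small in-memory model in Python. It is
not a proposed API: Target, TagStore, and the method names are all
hypothetical, tags are reduced to plain strings, and privilege enforcement,
visibility filtering, and rename atomicity are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Target:
    """An object a tag can attach to (hypothetical shape)."""
    kind: str                       # "table", "view", or "namespace"
    identifier: str                 # e.g. "sales.orders"
    field_id: Optional[int] = None  # set only when targeting a column


class TagStore:
    """In-memory stand-in for a catalog's tag attachment storage."""

    def __init__(self) -> None:
        self._by_target: Dict[Target, Dict[str, str]] = defaultdict(dict)
        self._by_tag: Dict[str, Set[Target]] = defaultdict(set)

    def attach(self, tag: str, target: Target, value: str) -> None:
        self._by_target[target][tag] = value
        self._by_tag[tag].add(target)

    def attachments(self, target: Target) -> Dict[str, str]:
        """Dedicated retrieval: tags on one object, separate from table load."""
        return dict(self._by_target.get(target, {}))

    def find_objects(self, tag: str) -> List[Target]:
        """Reverse lookup: every object carrying the given tag."""
        return sorted(
            self._by_tag.get(tag, set()),
            key=lambda t: (t.kind, t.identifier, t.field_id or 0),
        )


store = TagStore()
orders = Target("table", "sales.orders")
email_col = Target("table", "sales.orders", field_id=7)  # column via field-id
store.attach("governance.sensitivity", email_col, "pii")
store.attach("governance.sensitivity", Target("namespace", "sales"), "internal")

for obj in store.find_objects("governance.sensitivity"):
    print(obj, store.attachments(obj))
```

Keying column attachments by field-id rather than by column name means a
column rename would not orphan the attachment in this model.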

Things I'd like to keep out of the first pass and treat as layered
extensions rather than core spec: typed multi-field per-attachment values
(as in Atlas and Dataplex; these can be added later without breaking
changes), a Governed-vs-Standard type distinction (Unity Catalog's pattern
can be expressed through configuration), and tag-to-policy binding (which
belongs in a separate Policy authoring phase).

What I'm asking for: early feedback on whether this direction fits the IRC
roadmap, pointers to prior discussions I may have missed, and interest in
co-championing from contributors outside Polaris. I'll follow up with a
full design doc in the coming week. An issue placeholder is at
apache/iceberg#16165 <https://github.com/apache/iceberg/issues/16165> for
tracking.

-ej
