Hi community,

I wanted to update you on the offline conversations between Yufei, JB,
Dennis, and me.

Overall, I am good to move forward with this proposal, although I have some
concerns. My specific concerns are:

1. Semantic Drift

2. Lack of Reusability

#1 - Semantic Drift: This proposal adds a catalog entity that houses an OSI
Semantic Model. The OSI Semantic Model contains Datasets, which represent a
table or a view with additional attributes [1]. In this proposal, there is
currently no way to centralize a dataset’s semantic attributes. If a user
wants two semantic models to refer to the same dataset, they must duplicate
its semantic attributes in each model. In my opinion, this reintroduces the
“inconsistent definitions, duplicated effort” problem that Yufei’s proposal
sets out to solve.
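To make the drift concrete, here is a minimal sketch that assumes a heavily simplified, hypothetical shape for an OSI semantic model (each dataset has a "source" table plus its own copy of the descriptions). The real OSI schema has more fields. The helper name find_drift is made up for illustration. It flags fields whose descriptions disagree between models that reference the same table:

```python
from collections import defaultdict

# Hypothetical, simplified model shape: each dataset names its source table
# and carries its OWN copy of the semantic attributes (descriptions here).
sales_model = {
    "name": "sales",
    "datasets": [
        {
            "source": "prod.orders",
            "description": "One row per customer order",
            "fields": [{"name": "amount", "description": "Order total in USD"}],
        }
    ],
}
finance_model = {
    "name": "finance",
    "datasets": [
        {
            "source": "prod.orders",
            "description": "One row per customer order",
            # Edited in this model only; the copy in "sales" is now stale.
            "fields": [{"name": "amount", "description": "Order total in EUR"}],
        }
    ],
}


def find_drift(models):
    """Return {(source, field): {description: [model names]}} for every field
    whose description differs between models referencing the same table."""
    seen = defaultdict(lambda: defaultdict(list))
    for model in models:
        for dataset in model["datasets"]:
            for field in dataset["fields"]:
                key = (dataset["source"], field["name"])
                seen[key][field.get("description")].append(model["name"])
    # Keep only the (table, field) pairs that ended up with 2+ descriptions.
    return {key: dict(descs) for key, descs in seen.items() if len(descs) > 1}


print(find_drift([sales_model, finance_model]))
# {('prod.orders', 'amount'): {'Order total in USD': ['sales'], 'Order total in EUR': ['finance']}}
```

Nothing in the data ties the two copies together, so an edit to one model silently diverges from the other.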

There are two alternatives that could handle this:

1. Store semantic attributes on the table or view, then dynamically
generate the OSI Semantic Model from the referenced datasets

2. Work with the OSI Team to propose hierarchical Semantic Models

The second alternative is backward compatible with this proposal, but
requires a change to the OSI specification. The first can be done today but
would be more costly to implement, and it aligns better with the current
converters in the OSI repository [2]. It is not backward compatible as-is,
but it could be made so by adding an additional parameter to the GET for
Semantic Models.
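A rough sketch of how the first alternative and the extra GET parameter might fit together. It assumes the same simplified model shape and an in-memory store. TABLE_ATTRIBUTES, generate_model, get_semantic_model and the generate flag are all hypothetical names, not Polaris APIs. In practice the central attributes would live on the table or view itself. The default path returns the stored document unchanged, which is what keeps existing clients working:

```python
import copy

# Hypothetical central store: semantic attributes are kept ONCE per table or
# view identifier instead of being copied into every semantic model.
TABLE_ATTRIBUTES = {
    "prod.orders": {
        "description": "One row per customer order",
        "fields": {"amount": "Order total in USD"},
    },
}

# What the current proposal persists: the model exactly as submitted,
# including its own copies of dataset attributes.
STORED_MODELS = {
    "sales": {
        "name": "sales",
        "datasets": [
            {
                "source": "prod.orders",
                "description": "Orders (copied)",
                "fields": [{"name": "amount", "description": "Order total"}],
            }
        ],
    },
}


def generate_model(stored):
    """Alternative 1: fill dataset attributes from the central store, so every
    model that references a table sees the same definitions."""
    model = copy.deepcopy(stored)  # never mutate the persisted document
    for dataset in model["datasets"]:
        attrs = TABLE_ATTRIBUTES.get(dataset["source"])
        if attrs is None:
            continue  # no central attributes; keep what the model stored
        dataset["description"] = attrs["description"]
        for field in dataset["fields"]:
            if field["name"] in attrs["fields"]:
                field["description"] = attrs["fields"][field["name"]]
    return model


def get_semantic_model(name, generate=False):
    """Hypothetical GET handler. The default (generate=False) returns the
    stored document as-is; generate=True opts in to the generated view."""
    stored = STORED_MODELS[name]
    return generate_model(stored) if generate else copy.deepcopy(stored)
```

Because the flag defaults to the current behavior, clients written against the proposal as it stands would see no change.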

#2 - Lack of Reusability: Several attributes stored on Datasets would be
helpful to other consumers. For example, in OSI, Datasets and Fields have
descriptions. These seem equivalent to the comment table property on an
Iceberg table and the doc field on a schema’s NestedField. These comments
are already widely supported by current Iceberg consumers, and the current
Polaris OSI Converter already uses them [2]. Rather than inventing a new
attribute, we could reuse the existing ones.

That said, this is an opinionated choice: a user might want an Iceberg
table’s comment to differ from the description in their Semantic Model. The
current proposal instead stores a separate attribute.
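One way both views could coexist is to read descriptions from Iceberg by default and let a model supply its own when it needs to differ. The sketch below assumes a plain dict holding an excerpt of Iceberg table metadata JSON (the table comment in the "comment" property, column docs in each schema field's optional "doc" key) and the same simplified dataset shape as before. The function dataset_from_iceberg and its override parameters are hypothetical. This is not necessarily how the Polaris OSI Converter in [2] works:

```python
# Excerpt of Iceberg table metadata: table comment in the "comment" property,
# column documentation in each schema field's optional "doc" key.
ORDERS_METADATA = {
    "current-schema-id": 0,
    "schemas": [
        {
            "type": "struct",
            "schema-id": 0,
            "fields": [
                {"id": 1, "name": "order_id", "required": True, "type": "long"},
                {"id": 2, "name": "amount", "required": False,
                 "type": "decimal(10, 2)", "doc": "Order total in USD"},
            ],
        }
    ],
    "properties": {"comment": "One row per customer order"},
}


def dataset_from_iceberg(source, metadata, dataset_description=None,
                         field_descriptions=None):
    """Build a simplified, OSI-style dataset entry from Iceberg metadata.

    Model-specific descriptions (hypothetical parameters) take precedence;
    otherwise the existing Iceberg comment/doc is reused."""
    field_descriptions = field_descriptions or {}
    schema = next(s for s in metadata["schemas"]
                  if s["schema-id"] == metadata["current-schema-id"])
    fields = [
        {"name": f["name"],
         "description": field_descriptions.get(f["name"], f.get("doc"))}
        for f in schema["fields"]
    ]
    if dataset_description is None:
        dataset_description = metadata.get("properties", {}).get("comment")
    return {"source": source, "description": dataset_description,
            "fields": fields}
```

With no overrides, the dataset simply mirrors what Iceberg consumers already see. Passing field_descriptions={"amount": "Revenue per order"} changes only that field in the semantic model and leaves the table untouched.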

Given that both concerns can be addressed in a backward-compatible manner,
I believe the value of delivering this work now outweighs waiting for a
perfect solution. The perfect is the enemy of the good in this case.

Go community,


Adam

[1] https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md

[2] https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris


On Fri, May 29, 2026 at 6:35 PM Yufei Gu wrote:

> Hi folks,
>
> As AI agents, BI tools, notebooks, and query engines increasingly consume
> the same data, semantic definitions such as metrics and dimensions are
> often duplicated across multiple systems. This leads to inconsistent
> definitions, duplicated effort, and governance challenges. The rise of AI
> agents further amplifies this problem, as agents rely on semantic context
> to understand data and reason about business concepts. Without a shared
> semantic layer, organizations often end up maintaining multiple versions of
> the same business definitions across tools and applications.
>
> JB and I would like to start a discussion on adding semantic layer support
> to Apache Polaris so semantic models can be defined once, governed
> centrally, and consumed consistently across tools. The proposal [1]
> introduces semantic models as a first-class Polaris entity using the Open
> Semantic Interchange (OSI) [2] specification [3]. At a high level, the
> proposal adds:
>
>    - A new SEMANTIC_MODEL entity type
>    - CRUD APIs for semantic models
>    - Schema validation and authorization
>
> Polaris remains a metadata service and does not execute metrics or semantic
> queries.
>
> Feedback on the overall direction, design, and OSI adoption would be
> greatly appreciated.
>
> 1. https://docs.google.com/document/d/1ZdI-1w_5LbyCMhvUhLCtOt-N1Z89L2P-oiGLaYayCZg/edit?usp=sharing
> 2. https://open-semantic-interchange.org
> 3. https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
>
>
> Yufei
>
