Howdy Adnan,

1. Re: Semantic Drift: I agree with your concept of modeling the Dataset
as nested objects underneath a Table/View. To me, that makes a lot more
sense than having it live in the OSI spec outside of the table or view.
That being said, I think we are aligned to move forward with this
proposal and adjust as necessary.
2. Re: Descriptions & Iceberg Properties: I am not convinced that the
purposes differ between a Semantic Model Dataset Description and the
Iceberg Table Property comment. First, this is the approach the OSI
community has taken with their converters [1]. Second, the Iceberg Table
Property comment is defined as "a table-level description that documents
the business meaning and usage context." [2] and the Semantic Model
Dataset Description is defined as a "Human-readable description" [3].
These two seem to serve the same purpose. Now, you are right that Generic
Tables do not support a comment property. However, I wonder whether that
is a gap in Generic Tables rather than a problem with using comments as
already defined. Table comments are pretty standard across the database
world: "COMMENT ON TABLE employees IS 'Stores corporate employee
profiles';" is something you can do in Snowflake, PostgreSQL, Oracle,
Databricks, etc. That being said, I don't want to impede this proposal,
as we can always adjust when we get user feedback.
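
To make point 2 concrete, here is a minimal sketch of one way the two
descriptions could coexist. It is not part of the proposal or of any
existing Polaris or OSI API. The helper name, its parameters, and the
precedence rule (explicit semantic description first, then the table
comment) are all assumptions for illustration.

```python
from typing import Mapping, Optional

# Table property key discussed above; everything else here is hypothetical.
COMMENT_KEY = "comment"


def resolve_dataset_description(
    table_properties: Mapping[str, str],
    semantic_description: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: choose the description for an OSI Dataset.

    Assumed rule: an explicit semantic description wins; otherwise fall
    back to the table's comment property; otherwise there is none.
    """
    if semantic_description and semantic_description.strip():
        return semantic_description.strip()
    comment = table_properties.get(COMMENT_KEY, "")
    return comment.strip() or None
```

Under this assumed rule, a table without a comment property (for example
a Generic Table, passed here as an empty mapping) simply yields no
description unless the semantic model supplies one.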

[1] -
https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris#export-osi--polaris-1
[2] -
https://iceberg.apache.org/docs/latest/configuration/#informational-properties
[3] -
https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md#schema-1

Go community,

Adam

On Fri, Jun 12, 2026 at 8:29 PM Adnan Hemani via dev <[email protected]>
wrote:

> Hi Adam,
>
> I am definitely not completely up-to-date on this proposal so excuse me if
> I'm missing something here. A few points I'd like to double click on:
>
> * I agree with your point about Semantic Drift - and we should work
> towards allowing the reuse of dataset information across semantic models.
> I'd prefer we try Option 2 to build this directly into OSI first and if
> that does not make sense, we can then consider dynamically generating
> Semantic Models from within Polaris. Alternatively, if it's possible to
> build the Dataset model into nested objects underneath a Table/View in
> Polaris, that might also make sense to me.
> * I'm not sure we should rely on Iceberg Properties to model the dataset.
> Although reusing them is surely tempting, I don't think we should take a
> dependency on a mechanism that was not built for this purpose.
> Additionally, this may cause issues for our Generic Table support in the
> OSI model, since Generic Tables don't have those table properties.
> Conceptually, keeping the Semantic information within Polaris rather than
> the data plane still seems right to me.
>
> Happy to see this proposal moving forward!
>
> Best,
> Adnan Hemani
>
> On Fri, Jun 12, 2026 at 8:03 AM Adam Christian <
> [email protected]> wrote:
>
> > Hi community,
> >
> > I wanted to update you on the offline conversations between Yufei, JB,
> > Dennis, and me.
> >
> > Overall, I am good to move forward with this proposal although I have
> > some concerns. My specific concerns are:
> >
> > 1. Semantic Drift
> >
> > 2. Lack of Reusability
> >
> > #1 - Semantic Drift: This proposal adds a catalog entity that houses an
> > OSI Semantic Model. The OSI Semantic Model contains Datasets which
> > represent a table or a view with additional attributes [1]. In this
> > proposal, there is currently no way to centralize a dataset’s semantic
> > attributes. If a user wants to have two semantic models refer to a single
> > dataset, they must duplicate the semantic attributes. In my opinion, this
> > reintroduces the “inconsistent definitions, duplicated effort” problem
> > that Yufei mentioned.
> >
> > There are two alternatives that could handle this:
> >
> > 1. Store semantic attributes on the table or view, then dynamically
> > generate the OSI Semantic Model from the referenced datasets
> >
> > 2. Work with the OSI Team to propose hierarchical Semantic Models
> >
> > The second alternative is backwards compatible with this proposal, but
> > requires a change to the OSI Specification. The first can be done today
> > but would be more costly to implement. The first option aligns better
> > with the current converters in the OSI repository [2]. However, it could
> > be made backward-compatible with the current proposal by adding an
> > additional parameter to the GET for Semantic Models.
> >
> > #2 - Lack of Reusability: There are several attributes stored on Datasets
> > which would be helpful for other consumers. For example, in OSI, Datasets
> > and Fields have descriptions. These seem equivalent to a comment in an
> > Iceberg Table Property or a doc field on the Schema’s NestedField. These
> > comments are already widely supported by current Iceberg consumers and
> > the current Polaris OSI Converter actually leverages this already [2].
> > Rather than reinventing a new attribute, we could use the ones there.
> >
> > Now, this is opinionated and a user might want an Iceberg Table Property
> > to be different from their Semantic Model. The current proposal moves
> > forward with storing a different attribute.
> >
> > Given that the concerns above can be handled in a backwards-compatible
> > manner, I believe the value of shipping this work outweighs waiting for
> > a perfect solution. The perfect is the enemy of the good in this case.
> >
> > Go community,
> >
> >
> > Adam
> >
> > [1] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> >
> > [2] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris
> >
> >
> > On Fri, May 29, 2026 at 6:35 PM Yufei Gu <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > As AI agents, BI tools, notebooks, and query engines increasingly
> > > consume the same data, semantic definitions such as metrics and
> > > dimensions are often duplicated across multiple systems. This leads to
> > > inconsistent definitions, duplicated effort, and governance challenges.
> > > The rise of AI agents further amplifies this problem, as agents rely on
> > > semantic context to understand data and reason about business concepts.
> > > Without a shared semantic layer, organizations often end up maintaining
> > > multiple versions of the same business definitions across tools and
> > > applications.
> > >
> > > JB and I would like to start a discussion on adding semantic layer
> > > support to Apache Polaris so semantic models can be defined once,
> > > governed centrally, and consumed consistently across tools. The
> > > proposal[1] introduces semantic models as a first class Polaris entity
> > > using the Open Semantic Interchange (OSI)[2] specification[3]. At a high
> > > level, the proposal adds:
> > >
> > >    - A new SEMANTIC_MODEL entity type
> > >    - CRUD APIs for semantic models
> > >    - Schema validation and authorization
> > >
> > > Polaris remains a metadata service and does not execute metrics or
> > > semantic queries.
> > > Feedback on the overall direction, design, and OSI adoption would be
> > > greatly appreciated.
> > >
> > > 1. https://docs.google.com/document/d/1ZdI-1w_5LbyCMhvUhLCtOt-N1Z89L2P-oiGLaYayCZg/edit?usp=sharing
> > > 2. https://open-semantic-interchange.org
> > > 3. https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> > >
> > >
> > > Yufei
> > >
> >
>
