Howdy Adnan,

1. Re: Semantic Drift: I agree with your concept of the Dataset model being modeled as nested objects underneath a Table/View. To me, that makes a lot more sense than just having it in the OSI spec outside of the table or view. That being said, I think we are aligned to move forward with this proposal and adjust as necessary.

2. Re: Descriptions & Iceberg Properties: I am unsure whether the purposes are different for a Semantic Model Dataset Description and the Iceberg Table Property comment. Firstly, mapping the description onto the Iceberg comment is the approach the OSI community has taken with their converters. [1] Secondly, the Iceberg Table Property comment is defined as "a table-level description that documents the business meaning and usage context" [2], and the Semantic Model Dataset Description is defined as a "Human-readable description" [3]. These two seem to serve the same purpose.

Now, you are right that Generic Tables do not support a comment property; however, I wonder if that is more about a missing component in Generic Tables than an issue with using comments as already defined. Table comments are pretty standard across the database world: "COMMENT ON TABLE employees IS 'Stores corporate employee profiles';" is something you can do in Snowflake, PostgreSQL, Oracle, Databricks, etc.

That being said, I don't want to impede this proposal, as we can always adjust when we get user feedback.
[1] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris#export-osi--polaris-1
[2] - https://iceberg.apache.org/docs/latest/configuration/#informational-properties
[3] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md#schema-1

Go community,

Adam

On Fri, Jun 12, 2026 at 8:29 PM Adnan Hemani via dev <[email protected]> wrote:

> Hi Adam,
>
> I am definitely not completely up-to-date on this proposal so excuse me if I'm missing something here. A few points I'd like to double click on:
>
> * I agree with your point about Semantic Drift - and we should work towards allowing the reuse of dataset information across semantic models. I'd prefer we try Option 2 to build this directly into OSI first and if that does not make sense, we can then consider dynamically generating Semantic Models from within Polaris. Alternatively, if it's possible to build the Dataset model into nested objects underneath a Table/View in Polaris, that might also make sense to me.
> * I'm not sure we should rely on Iceberg Properties to model the dataset. Although re-using it is surely tempting, I don't think we should take a dependency on this approach which was not built for this purpose. Additionally, this may cause issues for our Generic Table support for OSI model, which don't have those table properties. Conceptually, keeping the Semantic information within Polaris rather than the data plane still seems right to me.
>
> Happy to see this proposal moving forward!
>
> Best,
> Adnan Hemani
>
> On Fri, Jun 12, 2026 at 8:03 AM Adam Christian <[email protected]> wrote:
>
> > Hi community,
> >
> > I wanted to update you on the offline conversations between Yufei, JB, Dennis, and me.
> >
> > Overall, I am good to move forward with this proposal although I have some concerns. My specific concerns are:
> >
> > 1. Semantic Drift
> >
> > 2. Lack of Reusability
> >
> > #1 - Semantic Drift: This proposal adds a catalog entity that houses an OSI Semantic Model. The OSI Semantic Model contains Datasets which represent a table or a view with additional attributes [1]. In this proposal, there is currently no way to centralize a dataset’s semantic attributes. If a user wants to have two semantic models refer to a single dataset, they must duplicate the semantic attributes. In my opinion, this goes against the “inconsistent definitions, duplicated effort” that Yufei mentioned above.
> >
> > There are two alternatives that could handle this:
> >
> > 1. Store semantic attributes on the table or view, then dynamically generate the OSI Semantic Model from the referenced datasets
> >
> > 2. Work with the OSI Team to propose hierarchical Semantic Models
> >
> > The second alternative is backwards compatible with this proposal, but requires a change to the OSI Specification. The first can be done today but would be more costly to implement. The first option aligns better with the current converters in the OSI repository [2]. However, it could be made backward-compatible with the current proposal by adding an additional parameter to the GET for Semantic Models.
> >
> > #2 - Lack of Reusability: There are several attributes stored on Datasets which would be helpful for other consumers. For example, in OSI, Datasets and Fields have descriptions. These seem equivalent to a comment in an Iceberg Table Property or a doc field on the Schema’s NestedField. These comments are already widely supported by current Iceberg consumers and the current Polaris OSI Converter actually leverages this already [2]. Rather than reinventing a new attribute, we could use the ones there.
> >
> > Now, this is opinionated and a user might want an Iceberg Table Property to be different from their Semantic Model. The current proposal moves forward with storing a different attribute.
> >
> > Given that the concerns above can be handled in a backwards-compatible manner, I believe the value of this work is better than waiting for a perfect solution. The perfect is the enemy of the good in this case.
> >
> > Go community,
> >
> > Adam
> >
> > [1] - https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> > [2] - https://github.com/open-semantic-interchange/OSI/tree/main/converters/polaris
> >
> > On Fri, May 29, 2026 at 6:35 PM Yufei Gu <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > As AI agents, BI tools, notebooks, and query engines increasingly consume the same data, semantic definitions such as metrics and dimensions are often duplicated across multiple systems. This leads to inconsistent definitions, duplicated effort, and governance challenges. The rise of AI agents further amplifies this problem, as agents rely on semantic context to understand data and reason about business concepts. Without a shared semantic layer, organizations often end up maintaining multiple versions of the same business definitions across tools and applications.
> > >
> > > JB and I would like to start a discussion on adding semantic layer support to Apache Polaris so semantic models can be defined once, governed centrally, and consumed consistently across tools. The proposal[1] introduces semantic models as a first class Polaris entity using the Open Semantic Interchange (OSI)[2] specification[3]. At a high level, the proposal adds:
> > >
> > > - A new SEMANTIC_MODEL entity type
> > > - CRUD APIs for semantic models
> > > - Schema validation and authorization
> > >
> > > Polaris remains a metadata service and does not execute metrics or semantic queries.
> > >
> > > Feedback on the overall direction, design, and OSI adoption would be greatly appreciated.
> > >
> > > 1. https://docs.google.com/document/d/1ZdI-1w_5LbyCMhvUhLCtOt-N1Z89L2P-oiGLaYayCZg/edit?usp=sharing
> > > 2. https://open-semantic-interchange.org
> > > 3. https://github.com/open-semantic-interchange/OSI/blob/main/core-spec/spec.md
> > >
> > > Yufei
