Guy's PR documenting the table properties was merged last week. I just found out that the `comment` key is not actually defined in `TableProperties`, so I created a PR to fix it. I also updated the Spark test code to use the new property key. https://github.com/apache/iceberg/pull/15531
On Fri, Feb 20, 2026 at 10:58 AM huaxin gao <[email protected]> wrote:

> Agree that table comment/description belongs in table metadata. +1 to documenting comment as the standard convention (PR #15367 <https://github.com/apache/iceberg/pull/15367>)
>
> On Fri, Feb 20, 2026 at 9:27 AM Steven Wu <[email protected]> wrote:
>
>> I think the table description or comment belongs in the table metadata. It should be updated infrequently. I am not too worried about the table commit.
>>
>> On Fri, Feb 20, 2026 at 8:13 AM Ryan Blue <[email protected]> wrote:
>>
>>> You're right that this would require a table commit, but that's the case for almost all other parts of table metadata, including if we were to add a doc field to schemas. We could handle this entirely at the catalog level, but then it would be difficult to pass the data to engines to display.
>>>
>>> That said, there is other catalog metadata, like `owner`, that we don't track in the table and don't recommend using a table property for, so there's room to have additional catalog-tracked metadata fields passed to REST clients.
>>>
>>> On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:
>>>
>>>> I've been thinking about this particular use case lately. One drawback of using the doc or comment property in the Iceberg table metadata is that updates fall on the table commit path; meaning any update to a comment will trigger the creation of an additional table snapshot. I think this side effect is worth documenting.
>>>>
>>>> Another option for supporting this use case would be to leave it to the catalogs to co-locate "business metadata" with the table. I've raised a discussion with the Polaris community [1].
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>>
>>>> [1] https://github.com/apache/polaris/issues/3222
>>>>
>>>> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <[email protected]> wrote:
>>>>
>>>>> Sure - I opened a PR here:
>>>>> https://github.com/apache/iceberg/pull/15367
>>>>>
>>>>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> It seems that we have a consensus to standardize and document the "comment" table properties. It is useful to provide the semantic context that is super helpful to LLMs. This is also how popular engines like Spark and Trino store the `comment` string from "CREATE TABLE" DDL.
>>>>>>
>>>>>> Taeyu/Guy, let us know if you are interested in creating a PR for that.
>>>>>>
>>>>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> I think it's probably a good idea to add more implementation-specific details to the spec, like the use of "comment" for table documentation. We recently added a section for this that is clear that these are not required but are important conventions.
>>>>>>>
>>>>>>> I would not add "owner" to that section. Storing owner in table properties is not a good idea because it would either need to be controlled and overridden by catalogs or would be informational and untrustworthy. I think that owner is part of catalog metadata, not table metadata.
>>>>>>>
>>>>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]> wrote:
>>>>>>>
>>>>>>>> Got it - I now understand better the meaning of "reserved table properties", and I agree it shouldn't be touched or expanded.
>>>>>>>>
>>>>>>>> Going back to the original topic:
>>>>>>>> It appears that both `comment` and `owner` are important fields, which are populated by some engines, and can prove useful for others, but aren't standardized anywhere in the spec.
>>>>>>>> To improve engine alignment, I think they should be documented somewhere.
>>>>>>>> I'd suggest one of two approaches:
>>>>>>>>
>>>>>>>> 1. Either keeping them in the table properties map, and documenting it in the Table Properties documentation <https://iceberg.apache.org/docs/latest/configuration/#table-properties> (but not in the reserved section - perhaps it deserves its own section, "Table context properties"?)
>>>>>>>> 2. Or adding them as optional top-level fields in the metadata.json schema - this might be the "best practice" (especially if `owner` is supposed to be controlled by the catalog). However, it will require changing the current behavior of Spark, both for `owner` assignment, and for `comment` assignment in "CREATE TABLE ... COMMENT 'table documentation'".
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The `format-version` table property is different because it is mapped to the format version that is not stored in table properties. It is reserved because implementations will override it and so it isn't a real table property. This is not a pattern that we want to expand because of the strange behavior.
>>>>>>>>>
>>>>>>>>> For cases like `comment`, these other properties are normal table properties that can be used like any other.
>>>>>>>>> If the schema had a doc string and that was used in place of `comment`, then I think it would be a reserved property. But there's no need for that because setting the property or using `COMMENT ON` would have the same behavior -- changing the property value.
>>>>>>>>>
>>>>>>>>> The `owner` property is a different case. Owner is something that should be restricted. A user should not be able to change it with just access to modify table metadata. Tracking a table's owner is the responsibility of the catalog and its access control scheme. Because of this, I don't think that we should standardize or encourage setting an `owner` table property.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If using "comment" is the best practice, should we add this to the "reserved table properties" docs <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>, to make sure it's aligned between different engines and implementations?
>>>>>>>>>> In the same opportunity, I would suggest adding "owner" as well, which is automatically added by Spark.
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I see, thank you for your response.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If there isn't a significant difference between table-level description and schema-level description, then I think you should consider it standardized. You can store the table description in the "comment" table property.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <[email protected]> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I’ve already explained my reasoning in earlier messages, including the example about making table and column descriptions more accessible for LLM‑generated SQL.
>>>>>>>>>>> From my perspective, table‑level comments, like column‑level comments, should also be standardized.
>>>>>>>>>>> If standardized, it seems natural for them to be part of the schema definition, just like column‑level comments.
>>>>>>>>>>> This way, they stay consistent with the schema version and avoid drifting out of sync when the schema changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Why would you need to version table descriptions? Are there cases where they are changing rapidly and inaccurate due to schema changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your reply.
>>>>>>>>>>>
>>>>>>>>>>> Column-level comments are already part of the schema definition. Would adding just one table-level comment really cause noticeable bloat? For example, if a table has 20 columns, adding one more comment would only increase the metadata size by about 1/20th.
>>>>>>>>>>>
>>>>>>>>>>> Also, using schema-id as part of the property key feels like a workaround rather than a proper solution. It is not part of the specification, so any tool or integration (including LLM-based ones) would need extra logic to interpret it. A standardized, schema-level field would avoid that complexity and make the metadata easier to consume consistently.
>>>>>>>>>>>
>>>>>>>>>>> If bloat is a real concern, perhaps column-level comments should also be moved out of the schema, with a proper mechanism to version and manage them separately.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Taeyun.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'd rather not complicate the schema definitions in the table metadata. You may append `schema-id` to the key of table property to manage different schema versions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Storing verbose text to each field may bloat the metadata storage, especially when there are a lot of duplicate `doc`s if schema evolution happens a lot.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Gang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your response.
>>>>>>>>>>> As I understand it, the table description is currently stored as a table property within the table metadata’s `properties` map.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>>>>
>>>>>>>>>>> - Table metadata `properties` are not versioned. As a result, when querying an older snapshot, the description may be inaccurate because the value reflects only the current state.
>>>>>>>>>>> - According to the specification, the purpose of table metadata properties is: “A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata.” Based on this, a comment seems to fall under “arbitrary metadata,” and therefore may not be an appropriate use of properties.
>>>>>>>>>>> - Table comments seem to have become significant enough that relying on a convention alone may no longer be sufficient. It might be worth considering a standardized, schema-level field for them.
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Iceberg does allow you to store table descriptions. The convention is to use a table property, "comment". While this isn't a schema-level doc/comment, I don't know of anything that makes a distinction between schema description and table description, so I think it should work for your use.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> With the growing trend of using LLMs to automatically generate SQL, it feels increasingly important to manage descriptions of database tables and columns in a way that these tools can easily access.
>>>>>>>>>>>
>>>>>>>>>>> In the Iceberg specification, comments for schema fields (i.e., columns) can be specified using the `doc` property within the `fields` array of a `struct` type. However, there doesn’t seem to be a way to specify a comment for the root struct type itself - that is, for the table as a whole.
>>>>>>>>>>>
>>>>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level comments by storing them in the `properties` map within the table metadata under various non-standard keys. But since a table comment conceptually belongs to the schema, and can vary by schema, it feels like the `properties` map within the table metadata might not be the best place for it.
>>>>>>>>>>>
>>>>>>>>>>> Would it make sense to allow a `doc` property on the `schema` object (the root struct type), alongside `schema-id` and `identifier-field-ids`, so that a description for the schema itself can be included?
>>>>>>>>>>> It seems like it would be helpful, especially for tooling and LLM-related use cases.
>>>>>>>>>>>
>>>>>>>>>>> Curious to hear your thoughts.
>>>>>>>>>>> Apologies if I’m overlooking something or if this has already been discussed.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Taeyun
