Guy's PR on the documentation of table properties was merged last week.

I just found out that the key is not actually defined in
`TableProperties`, so I created a PR to fix it. I also updated the Spark
test code to use the new property key.
https://github.com/apache/iceberg/pull/15531
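
For anyone skimming the thread, here is a minimal sketch of the convention
being discussed: the table description lives under the "comment" key of the
`properties` map in table metadata. It uses a plain Python dict standing in
for metadata.json rather than any Iceberg API, and the names `COMMENT_KEY`,
`get_table_comment`, and `set_table_comment` are made up for illustration;
they are not the constant added in the PR.

```python
from typing import Optional

# Key used by convention for the table description (per this thread).
COMMENT_KEY = "comment"


def get_table_comment(metadata: dict) -> Optional[str]:
    """Return the table description, or None if it is not set."""
    return metadata.get("properties", {}).get(COMMENT_KEY)


def set_table_comment(metadata: dict, text: str) -> dict:
    """Return a copy of the metadata with the description set.

    The input dict is left untouched; in a real table this change would
    go through a normal table commit.
    """
    properties = dict(metadata.get("properties", {}))
    properties[COMMENT_KEY] = text
    updated = dict(metadata)
    updated["properties"] = properties
    return updated


# Hypothetical, heavily trimmed stand-in for a table's metadata.json.
metadata = {"format-version": 2, "properties": {"write.format.default": "parquet"}}
updated = set_table_comment(metadata, "Daily order facts, one row per order")
print(get_table_comment(updated))
```

Since the description is an ordinary property, an engine that wants to
display it only needs to read one well-known key.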


On Fri, Feb 20, 2026 at 10:58 AM huaxin gao <[email protected]> wrote:

> Agree that table comment/description belongs in table metadata. +1 to
> documenting comment as the standard convention (PR #15367
> <https://github.com/apache/iceberg/pull/15367>)
>
> On Fri, Feb 20, 2026 at 9:27 AM Steven Wu <[email protected]> wrote:
>
>> I think the table description or comment belongs in the table metadata.
>> It should be updated infrequently. I am not too worried about the table
>> commit.
>>
>> On Fri, Feb 20, 2026 at 8:13 AM Ryan Blue <[email protected]> wrote:
>>
>>> You're right that this would require a table commit, but that's the case
>>> for almost all other parts of table metadata, including if we were to add a
>>> doc field to schemas. We could handle this entirely at the catalog level,
>>> but then it would be difficult to pass the data to engines to display.
>>>
>>> That said, there is other catalog metadata, like `owner`, that we don't
>>> track in the table and don't recommend using a table property for, so
>>> there's room to have additional catalog-tracked metadata fields passed to
>>> REST clients.
>>>
>>> On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:
>>>
>>>> I've been thinking about this particular use case lately. One drawback
>>>> of using the doc or comment property in the Iceberg table metadata is that
>>>> updates fall on the table commit path, meaning any update to a comment
>>>> will trigger the creation of an additional table snapshot. I think this
>>>> side effect is worth documenting.
>>>>
>>>> Another option for supporting this use case would be to leave it to the
>>>> catalogs to co-locate "business metadata" with the table. I've raised a
>>>> discussion with the Polaris community [1].
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>>
>>>> [1] https://github.com/apache/polaris/issues/3222
>>>>
>>>> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Sure - I opened a PR here:
>>>>> https://github.com/apache/iceberg/pull/15367
>>>>>
>>>>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> It seems that we have a consensus to standardize and document the
>>>>>> "comment" table property. It provides semantic context that is very
>>>>>> helpful to LLMs. This is also how popular engines like Spark and Trino
>>>>>> store the `comment` string from "CREATE TABLE" DDL.
>>>>>>
>>>>>> Taeyun/Guy, let us know if you are interested in creating a PR for
>>>>>> that.
>>>>>>
>>>>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> I think it's probably a good idea to add more
>>>>>>> implementation-specific details to the spec, like the use of "comment"
>>>>>>> for table documentation. We recently added a section for this that is
>>>>>>> clear that these are not required but are important conventions.
>>>>>>>
>>>>>>> I would not add "owner" to that section. Storing owner in table
>>>>>>> properties is not a good idea because it would either need to be
>>>>>>> controlled and overridden by catalogs or would be informational and
>>>>>>> untrustworthy. I think that owner is part of catalog metadata, not
>>>>>>> table metadata.
>>>>>>>
>>>>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Got it - I now understand better the meaning of "reserved table
>>>>>>>> properties", and I agree it shouldn't be touched or expanded.
>>>>>>>>
>>>>>>>> Going back to the original topic:
>>>>>>>> It appears that both `comment` and `owner` are important fields,
>>>>>>>> which are populated by some engines, and can prove useful for others,
>>>>>>>> but aren't standardized anywhere in the spec.
>>>>>>>> To improve engine alignment, I think they should be documented
>>>>>>>> somewhere.
>>>>>>>> I'd suggest one of two approaches:
>>>>>>>>
>>>>>>>>    1. Either keeping them in the table properties map, and
>>>>>>>>    documenting it in the Table Properties documentation
>>>>>>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>>>>>>    (but not in the reserved section - perhaps it deserves its own
>>>>>>>>    section, "Table context properties"?)
>>>>>>>>    2. Or adding them as optional top-level fields in the
>>>>>>>>    metadata.json schema - this might be the "best practice"
>>>>>>>>    (especially if `owner` is supposed to be controlled by the catalog).
>>>>>>>>    However, it will require changing the current behavior of Spark,
>>>>>>>>    both for `owner` assignment, and for `comment` assignment in
>>>>>>>>    "CREATE TABLE ... COMMENT 'table documentation'".
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The `format-version` table property is different because it is
>>>>>>>>> mapped to the format version that is not stored in table properties.
>>>>>>>>> It is reserved because implementations will override it and so it
>>>>>>>>> isn't a real table property. This is not a pattern that we want to
>>>>>>>>> expand because of the strange behavior.
>>>>>>>>>
>>>>>>>>> For cases like `comment`, these other properties are normal table
>>>>>>>>> properties that can be used like any other. If the schema had a doc
>>>>>>>>> string and that was used in place of `comment`, then I think it would
>>>>>>>>> be a reserved property. But there's no need for that because setting
>>>>>>>>> the property or using `COMMENT ON` would have the same behavior --
>>>>>>>>> changing the property value.
>>>>>>>>>
>>>>>>>>> The `owner` property is a different case. Owner is something that
>>>>>>>>> should be restricted. A user should not be able to change it with just
>>>>>>>>> access to modify table metadata. Tracking a table's owner is the
>>>>>>>>> responsibility of the catalog and its access control scheme. Because
>>>>>>>>> of this, I don't think that we should standardize or encourage setting
>>>>>>>>> an `owner` table property.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If using "comment" is the best practice, should we add this to
>>>>>>>>>> the "reserved table properties" docs
>>>>>>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>>>>>>> to make sure it's aligned between different engines and
>>>>>>>>>> implementations? While we're at it, I would suggest adding "owner"
>>>>>>>>>> as well, which is automatically added by Spark.
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I see, thank you for your response.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>>> Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If there isn't a significant difference between table-level
>>>>>>>>>>> description and schema-level description, then I think you should
>>>>>>>>>>> consider it standardized. You can store the table description in the
>>>>>>>>>>> "comment" table property.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I’ve already explained my reasoning in earlier messages,
>>>>>>>>>>> including the example about making table and column descriptions
>>>>>>>>>>> more accessible for LLM‑generated SQL.
>>>>>>>>>>> From my perspective, table‑level comments, like column‑level
>>>>>>>>>>> comments, should also be standardized.
>>>>>>>>>>> If standardized, it seems natural for them to be part of the
>>>>>>>>>>> schema definition, just like column‑level comments.
>>>>>>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>>>>>>> drifting out of sync when the schema changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>>> Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Why would you need to version table descriptions? Are
>>>>>>>>>>> there cases where they are changing rapidly and inaccurate due to
>>>>>>>>>>> schema changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your reply.
>>>>>>>>>>>
>>>>>>>>>>> Column-level comments are already part of the schema definition.
>>>>>>>>>>> Would adding just one table-level comment really cause noticeable
>>>>>>>>>>> bloat? For example, if a table has 20 columns, adding one more
>>>>>>>>>>> comment would only increase the metadata size by about 1/20th.
>>>>>>>>>>>
>>>>>>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>>>>>>> specification, so any tool or integration (including LLM-based
>>>>>>>>>>> ones) would need extra logic to interpret it. A standardized,
>>>>>>>>>>> schema-level field would avoid that complexity and make the metadata
>>>>>>>>>>> easier to consume consistently.
>>>>>>>>>>>
>>>>>>>>>>> If bloat is a real concern, perhaps column-level comments should
>>>>>>>>>>> also be moved out of the schema, with a proper mechanism to version
>>>>>>>>>>> and manage them separately.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Taeyun.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>>> Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>>>>>>> metadata. You may append `schema-id` to the key of the table
>>>>>>>>>>> property to manage different schema versions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Storing verbose text for each field may bloat the metadata
>>>>>>>>>>> storage, especially when there are a lot of duplicate `doc`s if
>>>>>>>>>>> schema evolution happens a lot.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Gang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your response.
>>>>>>>>>>> As I understand it, the table description is currently stored as
>>>>>>>>>>> a table property within the table metadata’s `properties` map.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>>>>
>>>>>>>>>>> - Table metadata `properties` are not versioned. As a result,
>>>>>>>>>>> when querying an older snapshot, the description may be inaccurate
>>>>>>>>>>> because the value reflects only the current state.
>>>>>>>>>>> - According to the specification, the purpose of table metadata
>>>>>>>>>>> properties is: “A string to string map of table properties. This is
>>>>>>>>>>> used to control settings that affect reading and writing and is not
>>>>>>>>>>> intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>>>>>> seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>>>>>> appropriate use of properties.
>>>>>>>>>>> - Table comments seem to have become significant enough that
>>>>>>>>>>> relying on a convention alone may no longer be sufficient. It might
>>>>>>>>>>> be worth considering a standardized, schema-level field for them.
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>> Taeyun
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>>> Cc:
>>>>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>>> Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Iceberg does allow you to store table descriptions. The
>>>>>>>>>>> convention is to use a table property, "comment". While this isn't a
>>>>>>>>>>> schema-level doc/comment, I don't know of anything that makes a
>>>>>>>>>>> distinction between schema description and table description, so I
>>>>>>>>>>> think it should work for your use.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> With the growing trend of using LLMs to automatically generate
>>>>>>>>>>> SQL, it feels increasingly important to manage descriptions of
>>>>>>>>>>> database tables and columns in a way that these tools can easily
>>>>>>>>>>> access.
>>>>>>>>>>>
>>>>>>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>>>>>>> columns) can be specified using the `doc` property within the
>>>>>>>>>>> `fields` array of a `struct` type. However, there doesn’t seem to be
>>>>>>>>>>> a way to specify a comment for the root struct type itself - that is,
>>>>>>>>>>> for the table as a whole.
>>>>>>>>>>>
>>>>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>>>>>>> comments by storing them in the `properties` map within the table
>>>>>>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>>>>>>> conceptually belongs to the schema, and can vary by schema, it feels
>>>>>>>>>>> like the `properties` map within the table metadata might not be the
>>>>>>>>>>> best place for it.
>>>>>>>>>>>
>>>>>>>>>>> Would it make sense to allow a `doc` property on the `schema`
>>>>>>>>>>> object (the root struct type), alongside `schema-id` and
>>>>>>>>>>> `identifier-field-ids`, so that a description for the schema itself
>>>>>>>>>>> can be included?
>>>>>>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>>>>>>> LLM-related use cases.
>>>>>>>>>>>
>>>>>>>>>>> Curious to hear your thoughts.
>>>>>>>>>>> Apologies if I’m overlooking something or if this has already
>>>>>>>>>>> been discussed.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>>
