Re: [DISCUSS] metadata.json in v4?

Yufei Gu Tue, 10 Feb 2026 14:18:23 -0800

Hi Anton, thanks for raising this. I would really like to make this
optional and then build additional use cases on top of it. For example, a
catalog like IRC could completely eliminate storage IO during commit and
load, which is a big win. It could also provide better protection for
encrypted Iceberg tables, since metadata.json files are plain text today.

That said, do we have consensus that metadata.json can be optional? There
are real portability concerns, and engine-side work also needs
consideration. For example, static tables and the Spark driver still expect
to read this file directly from storage. It feels like the first step here
is aligning on whether metadata.json can be optional at all, before we go
deeper into how we get rid of it. What do you think?

Yufei

On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <[email protected]>
wrote:

> While it may be common knowledge among Iceberg devs that writing the root
> JSON file on commit is somewhat optional with the right catalog, what can we
> do in V4 to solve this problem for all? My problem is the suboptimal
> behavior that new users get by default with HMS or Hadoop catalogs and how
> this impacts their perception of Iceberg. We are doing a bunch of work for
> streaming (e.g. changelog scans, single file commits, etc), but the need to
> write the root JSON file may cancel all of that.
>
> Let me throw some ideas out there.
>
> - Describe how catalogs can make the generation of the root metadata file
> optional in the spec. Ideally, implement that in a built-in catalog of
> choice as a reference implementation.
> - Offload portions of the root metadata file to external files and keep
> references to them.
>
> Thoughts?
>
> - Anton
>
>
>
