Re: [DISCUSS] metadata.json in v4?

Anton Okolnychyi Tue, 10 Feb 2026 14:58:19 -0800

I don't think we have any consensus or concrete plan. In fact, I don't know
what my personal preference is at this point. The intention of this thread
is to gain that clarity. I don't think removing the root metadata file
entirely is a good idea. It is great to have a way to describe the entire
state of a table in a file. We just need to find a solution for streaming
appends that suffer from the increasing size of the root metadata file.


Like I said, making the generation of the json file on commit optional is
one way to solve this problem. We can also think about offloading pieces of
it to external files (say old snapshots). This would mean we still have to
write the root file on each commit but it will be smaller. One clear
downside is more complicated maintenance.
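
To make the offloading idea concrete, here is a minimal sketch. It is an illustration only, not a proposal for the spec. The function name, the "old-snapshots-location" field, and the example path are all hypothetical. Only "snapshots", "snapshot-id", "timestamp-ms", and "current-snapshot-id" are existing table metadata fields.

```python
def offload_old_snapshots(metadata, external_location, keep_last=1):
    """Split table metadata into a smaller root and a list of offloaded snapshots.

    Hypothetical layout: "old-snapshots-location" is NOT part of the Iceberg spec.
    The caller is assumed to write the returned offloaded list to external_location.
    """
    # Order by commit time so the most recent snapshots stay in the root file.
    snapshots = sorted(metadata.get("snapshots", []), key=lambda s: s["timestamp-ms"])

    # Always keep the current snapshot, plus the keep_last most recent ones.
    keep_ids = {s["snapshot-id"] for s in snapshots[-keep_last:]} if keep_last > 0 else set()
    keep_ids.add(metadata.get("current-snapshot-id"))

    kept = [s for s in snapshots if s["snapshot-id"] in keep_ids]
    offloaded = [s for s in snapshots if s["snapshot-id"] not in keep_ids]

    # Shallow copy so the input metadata is left untouched.
    root = dict(metadata)
    root["snapshots"] = kept
    if offloaded:
        # Hypothetical reference from the root file to the external file.
        root["old-snapshots-location"] = external_location
    return root, offloaded


metadata = {
    "format-version": 2,
    "current-snapshot-id": 3,
    "snapshots": [{"snapshot-id": i, "timestamp-ms": 1000 * i} for i in (1, 2, 3)],
}
root, old = offload_old_snapshots(metadata, "s3://bucket/tbl/metadata/old-snapshots.json")
```

The root written on each commit then carries only the recent snapshots. The maintenance cost shows up here too: anything that needs the full history has to read the second file, and snapshot expiration has to keep both files consistent.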

Any other ideas/thoughts/feedback? Do people see this as a problem?


On Tue, Feb 10, 2026 at 14:18, Yufei Gu <[email protected]> wrote:

> Hi Anton, thanks for raising this. I would really like to make this
> optional and then build additional use cases on top of it. For example, a
> catalog like IRC could completely eliminate storage IO during commit and
> load, which is a big win. It could also provide better protection for
> encrypted Iceberg tables, since metadata.json files are plain text today.
>
> That said, do we have consensus that metadata.json can be optional? There
> are real portability concerns, and engine-side work also needs
> consideration. For example, static tables and the Spark driver still expect
> to read this file directly from storage. It feels like the first step here
> is aligning on whether metadata.json can be optional at all, before we go
> deeper into how we get rid of it. What do you think?
>
> Yufei
>
>
> On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <[email protected]>
> wrote:
>
>> While it may be common knowledge among Iceberg devs that writing the root
>> JSON file on commit is somewhat optional with the right catalog, what can we
>> do in V4 to solve this problem for all? My problem is the suboptimal
>> behavior that new users get by default with HMS or Hadoop catalogs and how
>> this impacts their perception of Iceberg. We are doing a bunch of work for
>> streaming (e.g. changelog scans, single file commits, etc.), but the need to
>> write the root JSON file may cancel all of that.
>>
>> Let me throw some ideas out there.
>>
>> - Describe how catalogs can make the generation of the root metadata file
>> optional in the spec. Ideally, implement that in a built-in catalog of
>> choice as a reference implementation.
>> - Offload portions of the root metadata file to external files and keep
>> references to them.
>>
>> Thoughts?
>>
>> - Anton
>>
>>
>>
