I think Yufei is right and the snapshot history is the main contributor. Streaming jobs that write every minute would generate over 10K snapshot entries per week. We had a similar problem with the list of manifests that kept growing (until we added manifest lists) and with references to previous metadata files (we only keep the last 100 now). So we can definitely come up with something for snapshot entries. We will have to ensure the entire set of snapshots is reachable from the latest root file, even if that requires multiple IO operations.
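As a sanity check on the 10K figure, here is a back-of-the-envelope calculation assuming exactly one snapshot per commit and one commit per minute (a simplifying assumption, not something stated in the thread):

```python
# Rough estimate of snapshot-entry growth for a streaming writer.
# Assumption: one snapshot per commit, one commit per minute, 24/7.
commits_per_minute = 1
minutes_per_week = 60 * 24 * 7

snapshots_per_week = commits_per_minute * minutes_per_week
print(snapshots_per_week)  # 10080 -- consistent with "over 10K per week"
```

At that rate a year of continuous streaming leaves well over 500K entries in the root file unless something expires or offloads them.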
The main question is whether we still want to require writing root JSON files during commits. If so, our commits will never be single-file commits. In V4, we will have to write the root manifest as well as the root metadata file. I would prefer the second to be optional, but we will need to think about static tables and how to incorporate that in the spec.

On Tue, Feb 10, 2026 at 3:58 PM Yufei Gu <[email protected]> wrote:

> AFAIK, the snapshot history is the main, if not the only, reason for the
> large metadata.json file. Moving the extra snapshot history to an
> additional file and keeping it referenced in the root one may just
> resolve the issue.
>
> Yufei
>
>
> On Tue, Feb 10, 2026 at 3:27 PM huaxin gao <[email protected]> wrote:
>
>> +1, I think this is a real problem, especially for streaming / frequent
>> appends where commit latency matters and metadata.json keeps getting
>> bigger.
>>
>> I also agree we probably shouldn't remove the root metadata file
>> completely. Having one file that describes the whole table is really
>> useful for portability and debugging.
>>
>> Of the options you listed, I like "offload pieces to external files" as
>> a first step. We still write the root file every commit, but it won't
>> grow as fast. The downside is extra maintenance/GC complexity.
>>
>> A couple of questions/ideas:
>>
>> - Do we have any data on which parts of metadata.json grow the most
>>   (snapshots / history / refs)? Even a rough breakdown could help
>>   decide what to move out first.
>> - Could we do a hybrid: still write the root file every commit, but
>>   only keep a "recent window" in it, and move older history to
>>   referenced files? (portable, but bounded growth)
>> - For "optional on commit", maybe make it a catalog capability (fast
>>   commits if the catalog can serve metadata), but still support an
>>   export/materialize step when portability is needed.
>>
>> Thanks,
>> Huaxin
>>
>> On Tue, Feb 10, 2026 at 2:58 PM Anton Okolnychyi <[email protected]> wrote:
>>
>>> I don't think we have any consensus or concrete plan. In fact, I don't
>>> know what my personal preference is at this point. The intention of
>>> this thread is to gain that clarity. I don't think removing the root
>>> metadata file entirely is a good idea. It is great to have a way to
>>> describe the entire state of a table in a single file. We just need to
>>> find a solution for streaming appends that suffer from the increasing
>>> size of the root metadata file.
>>>
>>> Like I said, making the generation of the JSON file on commit optional
>>> is one way to solve this problem. We can also think about offloading
>>> pieces of it to external files (say, old snapshots). This would mean
>>> we still have to write the root file on each commit, but it will be
>>> smaller. One clear downside is more complicated maintenance.
>>>
>>> Any other ideas/thoughts/feedback? Do people see this as a problem?
>>>
>>>
>>> On Tue, Feb 10, 2026 at 2:18 PM Yufei Gu <[email protected]> wrote:
>>>
>>>> Hi Anton, thanks for raising this. I would really like to make this
>>>> optional and then build additional use cases on top of it. For
>>>> example, a catalog like IRC could completely eliminate storage IO
>>>> during commit and load, which is a big win. It could also provide
>>>> better protection for encrypted Iceberg tables, since metadata.json
>>>> files are plain text today.
>>>>
>>>> That said, do we have consensus that metadata.json can be optional?
>>>> There are real portability concerns, and engine-side work also needs
>>>> consideration. For example, static tables and the Spark driver still
>>>> expect to read this file directly from storage. It feels like the
>>>> first step here is aligning on whether metadata.json can be optional
>>>> at all, before we go deeper into how we get rid of it. What do you
>>>> think?
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <[email protected]> wrote:
>>>>
>>>>> While it may be common knowledge among Iceberg devs that writing the
>>>>> root JSON file on commit is somewhat optional with the right
>>>>> catalog, what can we do in V4 to solve this problem for everyone? My
>>>>> concern is the suboptimal behavior that new users get by default
>>>>> with HMS or Hadoop catalogs and how this impacts their perception of
>>>>> Iceberg. We are doing a bunch of work for streaming (e.g. changelog
>>>>> scans, single-file commits, etc.), but the need to write the root
>>>>> JSON file may cancel all of that out.
>>>>>
>>>>> Let me throw some ideas out there.
>>>>>
>>>>> - Describe in the spec how catalogs can make the generation of the
>>>>>   root metadata file optional. Ideally, implement that in a built-in
>>>>>   catalog of choice as a reference implementation.
>>>>> - Offload portions of the root metadata file to external files and
>>>>>   keep references to them.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> - Anton
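To make the "keep a recent window in the root file, offload older history" idea discussed above concrete, here is a minimal sketch. It is purely illustrative, not Iceberg library code: the `snapshots-file` key and the file layout are hypothetical and not part of any Iceberg spec version.

```python
import json
import os
import tempfile

# Sketch of the hybrid proposal: keep the newest WINDOW snapshot entries
# inline in the root metadata and spill older ones to a referenced side
# file. The "snapshots-file" key is hypothetical (assumption, not spec).
WINDOW = 2

def spill_old_snapshots(root: dict, side_path: str) -> dict:
    snapshots = sorted(root["snapshots"], key=lambda s: s["timestamp-ms"])
    old, recent = snapshots[:-WINDOW], snapshots[-WINDOW:]
    with open(side_path, "w") as f:
        json.dump(old, f)  # older history lives out-of-line
    return {
        **root,
        "snapshots": recent,          # root stays small and bounded...
        "snapshots-file": side_path,  # ...but full history stays reachable
    }

root = {"snapshots": [{"snapshot-id": i, "timestamp-ms": 1_000 * i}
                      for i in range(5)]}
side = os.path.join(tempfile.mkdtemp(), "snapshots-0.json")
compact = spill_old_snapshots(root, side)
print(len(compact["snapshots"]))   # 2 entries inline
print(len(json.load(open(side))))  # 3 entries offloaded
```

The point of the sketch is the invariant Anton called out: every snapshot remains reachable from the latest root file, at the cost of one extra read when older history is needed.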
