It seems that we are discussing two orthogonal approaches:

1. Making the writing of the complete metadata.json file optional during a commit, especially for catalogs that can manage metadata themselves.
2. Restructuring the metadata.json file (e.g., by offloading growing parts like snapshot history to external files) to limit its size and reduce write I/O, while still requiring the root file on every commit for portability. A rough sketch of what this could look like follows below.
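For approach 2, here is a minimal sketch of the offload at commit time. All class and method names below are invented for illustration; none of this is an existing Iceberg API.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of approach 2: the root file keeps only a bounded
    // window of recent snapshots plus a reference to an external file that
    // holds the older history.
    public class SnapshotHistoryOffload {

      record Snapshot(long snapshotId, long timestampMillis, String manifestList) {}

      // What the root metadata would carry instead of the full snapshot list.
      record RootSnapshotSection(List<Snapshot> recentSnapshots, String historyFileRef) {}

      // Hypothetical sink that persists older entries and returns their location.
      interface HistoryWriter {
        String write(List<Snapshot> olderSnapshots);
      }

      static final int WINDOW = 100; // assumed number of snapshots kept inline

      static RootSnapshotSection trim(List<Snapshot> all, HistoryWriter writer) {
        if (all.size() <= WINDOW) {
          return new RootSnapshotSection(all, null); // small table: no external file
        }
        // Older entries go to an external file; the root file stays bounded.
        List<Snapshot> older = all.subList(0, all.size() - WINDOW);
        String ref = writer.write(older);
        List<Snapshot> recent = new ArrayList<>(all.subList(all.size() - WINDOW, all.size()));
        return new RootSnapshotSection(recent, ref);
      }
    }

The key property is that the entire snapshot history stays reachable from the latest root file via historyFileRef, just no longer inlined in it.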
I believe both approaches are worth exploring because in some cases portability is still a top priority.

Best,
Gang

On Wed, Feb 11, 2026 at 9:27 AM Manu Zhang <[email protected]> wrote:

Can we add an abstraction to the spec, like a root metadata (or snapshot history) manager, with the default implementation being metadata.json?

On Wed, Feb 11, 2026 at 9:07 AM Prashant Singh <[email protected]> wrote:

+1. I think snapshot summary bloating was a major factor in bloating metadata.json too, especially for streaming writers, based on my past experience. One workaround, since we didn't want to propose a spec change, was to enforce a strict limit on how many snapshots we wanted to keep and let remove-orphans do the cleanup. We also removed the snapshot summaries, since they are optional anyway, and because in streaming mode we create a large number of snapshots (not all of which were needed anyway).

There has been a lot of interesting discussion about optimizing reads [1] as well as writes [2]. If we are open to relaxing the spec a bit, it would be nice to move the tracking of the metadata to the catalog, plus a protocol to retrieve it back without compromising portability. Maybe we can have a dedicated API that exports this to a file: in the intermediate state we operate only on what is stored in the catalog, and we materialize the file when and if asked. We are having a similar discussion in IRC.

All in all, I think we acknowledge this is a real problem for streaming writers :) !

Past discussions:
[1] https://lists.apache.org/thread/pwdd7qmdsfcrzjtsll53d3m9f74d03l8
[2] https://github.com/apache/iceberg/issues/2723

Best,
Prashant Singh
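For concreteness, the export/materialize idea above might look roughly like the following. This is a hypothetical interface, not an existing Iceberg or IRC API.

    // Hypothetical sketch: the catalog is the source of truth for table
    // metadata, and a metadata.json file is only produced when a portable
    // copy is explicitly requested.
    public interface MetadataExportingCatalog {

      // Hypothetical placeholder for whatever a commit carries.
      interface TableUpdate {}

      // Normal commit path: metadata lives in the catalog, no root file written.
      void commit(String tableIdentifier, TableUpdate update);

      // On-demand export for portability: write a complete metadata.json
      // describing the current table state (e.g. for static tables, debugging,
      // or migration) and return the location of the written file.
      String exportMetadata(String tableIdentifier, String targetLocation);
    }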
On Tue, Feb 10, 2026 at 4:45 PM Anton Okolnychyi <[email protected]> wrote:

I think Yufei is right and the snapshot history is the main contributor. Streaming jobs that write every minute would generate over 10K snapshot entries per week. We had a similar problem with the list of manifests that kept growing (until we added manifest lists) and with references to previous metadata files (we only keep the last 100 now). So we can definitely come up with something for snapshot entries. We will have to ensure the entire set of snapshots is reachable from the latest root file, even if that requires multiple IO operations.

The main question is whether we still want to require writing root JSON files during commits. If so, our commits will never be single-file commits. In V4, we will have to write the root manifest as well as the root metadata file. I would prefer the second to be optional, but we will need to think about static tables and how to incorporate that into the spec.

On Tue, Feb 10, 2026 at 3:58 PM Yufei Gu <[email protected]> wrote:

AFAIK, the snapshot history is the main, if not the only, reason for the large metadata.json file. Moving the extra snapshot history to an additional file and keeping it referenced in the root one may resolve the issue.

Yufei

On Tue, Feb 10, 2026 at 3:27 PM huaxin gao <[email protected]> wrote:

+1, I think this is a real problem, especially for streaming / frequent appends where commit latency matters and metadata.json keeps getting bigger.

I also agree we probably shouldn’t remove the root metadata file completely. Having one file that describes the whole table is really useful for portability and debugging.

Of the options you listed, I like “offload pieces to external files” as a first step. We still write the root file every commit, but it won’t grow as fast. The downside is extra maintenance/GC complexity.

A couple of questions/ideas:

- Do we have any data on which parts of metadata.json grow the most (snapshots / history / refs)? Even a rough breakdown could help decide what to move out first.
- Could we do a hybrid: still write the root file every commit, but only keep a “recent window” in it, and move older history to referenced files? (portable, but bounded growth)
- For “optional on commit”, maybe make it a catalog capability (fast commits if the catalog can serve metadata), but still support an export/materialize step when portability is needed.

Thanks,
Huaxin

On Tue, Feb 10, 2026 at 2:58 PM Anton Okolnychyi <[email protected]> wrote:

I don't think we have any consensus or concrete plan. In fact, I don't know what my personal preference is at this point. The intention of this thread is to gain that clarity. I don't think removing the root metadata file entirely is a good idea. It is great to have a way to describe the entire state of a table in a file. We just need to find a solution for streaming appends that suffer from the increasing size of the root metadata file.

Like I said, making the generation of the JSON file on commit optional is one way to solve this problem. We can also think about offloading pieces of it to external files (say, old snapshots). This would mean we still have to write the root file on each commit, but it will be smaller. One clear downside is more complicated maintenance.

Any other ideas/thoughts/feedback? Do people see this as a problem?

On Tue, Feb 10, 2026 at 2:18 PM Yufei Gu <[email protected]> wrote:

Hi Anton, thanks for raising this. I would really like to make this optional and then build additional use cases on top of it. For example, a catalog like IRC could completely eliminate storage IO during commit and load, which is a big win. It could also provide better protection for encrypted Iceberg tables, since metadata.json files are plain text today.

That said, do we have consensus that metadata.json can be optional? There are real portability concerns, and engine-side work also needs consideration. For example, static tables and the Spark driver still expect to read this file directly from storage. It feels like the first step here is aligning on whether metadata.json can be optional at all, before we go deeper into how we get rid of it. What do you think?

Yufei
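One possible way to square optionality with static tables and the Spark driver is a discoverable catalog capability plus an on-demand export, sketched below with invented names (none of this exists in Iceberg today):

    import java.util.Set;

    // Hypothetical sketch of the "catalog capability" idea from this thread:
    // engines ask the catalog whether it manages root metadata itself before
    // assuming a metadata.json file exists in storage.
    public class CapabilityCheck {

      enum CatalogCapability { METADATA_MANAGED }

      interface CapabilityAwareCatalog {
        Set<CatalogCapability> capabilities();
        String metadataFileLocation(String tableIdentifier); // null if managed
        String exportMetadata(String tableIdentifier, String targetLocation);
      }

      // Example read path: fall back to materializing a file only when the
      // caller (e.g. a static table) really needs one.
      static String locateRootFile(CapabilityAwareCatalog catalog, String table, String exportDir) {
        if (catalog.capabilities().contains(CatalogCapability.METADATA_MANAGED)) {
          return catalog.exportMetadata(table, exportDir); // materialize on demand
        }
        return catalog.metadataFileLocation(table); // classic: file already in storage
      }
    }

Engines that need a real file would then pay the export cost only when asked, which matches the export/materialize step suggested earlier in the thread.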
On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <[email protected]> wrote:

While it may be common knowledge among Iceberg devs that writing the root JSON file on commit is somewhat optional with the right catalog, what can we do in V4 to solve this problem for all? My problem is the suboptimal behavior that new users get by default with HMS or Hadoop catalogs and how this impacts their perception of Iceberg. We are doing a bunch of work for streaming (e.g. changelog scans, single-file commits, etc.), but the need to write the root JSON file may cancel all of that.

Let me throw some ideas out there.

- Describe in the spec how catalogs can make the generation of the root metadata file optional. Ideally, implement that in a built-in catalog of choice as a reference implementation.
- Offload portions of the root metadata file to external files and keep references to them.

Thoughts?

- Anton
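To make the first idea above concrete, here is a minimal sketch of a commit path that writes the root JSON file only when the catalog cannot track metadata itself. All interfaces below are invented for illustration; this is not the actual Iceberg commit code.

    // Hypothetical sketch: HMS/Hadoop-style catalogs keep today's behavior,
    // while capable catalogs skip the root metadata.json write entirely.
    public class OptionalRootFileCommit {

      interface Catalog {
        boolean managesRootMetadata(); // capability flag
        void commitToCatalog(String table, String rootManifest, String rootJsonOrNull);
      }

      interface RootFileWriter {
        String write(String table); // returns the metadata.json location
      }

      static void commit(Catalog catalog, RootFileWriter writer,
                         String table, String rootManifest) {
        String rootJson = null;
        if (!catalog.managesRootMetadata()) {
          // Compatibility path: persist the root metadata.json as today.
          rootJson = writer.write(table);
        }
        // With a capable catalog this is a single-file commit: only the
        // root manifest was written, and rootJson stays null.
        catalog.commitToCatalog(table, rootManifest, rootJson);
      }
    }

With a capable catalog, a V4 streaming append would then touch only the root manifest, which is exactly the single-file commit the streaming work is aiming for.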
