Hi Anton, thanks for raising this. I would really like to make this optional and then build additional use cases on top of it. For example, a catalog like IRC could completely eliminate storage IO during commit and load, which is a big win. It could also provide better protection for encrypted Iceberg tables, since metadata.json files are plain text today.
That said, do we have consensus that metadata.json can be optional? There are real portability concerns, and engine-side work also needs consideration. For example, static tables and the Spark driver still expect to read this file directly from storage. It feels like the first step here is aligning on whether metadata.json can be optional at all, before we go deeper into how we get rid of it. What do you think?

Yufei

On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <[email protected]> wrote:

> While it may be common knowledge among Iceberg devs that writing the root
> JSON file on commit is somewhat optional with a right catalog, what can we
> do in V4 to solve this problem for all? My problem is the suboptimal
> behavior that new users get by default with HMS or Hadoop catalogs and how
> this impacts their perception of Iceberg. We are doing a bunch of work for
> streaming (e.g. changelog scans, single file commits, etc), but the need to
> write the root JSON file may cancel all of that.
>
> Let me throw some ideas out there.
>
> - Describe how catalogs can make the generation of the root metadata file
> optional in the spec. Ideally, implement that in a built-in catalog of
> choice as a reference implementation.
> - Offload portions of the root metadata file to external files and keep
> references to them.
>
> Thoughts?
>
> - Anton
