Hey Ryan / Yufei,

Here is my one attempt to get rid of that. It was from a gov POV, and it's mostly from SerializableTable [1]. If we are all on board, I can clean up and revive this effort.

[1] https://github.com/apache/iceberg/pull/14944#issuecomment-3812676977

Best,
Prashant Singh

On Thu, Apr 16, 2026 at 9:08 AM Ryan Blue wrote:

> They do? Where is that?
>
> Definitely something we should remove as soon as we can.
>
> On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu wrote:
>
>> To add to that, some engines like Spark still assume metadata.json exists
>> in storage. The executors load the file directly instead of checking the
>> REST catalog for table metadata. We will need to modify that.
>>
>> Yufei
>>
>>
>> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue wrote:
>>
>>> I think that the problem of large metadata.json files is largely solved
>>> by the REST protocol, which does not need to send snapshots to clients. I
>>> agree with Anton's suggestion to relax the requirement that the
>>> metadata.json file has to be stored somewhere (for v4). As long as catalogs
>>> are required to be able to produce the full content of metadata.json when
>>> loading the table for a client requesting all snapshots, we don't need to
>>> worry about storing the file.
>>>
>>> There are two things to keep in mind though:
>>> 1. I think the current Java REST implementation still requests all
>>> snapshots to commit, which we should fix
>>> 2. I think it is a bad idea to split up the metadata.json file for
>>> non-REST catalogs. This introduces way too much complexity that necessarily
>>> leaks out of the catalog implementation. I don't think this is a problem
>>> worth solving when we have a perfectly good solution that has significant
>>> benefits.
>>>
>>> Ryan
>>>
>>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack wrote:
>>>
>>>> Hi all,
>>>>
>>>> Thank you for the replies. Steven, the change is scoped to offloading
>>>> only the snapshot history. Yufei, yes, this is a large change. I
>>>> agree that removing the requirement for a metadata.json file per commit in
>>>> storage would address most of the concerns. If there is already a design doc
>>>> for that direction, please share it with me. If not, I can start something
>>>> along that line of reasoning.
>>>>
>>>> Thanks.
>>>>
>>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu wrote:
>>>>
>>>>> Separating snapshot history from table metadata feels like a large,
>>>>> invasive change since it would require updates across all clients and
>>>>> engines. If we instead remove the requirement for a metadata.json file per
>>>>> commit in storage, many of the current concerns could be addressed. This
>>>>> seems like a more practical path forward. There are already
>>>>> multiple discussions about it. I'd suggest moving forward in that
>>>>> direction.
>>>>>
>>>>> Yufei
>>>>>
>>>>>
>>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu wrote:
>>>>>
>>>>>> I understand the problem we are trying to solve here. But the actual
>>>>>> proposed solution is unclear to me. The proposal seems to lack some
>>>>>> details in the actual design/solution.
>>>>>>
>>>>>> How do the proposed snapshot read and write APIs differ from the
>>>>>> current APIs? I can't tell the difference.
>>>>>>
>>>>>> > Once defined, this interface could be implemented by various
>>>>>> > backing stores, such as another file or even a Catalog.
>>>>>>
>>>>>> To support offloading, we probably have to update the table metadata
>>>>>> in the table spec
>>>>>> <https://iceberg.apache.org/spec/#table-metadata-fields>. Does this
>>>>>> depend on making the metadata.json file optional? Or is this limited to
>>>>>> just externalizing the snapshot list?
>>>>>>
>>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré wrote:
>>>>>>
>>>>>>> Hi Innocent
>>>>>>>
>>>>>>> Maybe it's kind of redundant with the V4 initiative?
>>>>>>> What are your thoughts on this?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack wrote:
>>>>>>>
>>>>>>>> Hello Everyone,
>>>>>>>>
>>>>>>>> My name is Innocent and I have enjoyed working on the Apache
>>>>>>>> Iceberg project so far and have learned a lot from people in the group.
>>>>>>>> I wanted to follow up on a concern raised by Anton around the
>>>>>>>> growing size of metadata.json and the problems it brings. Before going
>>>>>>>> ahead and doing the implementation work, I wanted to share the
>>>>>>>> high-level thinking with the community and get feedback. You will find
>>>>>>>> the link to the proposal here
>>>>>>>> <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>.
>>>>>>>> I would appreciate comments and feedback on it.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *DJIOFACK INNOCENT*
>>>>>>>> *"Be better than the day before!" -*
>>>>>>>> *+1 404 751 8024*
>>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>>
>>>> *DJIOFACK INNOCENT*
>>>> *"Be better than the day before!" -*
>>>> *+1 404 751 8024*
>>>>
>>>
