They do? Where is that? Definitely something we should remove as soon as we can.
On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu wrote:
> To add to that, some engines like Spark still assume metadata.json exists
> in storage. The executors load the file directly instead of checking the
> REST catalog for table metadata. We will need to modify that.
>
> Yufei
>
>
> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue wrote:
>
>> I think that the problem of large metadata.json files is largely solved
>> by the REST protocol, which does not need to send snapshots to clients. I
>> agree with Anton's suggestion to relax the requirement that the
>> metadata.json file has to be stored somewhere (for v4). As long as catalogs
>> are required to be able to produce the full content of metadata.json when
>> loading the table for a client requesting all snapshots, we don't need to
>> worry about storing the file.
>>
>> There are two things to keep in mind, though:
>> 1. I think the current Java REST implementation still requests all
>> snapshots to commit, which we should fix.
>> 2. I think it is a bad idea to split up the metadata.json file for
>> non-REST catalogs. This introduces way too much complexity that necessarily
>> leaks out of the catalog implementation. I don't think this is a problem
>> worth solving when we have a perfectly good solution that has significant
>> benefits.
>>
>> Ryan
>>
>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack wrote:
>>
>>> Hi all,
>>>
>>> Thank you for the replies. Steven, the change is scoped to only
>>> offloading the snapshot history. Yufei, yes, this is a large change. I
>>> agree that removing the requirement for a metadata.json file per commit in
>>> storage would address most of the concerns. If there is already a design
>>> doc for that direction, please share it with me. If not, I can start
>>> something along that line of reasoning.
>>>
>>> Thanks.
>>>
>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu wrote:
>>>
>>>> Separating snapshot history from table metadata feels like a large,
>>>> invasive change, since it would require updates across all clients and
>>>> engines. If we instead remove the requirement for a metadata.json file per
>>>> commit in storage, many of the current concerns could be addressed. This
>>>> seems like a more practical path forward. There are already multiple
>>>> discussions about that, so I'd suggest moving forward in that direction.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu wrote:
>>>>
>>>>> I understand the problem we are trying to solve here, but the actual
>>>>> proposed solution is unclear to me. The proposal seems to lack some
>>>>> details in the actual design/solution.
>>>>>
>>>>> How do the proposed snapshot read and write APIs differ from the
>>>>> current APIs? I can't tell the difference.
>>>>>
>>>>> > Once defined, this interface could be implemented by various
>>>>> > backing stores, such as another file or even a Catalog.
>>>>>
>>>>> To support offloading, we probably have to update the table metadata
>>>>> in the table spec
>>>>> <https://iceberg.apache.org/spec/#table-metadata-fields>. Does this
>>>>> depend on making the metadata.json file optional? Or is this limited to
>>>>> just externalizing the snapshot list?
>>>>>
>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré wrote:
>>>>>
>>>>>> Hi Innocent,
>>>>>>
>>>>>> Maybe this is somewhat redundant with the V4 initiative?
>>>>>> What are your thoughts on this?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack wrote:
>>>>>>
>>>>>>> Hello Everyone,
>>>>>>>
>>>>>>> My name is Innocent and I have enjoyed working on the Apache Iceberg
>>>>>>> project so far and have learned a lot from people in the group.
>>>>>>> I wanted to follow up on a concern raised by Anton around the
>>>>>>> growing size of metadata.json and the problems it brings. Before going
>>>>>>> ahead with the implementation work, I wanted to share the high-level
>>>>>>> thinking with the community and get feedback. You will find the link to
>>>>>>> the proposal here
>>>>>>> <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>.
>>>>>>> I would appreciate comments and feedback on it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *DJIOFACK INNOCENT*
>>>>>>> *"Be better than the day before!" -*
>>>>>>> *+1 404 751 8024*
>>>>>>>
>>>>>>
>>>
>>>
>>> --
>>>
>>> *DJIOFACK INNOCENT*
>>> *"Be better than the day before!" -*
>>> *+1 404 751 8024*
>>>
>>
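For context on Ryan's point that the REST protocol does not need to send every snapshot to clients: as we read the Iceberg REST catalog OpenAPI spec, the loadTable endpoint (GET /v1/{prefix}/namespaces/{namespace}/tables/{table}) accepts a `snapshots` query parameter with the values `all` (the full snapshot list, and the default when omitted) and `refs` (only snapshots referenced by branches or tags). The sketch below only builds such a request URL; the helper name `load_table_url`, the catalog host, and the restriction to single-level namespaces are assumptions for illustration, not part of any Iceberg client library.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def load_table_url(base_uri: str, namespace: str, table: str,
                   snapshots: str = "refs", prefix: Optional[str] = None) -> str:
    """Build a GET URL for the REST catalog loadTable endpoint (hypothetical helper).

    snapshots="refs" asks only for snapshots referenced by branches or tags;
    snapshots="all" asks for the full snapshot list.
    Only single-level namespaces are handled in this sketch.
    """
    if snapshots not in ("all", "refs"):
        raise ValueError("snapshots must be 'all' or 'refs'")
    # Percent-encode each path segment so names containing '/' or spaces stay intact.
    segments = ["v1"]
    if prefix:
        segments.append(quote(prefix, safe=""))
    segments += ["namespaces", quote(namespace, safe=""),
                 "tables", quote(table, safe="")]
    query = urlencode({"snapshots": snapshots})
    return f"{base_uri.rstrip('/')}/{'/'.join(segments)}?{query}"


if __name__ == "__main__":
    # Hypothetical catalog endpoint: a client that only needs the current table
    # state can ask for "refs" instead of the whole snapshot history.
    print(load_table_url("https://catalog.example.com", "db", "events"))
    print(load_table_url("https://catalog.example.com", "db", "events", snapshots="all"))
```

Under this reading, a catalog that keeps snapshot history in its own store only has to materialize the full list when a client explicitly asks for `all`, which is the guarantee Ryan describes.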
