Thank you for starting the document, Yufei. I was planning to dig through the source code later today. Your doc is perfect; could you please give me write access?

On Fri, Apr 17, 2026 at 2:48 PM Yufei Gu <[email protected]> wrote:

> Thanks Péter for highlighting the Hive case. I’ve created a one-page doc to track specific places with hard dependencies on the file in storage to help ground the ongoing discussion:
> https://docs.google.com/document/d/17PBhJ0IBxHxMKvCW6CstGOp7cZnboMDdpO6BCPO2kmA/edit?usp=sharing
>
> Yufei
>
> On Fri, Apr 17, 2026 at 12:54 AM Péter Váry <[email protected]> wrote:
>
>> I don’t think splitting the metadata.json is the right approach.
>>
>> Making it optional in V4 could be a better direction, but many systems rely on it today. For example, Hive uses SerializableTable to ensure consistency between query planning and execution. As mentioned earlier, SerializableTable relies on StaticTableOperations, which reads the table metadata from the expected metadataFileLocation. Writing out a metadata.json each time we serialize a table could therefore introduce performance bottlenecks.
>>
>> That said, I agree we need a way to speed up metadata reads and updates to support more frequent table operations. Removing the need to serialize the metadata JSON could be a good path forward, as long as the metadata remains fully and reliably accessible whenever it is required.
>>
>> On Fri, Apr 17, 2026 at 0:19, Yufei Gu <[email protected]> wrote:
>>
>>> Ryan, StaticTableOperations is the one reading the metadata.json files. Everything depending on it makes the assumption that metadata.json is in storage, including almost all metadata tables and some Spark actions. The executor use case I mentioned is somewhere like here,
>>> https://github.com/apache/iceberg/blob/dde712ec9ed6c9d28183ee4615d50f97b246af5d/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L215
>>>
>>> Broadcast<Table> tableBroadcast =
>>>     sparkContext.broadcast(SerializableTableWithSize.copyOf(table));
>>>
>>> The driver broadcasts a trimmed table metadata, and executor will pick up the full table metadata from storage.
>>>
>>> Yufei
>>>
>>> On Thu, Apr 16, 2026 at 2:24 PM huaxin gao <[email protected]> wrote:
>>>
>>>> +1 to the direction Ryan and Yufei outlined. Making metadata.json optional in storage for v4 and fixing the REST client to not request all snapshots seems like the right path forward.
>>>>
>>>> On the executor side, Prashant's earlier work in #14944 <https://github.com/apache/iceberg/pull/14944> looks like a good starting point to remove the direct metadata file reads from SerializableTable. Happy to help review when that gets revived.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Thu, Apr 16, 2026 at 12:43 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>
>>>>> I pretty much agree with about everything Yufei and Ryan said. I feel like sharding the metadata json across multiple files is overcomplicated when the REST protocol already abstracts which snapshots a client even sees. It would be much better for us to make progress on relaxing the requirement for metadata.json storage. We should also look at the client implementation defaults to make sure those are sane as well.
>>>>>
>>>>> +1 to removing the code where executors fetch full metadata from the metadata.json. I remember when we did the analysis on that PR, if I recall correctly, that effectively is dead code so I think there's a good cleanup opportunity there.
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Thu, Apr 16, 2026 at 11:09 AM Prashant Singh <[email protected]> wrote:
>>>>>
>>>>>> Hey Ryan / Yufei,
>>>>>> Here is my one attempt to get rid of that, it was from gov pov, it's mostly from Serializable Table [1]
>>>>>> If we are all onboard, I can clean up and revive this effort.
>>>>>>
>>>>>> [1] https://github.com/apache/iceberg/pull/14944#issuecomment-3812676977
>>>>>>
>>>>>> Best,
>>>>>> Prashant Singh
>>>>>>
>>>>>> On Thu, Apr 16, 2026 at 9:08 AM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> They do? Where is that?
>>>>>>>
>>>>>>> Definitely something we should remove as soon as we can.
>>>>>>>
>>>>>>> On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu <[email protected]> wrote:
>>>>>>>
>>>>>>>> To add to that, some engines like Spark still assume metadata.json exists in storage. The executors load the file directly instead of checking the REST catalog for table metadata. We will need to modify that.
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think that the problem of large metadata.json files is largely solved by the REST protocol, which does not need to send snapshots to clients. I agree with Anton's suggestion to relax the requirement that the metadata.json file has to be stored somewhere (for v4). As long as catalogs are required to be able to produce the full content of metadata.json when loading the table for a client requesting all snapshots, we don't need to worry about storing the file.
>>>>>>>>>
>>>>>>>>> There are two things to keep in mind though:
>>>>>>>>> 1. I think the current Java REST implementation still requests all snapshots to commit, which we should fix
>>>>>>>>> 2. I think it is a bad idea to split up the metadata.json file for non-REST catalogs. This introduces way too much complexity that necessarily leaks out of the catalog implementation. I don't think this is a problem worth solving when we have a perfectly good solution that has significant benefits.
>>>>>>>>>
>>>>>>>>> Ryan
>>>>>>>>>
>>>>>>>>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Thank you for the replies. Steven the change is scoped to only offloading snapshots history. Yufei, yes this is a large change. I agreed that removing the requirement for a metadata.json file per commit in storage would help most of the concerns. If there is already a design doc for that direction, please share it with me. If not, I can start something around that line of reasoning.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Separating snapshot history from table metadata feels like a large, invasive change since it would require updates across all clients and engines. If we instead remove the requirement for a metadata.json file per commit in storage, many of the current concerns could be addressed. This seems like a more practical path forward. There are already multiple discussions over there. I'd suggest to move forward with that direction.
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I understand the problem we are trying to solve here. But the actual proposed solution is unclear to me. The proposal seems lack some details in the actual design/solution.
>>>>>>>>>>>>
>>>>>>>>>>>> How do the proposed snapshot read and write APIs differ from the current APIs? I can't tell the difference.
>>>>>>>>>>>>
>>>>>>>>>>>> > Once defined, this interface could be implemented by various backing stores, such as another file or even a Catalog.
>>>>>>>>>>>>
>>>>>>>>>>>> To support offloading, we probably have to update the table metadata in the table spec <https://iceberg.apache.org/spec/#table-metadata-fields>. Does this depend on making metadata.json file optional? Or is this limited to just externalizing the snapshot list?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Innocent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe it's a kind of redundant with the V4 initiative ?
>>>>>>>>>>>>> What are your thoughts on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My name is Innocent and I have enjoyed working on the apache Iceberg project so far and have learned a lot from people in the group.
>>>>>>>>>>>>>> I wanted to follow up on a concern raised by Anton around the growing size of metadata.json and the problems it brings. Before going ahead and doing the implementation work, I wanted to share the high level thinking with the community and get feedback. You will find the link to the proposal here
>>>>>>>>>>>>>> <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>
>>>>>>>>>>>>>> I would appreciate comments and feedback on it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *DJIOFACK INNOCENT*
>>>>>>>>>>>>>> *"Be better than the day before!" -*
>>>>>>>>>>>>>> *+1 404 751 8024*
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *DJIOFACK INNOCENT*
>>>>>>>>>> *"Be better than the day before!" -*
>>>>>>>>>> *+1 404 751 8024*
>>>>>>>>>>

--
*DJIOFACK INNOCENT*
*"Be better than the day before!" -*
*+1 404 751 8024*
