Write access is enabled. Feel free to add more to the document, Innocent.

Yufei
On Fri, Apr 17, 2026 at 2:52 PM Innocent Djiofack wrote:

> Thank you for starting the document, Yufei. I was planning to do some discovery in the source code later today. Your doc is perfect; can you please give me write access?
>
> On Fri, Apr 17, 2026 at 2:48 PM Yufei Gu wrote:
>
>> Thanks, Péter, for highlighting the Hive case. I’ve created a one-page doc to track the specific places with hard dependencies on the metadata.json file in storage, to help ground the ongoing discussion:
>> https://docs.google.com/document/d/17PBhJ0IBxHxMKvCW6CstGOp7cZnboMDdpO6BCPO2kmA/edit?usp=sharing
>>
>> Yufei
>>
>> On Fri, Apr 17, 2026 at 12:54 AM Péter Váry wrote:
>>
>>> I don’t think splitting the metadata.json is the right approach.
>>>
>>> Making it optional in V4 could be a better direction, but many systems rely on it today. For example, Hive uses SerializableTable to ensure consistency between query planning and execution. As mentioned earlier, SerializableTable relies on StaticTableOperations, which reads the table metadata from the expected metadataFileLocation. Writing out a metadata.json each time we serialize a table could therefore introduce performance bottlenecks.
>>>
>>> That said, I agree we need a way to speed up metadata reads and updates to support more frequent table operations. Removing the need to serialize the metadata JSON could be a good path forward, as long as the metadata remains fully and reliably accessible whenever it is required.
>>>
>>> On Fri, Apr 17, 2026 at 0:19, Yufei Gu wrote:
>>>
>>>> Ryan, StaticTableOperations is the one reading the metadata.json files. Everything depending on it assumes that metadata.json is in storage, including almost all metadata tables and some Spark actions.
>>>> The executor use case I mentioned is roughly here:
>>>> https://github.com/apache/iceberg/blob/dde712ec9ed6c9d28183ee4615d50f97b246af5d/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L215
>>>>
>>>>     Broadcast<Table> tableBroadcast =
>>>>         sparkContext.broadcast(SerializableTableWithSize.copyOf(table));
>>>>
>>>> The driver broadcasts trimmed table metadata, and the executors pick up the full table metadata from storage.
>>>>
>>>> Yufei
>>>>
>>>> On Thu, Apr 16, 2026 at 2:24 PM huaxin gao wrote:
>>>>
>>>>> +1 to the direction Ryan and Yufei outlined. Making metadata.json optional in storage for v4 and fixing the REST client to not request all snapshots seem like the right path forward.
>>>>>
>>>>> On the executor side, Prashant's earlier work in #14944 <https://github.com/apache/iceberg/pull/14944> looks like a good starting point to remove the direct metadata file reads from SerializableTable. Happy to help review when that gets revived.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Huaxin
>>>>>
>>>>> On Thu, Apr 16, 2026 at 12:43 PM Amogh Jahagirdar wrote:
>>>>>
>>>>>> I agree with pretty much everything Yufei and Ryan said. I feel like sharding the metadata JSON across multiple files is overcomplicated when the REST protocol already abstracts which snapshots a client even sees. It would be much better for us to make progress on relaxing the requirement for metadata.json storage. We should also look at the client implementation defaults to make sure those are sane as well.
>>>>>>
>>>>>> +1 to removing the code where executors fetch full metadata from the metadata.json. When we did the analysis on that PR, if I recall correctly, that code was effectively dead, so I think there's a good cleanup opportunity there.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Thu, Apr 16, 2026 at 11:09 AM Prashant Singh wrote:
>>>>>>
>>>>>>> Hey Ryan / Yufei,
>>>>>>> Here is one attempt I made to get rid of that. It came from a gov POV and is mostly in SerializableTable [1]. If we are all on board, I can clean it up and revive this effort.
>>>>>>>
>>>>>>> [1] https://github.com/apache/iceberg/pull/14944#issuecomment-3812676977
>>>>>>>
>>>>>>> Best,
>>>>>>> Prashant Singh
>>>>>>>
>>>>>>> On Thu, Apr 16, 2026 at 9:08 AM Ryan Blue wrote:
>>>>>>>
>>>>>>>> They do? Where is that?
>>>>>>>>
>>>>>>>> Definitely something we should remove as soon as we can.
>>>>>>>>
>>>>>>>> On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu wrote:
>>>>>>>>
>>>>>>>>> To add to that, some engines like Spark still assume metadata.json exists in storage. The executors load the file directly instead of checking the REST catalog for table metadata. We will need to modify that.
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue wrote:
>>>>>>>>>
>>>>>>>>>> I think that the problem of large metadata.json files is largely solved by the REST protocol, which does not need to send snapshots to clients. I agree with Anton's suggestion to relax the requirement that the metadata.json file has to be stored somewhere (for v4). As long as catalogs are required to be able to produce the full content of metadata.json when loading the table for a client requesting all snapshots, we don't need to worry about storing the file.
>>>>>>>>>>
>>>>>>>>>> There are two things to keep in mind, though:
>>>>>>>>>> 1. I think the current Java REST implementation still requests all snapshots to commit, which we should fix.
>>>>>>>>>> 2. I think it is a bad idea to split up the metadata.json file for non-REST catalogs. This introduces way too much complexity that necessarily leaks out of the catalog implementation. I don't think this is a problem worth solving when we have a perfectly good solution that has significant benefits.
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the replies. Steven, the change is scoped to offloading only the snapshot history. Yufei, yes, this is a large change. I agree that removing the requirement for a metadata.json file per commit in storage would address most of the concerns. If there is already a design doc for that direction, please share it with me. If not, I can start something along that line of reasoning.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Separating snapshot history from table metadata feels like a large, invasive change, since it would require updates across all clients and engines. If we instead remove the requirement for a metadata.json file per commit in storage, many of the current concerns could be addressed. This seems like a more practical path forward. There are already multiple discussions on that topic. I'd suggest moving forward in that direction.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I understand the problem we are trying to solve here, but the actual proposed solution is unclear to me. The proposal seems to lack some details in the actual design/solution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How do the proposed snapshot read and write APIs differ from the current APIs? I can't tell the difference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Once defined, this interface could be implemented by various backing stores, such as another file or even a Catalog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To support offloading, we probably have to update the table metadata in the table spec <https://iceberg.apache.org/spec/#table-metadata-fields>. Does this depend on making the metadata.json file optional? Or is this limited to just externalizing the snapshot list?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Innocent,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe it's somewhat redundant with the V4 initiative? What are your thoughts on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My name is Innocent. I have enjoyed working on the Apache Iceberg project so far and have learned a lot from people in the group. I wanted to follow up on a concern raised by Anton about the growing size of metadata.json and the problems it brings.
>>>>>>>>>>>>>>> Before going ahead with the implementation work, I wanted to share the high-level thinking with the community and get feedback. You will find the link to the proposal here: <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>. I would appreciate comments and feedback on it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *DJIOFACK INNOCENT*
>>>>>>>>>>>>>>> *"Be better than the day before!" -*
>>>>>>>>>>>>>>> *+1 404 751 8024*
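A minimal sketch of the executor-side dependency discussed in this thread, assuming a drastically simplified model of table metadata. Per the thread, the driver broadcasts trimmed metadata (`SerializableTableWithSize.copyOf(table)`), and `SerializableTable` relies on `StaticTableOperations` to read the full metadata from `metadataFileLocation`, a read that can only succeed if a metadata.json was actually written there. The sketch contrasts that file-backed load with asking the catalog, which under Ryan's suggestion would be required to produce the full metadata on request. Every type below (`MetadataSketch`, `FullMetadataSource`, `FileBackedSource`, `CatalogBackedSource`, `LazyTableSketch`) is hypothetical and is not an Iceberg API; in-memory maps stand in for object storage and for the catalog, and the serialization a Spark broadcast needs is omitted.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified table metadata: a metadata file location plus snapshot IDs.
record MetadataSketch(String location, List<Long> snapshotIds, long currentSnapshotId) {
  // Trimmed copy standing in for what the driver broadcasts: only the current snapshot.
  MetadataSketch trimmed() {
    return new MetadataSketch(location, List.of(currentSnapshotId), currentSnapshotId);
  }
}

// Hypothetical way for executor-side code to obtain the full metadata.
interface FullMetadataSource {
  MetadataSketch load(String tableName);
}

// Current pattern described in the thread: re-read the metadata file by its location.
final class FileBackedSource implements FullMetadataSource {
  private final Map<String, MetadataSketch> storage; // stands in for object storage
  private final String metadataFileLocation;

  FileBackedSource(Map<String, MetadataSketch> storage, String metadataFileLocation) {
    this.storage = storage;
    this.metadataFileLocation = metadataFileLocation;
  }

  @Override
  public MetadataSketch load(String tableName) {
    MetadataSketch metadata = storage.get(metadataFileLocation);
    if (metadata == null) {
      // Fails whenever the catalog did not write a metadata file at this location.
      throw new IllegalStateException("no metadata file at " + metadataFileLocation);
    }
    return metadata;
  }
}

// Direction discussed in the thread: ask the catalog, which can produce full metadata on request.
final class CatalogBackedSource implements FullMetadataSource {
  private final Map<String, MetadataSketch> tables; // stands in for a catalog's table lookup

  CatalogBackedSource(Map<String, MetadataSketch> tables) {
    this.tables = tables;
  }

  @Override
  public MetadataSketch load(String tableName) {
    MetadataSketch metadata = tables.get(tableName);
    if (metadata == null) {
      throw new IllegalStateException("unknown table " + tableName);
    }
    return metadata;
  }
}

// Executor-side view: answers from trimmed metadata, loads full metadata at most once.
final class LazyTableSketch {
  private final String name;
  private final MetadataSketch trimmed;
  private final FullMetadataSource source;
  private MetadataSketch full; // null until first needed; this lazy init is not thread-safe

  LazyTableSketch(String name, MetadataSketch trimmed, FullMetadataSource source) {
    this.name = name;
    this.trimmed = trimmed;
    this.source = source;
  }

  long currentSnapshotId() {
    return trimmed.currentSnapshotId(); // served locally, no remote read
  }

  List<Long> allSnapshotIds() {
    if (full == null) {
      full = source.load(name); // the only call that touches storage or the catalog
    }
    return full.snapshotIds();
  }
}
```

Keeping the source of full metadata behind one interface means the executor-side code is the same whether that metadata comes from a stored file or from the catalog; only what the driver ships along with the trimmed metadata changes. The thread does not settle whether the eventual fix (for example, a revived #14944) takes this shape.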
