Re: [DISCUSS] Offloading Snapshots from Metadata.json

Innocent Djiofack Fri, 17 Apr 2026 14:52:45 -0700

Thank you for starting the document, Yufei. I was planning to dig through
the source code later today. Your doc is perfect; could you please give me
write access?


On Fri, Apr 17, 2026 at 2:48 PM Yufei Gu <[email protected]> wrote:

> Thanks Péter for highlighting the Hive case. I’ve created a one-page doc
> to track specific places with hard dependencies on the file in storage to
> help ground the ongoing discussion:
> https://docs.google.com/document/d/17PBhJ0IBxHxMKvCW6CstGOp7cZnboMDdpO6BCPO2kmA/edit?usp=sharing
>
> Yufei
>
>
> On Fri, Apr 17, 2026 at 12:54 AM Péter Váry <[email protected]>
> wrote:
>
>> I don’t think splitting the metadata.json is the right approach.
>>
>> Making it optional in V4 could be a better direction, but many systems
>> rely on it today. For example, Hive uses SerializableTable to ensure
>> consistency between query planning and execution. As mentioned earlier,
>> SerializableTable relies on StaticTableOperations, which reads the table
>> metadata from the expected metadataFileLocation. Writing out a
>> metadata.json each time we serialize a table could therefore introduce
>> performance bottlenecks.
>>
>> That said, I agree we need a way to speed up metadata reads and updates
>> to support more frequent table operations. Removing the need to serialize
>> the metadata JSON could be a good path forward, as long as the metadata
>> remains fully and reliably accessible whenever it is required.
>>
>> On Fri, Apr 17, 2026 at 12:19 AM Yufei Gu <[email protected]> wrote:
>>
>>> Ryan, StaticTableOperations is the one reading the metadata.json files.
>>> Everything depending on it makes the assumption that metadata.json is in
>>> storage, including almost all metadata tables and some Spark actions. The
>>> executor use case I mentioned is somewhere like here,
>>> https://github.com/apache/iceberg/blob/dde712ec9ed6c9d28183ee4615d50f97b246af5d/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L215
>>>
>>>  Broadcast<Table> tableBroadcast =
>>>         sparkContext.broadcast(SerializableTableWithSize.copyOf(table));
>>>
>>> The driver broadcasts trimmed table metadata, and the executor picks up
>>> the full table metadata from storage.
>>>
>>> Yufei
>>>
>>>
>>> On Thu, Apr 16, 2026 at 2:24 PM huaxin gao <[email protected]>
>>> wrote:
>>>
>>>> +1 to the direction Ryan and Yufei outlined. Making metadata.json
>>>> optional in storage for v4 and fixing the REST client to not request all
>>>> snapshots seems like the right path forward.
>>>>
>>>> On the executor side, Prashant's earlier work in #14944
>>>> <https://github.com/apache/iceberg/pull/14944> looks like a good
>>>> starting point to remove the direct metadata file reads from
>>>> SerializableTable. Happy to help review when that gets revived.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Thu, Apr 16, 2026 at 12:43 PM Amogh Jahagirdar <[email protected]>
>>>> wrote:
>>>>
>>>>> I agree with pretty much everything Yufei and Ryan said. I
>>>>> feel like sharding the metadata json across multiple files is
>>>>> overcomplicated when the REST protocol already abstracts which snapshots a
>>>>> client even sees. It would be much better for us to make progress on
>>>>> relaxing the requirement for metadata.json storage. We should also look at
>>>>> the client implementation defaults to make sure those are sane as well.
>>>>>
>>>>> +1 to removing the code where executors fetch full metadata from the
>>>>> metadata.json. If I recall correctly, when we did the analysis on that
>>>>> PR, we found that it is effectively dead code, so I think there's a good
>>>>> cleanup opportunity there.
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Thu, Apr 16, 2026 at 11:09 AM Prashant Singh <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey Ryan / Yufei,
>>>>>> Here is my earlier attempt to get rid of that. It was from a gov POV,
>>>>>> and it's mostly from SerializableTable [1].
>>>>>> If we are all on board, I can clean up and revive this effort.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/iceberg/pull/14944#issuecomment-3812676977
>>>>>>
>>>>>> Best,
>>>>>> Prashant Singh
>>>>>>
>>>>>> On Thu, Apr 16, 2026 at 9:08 AM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> They do? Where is that?
>>>>>>>
>>>>>>> Definitely something we should remove as soon as we can.
>>>>>>>
>>>>>>> On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To add to that, some engines like Spark still assume metadata.json
>>>>>>>> exists in storage. The executors load the file directly instead of
>>>>>>>> checking the REST catalog for table metadata. We will need to modify that.
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think that the problem of large metadata.json files is largely
>>>>>>>>> solved by the REST protocol, which does not need to send snapshots to
>>>>>>>>> clients. I agree with Anton's suggestion to relax the requirement that
>>>>>>>>> the metadata.json file has to be stored somewhere (for v4). As long as
>>>>>>>>> catalogs are required to be able to produce the full content of
>>>>>>>>> metadata.json when loading the table for a client requesting all
>>>>>>>>> snapshots, we don't need to worry about storing the file.
>>>>>>>>>
>>>>>>>>> There are two things to keep in mind though:
>>>>>>>>> 1. I think the current Java REST implementation still requests all
>>>>>>>>> snapshots to commit, which we should fix.
>>>>>>>>> 2. I think it is a bad idea to split up the metadata.json file for
>>>>>>>>> non-REST catalogs. This introduces way too much complexity that
>>>>>>>>> necessarily leaks out of the catalog implementation. I don't think this
>>>>>>>>> is a problem worth solving when we have a perfectly good solution that
>>>>>>>>> has significant benefits.
>>>>>>>>>
>>>>>>>>> Ryan
>>>>>>>>>
>>>>>>>>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Thank you for the replies. Steven, the change is scoped to offloading
>>>>>>>>>> only the snapshot history. Yufei, yes, this is a large change. I agree
>>>>>>>>>> that removing the requirement for a metadata.json file per commit in
>>>>>>>>>> storage would address most of the concerns. If there is already a
>>>>>>>>>> design doc for that direction, please share it with me. If not, I can
>>>>>>>>>> start something along that line of reasoning.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Separating snapshot history from table metadata feels like a
>>>>>>>>>>> large, invasive change since it would require updates across all
>>>>>>>>>>> clients and engines. If we instead remove the requirement for a
>>>>>>>>>>> metadata.json file per commit in storage, many of the current concerns
>>>>>>>>>>> could be addressed. This seems like a more practical path forward.
>>>>>>>>>>> There are already multiple discussions on that topic. I'd suggest
>>>>>>>>>>> moving forward in that direction.
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I understand the problem we are trying to solve here. But the
>>>>>>>>>>>> actual proposed solution is unclear to me. The proposal seems to lack
>>>>>>>>>>>> some details in the actual design/solution.
>>>>>>>>>>>>
>>>>>>>>>>>> How do the proposed snapshot read and write APIs differ from
>>>>>>>>>>>> the current APIs? I can't tell the difference.
>>>>>>>>>>>>
>>>>>>>>>>>> > Once defined, this interface could be implemented by various
>>>>>>>>>>>> backing stores, such as another file or even a Catalog.
>>>>>>>>>>>>
>>>>>>>>>>>> To support offloading, we probably have to update the table
>>>>>>>>>>>> metadata in the table spec
>>>>>>>>>>>> <https://iceberg.apache.org/spec/#table-metadata-fields>. Does
>>>>>>>>>>>> this depend on making metadata.json file optional? Or is this limited
>>>>>>>>>>>> to just externalizing the snapshot list?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Innocent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe it's somewhat redundant with the V4 initiative?
>>>>>>>>>>>>> What are your thoughts on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My name is Innocent. I have enjoyed working on the Apache
>>>>>>>>>>>>>> Iceberg project so far and have learned a lot from people in the group.
>>>>>>>>>>>>>> I wanted to follow up on a concern raised by Anton around the
>>>>>>>>>>>>>> growing size of metadata.json and the problems it brings. Before going
>>>>>>>>>>>>>> ahead and doing the implementation work, I wanted to share the
>>>>>>>>>>>>>> high-level thinking with the community and get feedback. You will find
>>>>>>>>>>>>>> the link to the proposal here
>>>>>>>>>>>>>> <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>.
>>>>>>>>>>>>>> I would appreciate comments and feedback on it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *DJIOFACK INNOCENT*
>>>>>>>>>>>>>> *"Be better than the day before!" -*
>>>>>>>>>>>>>> *+1 404 751 8024*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>

