Re: [DISCUSS] Offloading Snapshots from Metadata.json

Yufei Gu Fri, 17 Apr 2026 14:57:41 -0700

Write access is enabled. Feel free to add more to the document, Innocent.

Yufei



On Fri, Apr 17, 2026 at 2:52 PM Innocent Djiofack <[email protected]>
wrote:

> Thank you for starting the document, Yufei. I was planning on doing some
> digging through the source code later today. Your doc is perfect. Can you
> please give me write access?
>
> On Fri, Apr 17, 2026 at 2:48 PM Yufei Gu <[email protected]> wrote:
>
>> Thanks Péter for highlighting the Hive case. I’ve created a one-page doc
>> to track specific places with hard dependencies on the file in storage to
>> help ground the ongoing discussion:
>> https://docs.google.com/document/d/17PBhJ0IBxHxMKvCW6CstGOp7cZnboMDdpO6BCPO2kmA/edit?usp=sharing
>>
>> Yufei
>>
>>
>> On Fri, Apr 17, 2026 at 12:54 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> I don’t think splitting the metadata.json is the right approach.
>>>
>>> Making it optional in V4 could be a better direction, but many systems
>>> rely on it today. For example, Hive uses SerializableTable to ensure
>>> consistency between query planning and execution. As mentioned earlier,
>>> SerializableTable relies on StaticTableOperations, which reads the table
>>> metadata from the expected metadataFileLocation. Writing out a
>>> metadata.json each time we serialize a table could therefore introduce
>>> performance bottlenecks.
>>>
>>> That said, I agree we need a way to speed up metadata reads and updates
>>> to support more frequent table operations. Removing the need to serialize
>>> the metadata JSON could be a good path forward, as long as the metadata
>>> remains fully and reliably accessible whenever it is required.
>>>
>>> On Fri, Apr 17, 2026 at 12:19 AM Yufei Gu <[email protected]> wrote:
>>>
>>>> Ryan, StaticTableOperations is the one reading the metadata.json files.
>>>> Everything that depends on it assumes that metadata.json is in storage,
>>>> including almost all metadata tables and some Spark actions. The
>>>> executor use case I mentioned is around here:
>>>> https://github.com/apache/iceberg/blob/dde712ec9ed6c9d28183ee4615d50f97b246af5d/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L215
>>>>
>>>>  Broadcast<Table> tableBroadcast =
>>>>         sparkContext.broadcast(SerializableTableWithSize.copyOf(table));
>>>>
>>>> The driver broadcasts trimmed table metadata, and the executor picks
>>>> up the full table metadata from storage.
>>>>
>>>> Yufei
>>>>
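The executor-side behavior described above (a trimmed table handle is broadcast, and the full metadata is read back from storage on demand) can be illustrated with a minimal, self-contained sketch. This is not Iceberg code: `FakeStorage`, `TrimmedTable`, and `shipToExecutor` are hypothetical names invented for illustration, and the in-memory map only stands in for object storage. It assumes the handle carries just a metadata file location and the current snapshot ID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for object storage that holds metadata.json files. */
final class FakeStorage {
  static final Map<String, List<Long>> FILES = new ConcurrentHashMap<>();
  static final AtomicInteger READS = new AtomicInteger();

  private FakeStorage() {}

  /** Simulates reading and parsing a metadata.json file; fails if it is absent. */
  static List<Long> readSnapshotIds(String location) {
    READS.incrementAndGet();
    List<Long> ids = FILES.get(location);
    if (ids == null) {
      throw new IllegalStateException("No metadata.json at " + location);
    }
    return ids;
  }
}

/** Hypothetical trimmed table handle that a driver could ship to executors. */
final class TrimmedTable implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String metadataFileLocation;
  private final long currentSnapshotId;
  // Not serialized: the full snapshot list is re-read from storage on demand.
  private transient List<Long> snapshotIds;

  TrimmedTable(String metadataFileLocation, long currentSnapshotId) {
    this.metadataFileLocation = metadataFileLocation;
    this.currentSnapshotId = currentSnapshotId;
  }

  /** Served from the serialized state, so no storage read is needed. */
  long currentSnapshotId() {
    return currentSnapshotId;
  }

  /** Loads the full snapshot list lazily; this is the hard dependency on the file. */
  synchronized List<Long> snapshotIds() {
    if (snapshotIds == null) {
      snapshotIds = FakeStorage.readSnapshotIds(metadataFileLocation);
    }
    return snapshotIds;
  }

  /** Serializes and deserializes the handle, standing in for a broadcast. */
  static TrimmedTable shipToExecutor(TrimmedTable table) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(table);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (TrimmedTable) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }
}
```

In this sketch, a copy returned by `shipToExecutor` answers `currentSnapshotId()` without touching storage, but `snapshotIds()` throws if no file exists at the recorded location. That is the kind of assumption that would have to be removed if metadata.json became optional in storage.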
>>>>
>>>> On Thu, Apr 16, 2026 at 2:24 PM huaxin gao <[email protected]>
>>>> wrote:
>>>>
>>>>> +1 to the direction Ryan and Yufei outlined. Making metadata.json
>>>>> optional in storage for v4 and fixing the REST client to not request all
>>>>> snapshots seems like the right path forward.
>>>>>
>>>>> On the executor side, Prashant's earlier work in #14944
>>>>> <https://github.com/apache/iceberg/pull/14944> looks like a good
>>>>> starting point to remove the direct metadata file reads from
>>>>> SerializableTable. Happy to help review when that gets revived.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Huaxin
>>>>>
>>>>> On Thu, Apr 16, 2026 at 12:43 PM Amogh Jahagirdar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I agree with pretty much everything Yufei and Ryan said. I feel like
>>>>>> sharding the metadata JSON across multiple files is overcomplicated
>>>>>> when the REST protocol already abstracts which snapshots a client even
>>>>>> sees. It would be much better for us to make progress on relaxing the
>>>>>> requirement for metadata.json storage. We should also look at the
>>>>>> client implementation defaults to make sure those are sane as well.
>>>>>>
>>>>>> +1 to removing the code where executors fetch full metadata from the
>>>>>> metadata.json. If I recall correctly, when we did the analysis on that
>>>>>> PR, that code was effectively dead, so I think there's a good cleanup
>>>>>> opportunity there.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Thu, Apr 16, 2026 at 11:09 AM Prashant Singh <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hey Ryan / Yufei,
>>>>>>> Here is my one attempt to get rid of that. It was from a gov POV, and
>>>>>>> it's mostly from SerializableTable [1].
>>>>>>> If we are all on board, I can clean up and revive this effort.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/iceberg/pull/14944#issuecomment-3812676977
>>>>>>>
>>>>>>> Best,
>>>>>>> Prashant Singh
>>>>>>>
>>>>>>> On Thu, Apr 16, 2026 at 9:08 AM Ryan Blue <[email protected]> wrote:
>>>>>>>
>>>>>>>> They do? Where is that?
>>>>>>>>
>>>>>>>> Definitely something we should remove as soon as we can.
>>>>>>>>
>>>>>>>> On Thu, Apr 16, 2026 at 8:58 AM Yufei Gu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> To add to that, some engines like Spark still assume metadata.json
>>>>>>>>> exists in storage. The executors load the file directly instead of
>>>>>>>>> checking the REST catalog for table metadata. We will need to modify
>>>>>>>>> that.
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 16, 2026 at 8:45 AM Ryan Blue <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I think that the problem of large metadata.json files is largely
>>>>>>>>>> solved by the REST protocol, which does not need to send snapshots to
>>>>>>>>>> clients. I agree with Anton's suggestion to relax the requirement that
>>>>>>>>>> the metadata.json file has to be stored somewhere (for v4). As long as
>>>>>>>>>> catalogs are required to be able to produce the full content of
>>>>>>>>>> metadata.json when loading the table for a client requesting all
>>>>>>>>>> snapshots, we don't need to worry about storing the file.
>>>>>>>>>>
>>>>>>>>>> There are two things to keep in mind though:
>>>>>>>>>> 1. I think the current Java REST implementation still requests
>>>>>>>>>> all snapshots to commit, which we should fix.
>>>>>>>>>> 2. I think it is a bad idea to split up the metadata.json file
>>>>>>>>>> for non-REST catalogs. This introduces way too much complexity that
>>>>>>>>>> necessarily leaks out of the catalog implementation. I don't think
>>>>>>>>>> this is a problem worth solving when we have a perfectly good solution
>>>>>>>>>> that has significant benefits.
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
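The point above, that a catalog can return only the snapshots a client needs while still being able to produce the full list on request, can be sketched as follows. This is a minimal illustration under stated assumptions, loosely modeled on the `snapshots=all` and `snapshots=refs` load-table option in the REST catalog spec. `SnapshotFilterSketch`, `SnapshotMode`, and `snapshotsToReturn` are hypothetical names, and snapshots are reduced to bare IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical catalog-side helper; not part of any Iceberg API. */
final class SnapshotFilterSketch {
  /** Assumed load modes: every snapshot, or only those a branch or tag points to. */
  enum SnapshotMode { ALL, REFS }

  private SnapshotFilterSketch() {}

  /**
   * Returns the snapshot IDs to include in a load-table response.
   *
   * @param allSnapshotIds every snapshot the catalog tracks, in commit order
   * @param refs branch or tag name mapped to the snapshot ID it points to
   * @param mode which subset the client asked for
   */
  static List<Long> snapshotsToReturn(
      List<Long> allSnapshotIds, Map<String, Long> refs, SnapshotMode mode) {
    if (mode == SnapshotMode.ALL) {
      // The catalog must still be able to produce the full list on demand.
      return new ArrayList<>(allSnapshotIds);
    }
    Set<Long> referenced = new HashSet<>(refs.values());
    List<Long> result = new ArrayList<>();
    for (Long id : allSnapshotIds) {
      if (referenced.contains(id)) {
        result.add(id); // keep commit order, drop unreferenced history
      }
    }
    return result;
  }
}
```

With snapshots 1 through 4 and refs `main` at 4 and `audit` at 2, the `REFS` mode yields only snapshots 2 and 4, so the response size no longer grows with the length of the table's history.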
>>>>>>>>>> On Thu, Apr 16, 2026 at 12:13 AM Innocent Djiofack <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the replies. Steven, the change is scoped to only
>>>>>>>>>>> offloading snapshot history. Yufei, yes, this is a large change. I
>>>>>>>>>>> agree that removing the requirement for a metadata.json file per
>>>>>>>>>>> commit in storage would address most of the concerns. If there is
>>>>>>>>>>> already a design doc for that direction, please share it with me. If
>>>>>>>>>>> not, I can start something along that line of reasoning.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 14, 2026 at 4:09 PM Yufei Gu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Separating snapshot history from table metadata feels like a
>>>>>>>>>>>> large, invasive change since it would require updates across all
>>>>>>>>>>>> clients and engines. If we instead remove the requirement for a
>>>>>>>>>>>> metadata.json file per commit in storage, many of the current
>>>>>>>>>>>> concerns could be addressed. This seems like a more practical path
>>>>>>>>>>>> forward. There are already multiple discussions about that. I'd
>>>>>>>>>>>> suggest moving forward in that direction.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 14, 2026 at 8:44 AM Steven Wu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I understand the problem we are trying to solve here, but the
>>>>>>>>>>>>> actual proposed solution is unclear to me. The proposal seems to
>>>>>>>>>>>>> lack some details in the actual design/solution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How do the proposed snapshot read and write APIs differ from
>>>>>>>>>>>>> the current APIs? I can't tell the difference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Once defined, this interface could be implemented by
>>>>>>>>>>>>> > various backing stores, such as another file or even a Catalog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To support offloading, we probably have to update the table
>>>>>>>>>>>>> metadata in the table spec
>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#table-metadata-fields>.
>>>>>>>>>>>>> Does this depend on making the metadata.json file optional? Or is
>>>>>>>>>>>>> this limited to just externalizing the snapshot list?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 2:53 AM Jean-Baptiste Onofré <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Innocent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe this is somewhat redundant with the V4 initiative?
>>>>>>>>>>>>>> What are your thoughts on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 14, 2026 at 6:44 AM Innocent Djiofack <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My name is Innocent. I have enjoyed working on the Apache
>>>>>>>>>>>>>>> Iceberg project so far and have learned a lot from people in the
>>>>>>>>>>>>>>> group. I wanted to follow up on a concern raised by Anton around
>>>>>>>>>>>>>>> the growing size of metadata.json and the problems it brings.
>>>>>>>>>>>>>>> Before going ahead and doing the implementation work, I wanted to
>>>>>>>>>>>>>>> share the high-level thinking with the community and get feedback.
>>>>>>>>>>>>>>> You will find the link to the proposal here
>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1xpzpsA9BGSkxo58yUhSdDQaSu7_ITQLFmGarEOyM8P0/edit?tab=t.0#heading=h.7g59t9p9o1xi>.
>>>>>>>>>>>>>>> I would appreciate comments and feedback on it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>
> --
>
> *DJIOFACK INNOCENT*
> *"Be better than the day before!" -*
> *+1 404 751 8024*
>
