Re: [DISCUSS] v4 - One file commits

Amogh Jahagirdar Wed, 17 Sep 2025 12:36:17 -0700

Update: I moved the discussion time to this Friday at 9 am PST since I
found out that quite a few folks involved in the proposals will be out next
week, and I know some folks will be out the week after that as well.


Thanks,
Amogh J

On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> wrote:

> Hey folks, sorry for the late follow-up here.
>
> Thanks @Kevin Liu <[email protected]> for sharing the recording link
> of the previous discussion! I've set up another sync for next Tuesday 09/16
> at 9am PST. This time I've set it up from my corporate email so we can get
> recordings and transcriptions (and I've made sure to keep the meeting
> invite open so we don't have to manually let people in).
>
> In terms of next steps, these are the areas I think would be good to focus
> on for establishing consensus:
>
> 1. How do we model the manifest entry structure so that changes to
> manifest DVs can be obtained easily from the root? There are a few options
> here; the most promising approach is to keep an additional DV which encodes
> the diff in additional positions which have been removed from a leaf
> manifest.
>
> 2. Modeling partition transforms via expressions and establishing a
> unified table ID space so that we can simplify how partition tuples may be
> represented via stats and also have a way in the future to store stats on
> any derived column. I have a short proposal
> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0>
> for this that probably still needs some tightening up on the expression
> modeling itself (and some prototyping) but the general idea for
> establishing a unified table ID space is covered. All feedback welcome!
>
> Thanks,
>
> Amogh Jahagirdar
>
> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:
>
>> Thanks Amogh. Looks like the recording for last week's sync is available
>> on Youtube. Here's the link, https://www.youtube.com/watch?v=uWm-p--8oVQ
>>
>> Best,
>> Kevin Liu
>>
>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]>
>> wrote:
>>
>>> Hey folks,
>>>
>>> Just following up on this to give the community an update on where we're
>>> at and my proposed next steps.
>>>
>>> I've been editing and merging the contents from our proposal into the
>>> proposal
>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
>>> from Russell and others. For any future comments on docs, please comment
>>> on the linked proposal. I've also marked it on our doc in red text so it's
>>> clear to redirect to the other proposal as a source of truth for comments.
>>>
>>> In terms of next steps,
>>>
>>> 1. An important design decision point is around inline manifest DVs,
>>> external manifest DVs or enabling both. I'm working on measuring different
>>> approaches for representing the compressed DV representation since that
>>> will inform how many entries can reasonably fit in a small root manifest;
>>> from that we can derive implications on different write patterns and
>>> determine the right approach for storing these manifest DVs.
>>>
>>> 2. Another key point is around determining if/how we can reasonably
>>> enable V4 to represent changes in the root manifest so that readers can
>>> effectively just infer file level changes from the root.
>>>
>>> 3. One of the aspects of the proposal is getting away from the partition
>>> tuple requirement in the root, which currently forces an association
>>> between a partition spec and a manifest. These aspects can be modeled as
>>> essentially column stats, which gives a lot of flexibility in
>>> the organization of the manifest. There are important details around field
>>> ID spaces here which tie into how the stats are structured. What we're
>>> proposing here is to have a unified expression ID space that could also
>>> benefit us for storing things like virtual columns down the line. I go into
>>> this in the proposal but I'm working on separating the appropriate parts so
>>> that the original proposal can mostly just focus on the organization of the
>>> content metadata tree and not how we want to solve this particular ID space
>>> problem.
>>>
>>> 4. I'm planning on scheduling a recurring community sync starting next
>>> Tuesday at 9am PST, every 2 weeks. If I get feedback from folks that this
>>> time will never work, I can certainly adjust. For some reason, I don't have
>>> the ability to add to the Iceberg Dev calendar, so I'll figure that out and
>>> update the thread when the event is scheduled.
>>>
>>> Thanks,
>>>
>>> Amogh Jahagirdar
>>>
>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <
>>> [email protected]> wrote:
>>>
>>>> I think this is a great way forward, starting out with this much
>>>> parallel development shows that we have a lot of consensus already :)
>>>>
>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey folks, just following up on this. It looks like our proposal and
>>>>> the proposal that @Russell Spitzer <[email protected]> shared
>>>>> are pretty aligned. I was just chatting with Russell about this, and we
>>>>> think it'd be best to combine both proposals and have a singular large
>>>>> effort on this. I can also set up a focused community discussion (similar
>>>>> to what we're doing on the other V4 proposals) on this starting sometime
>>>>> next week just to get things moving, if that works for people.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hey Russell,
>>>>>>
>>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I)
>>>>>> have also been working on a proposal for an adaptive metadata tree
>>>>>> structure as part of enabling more efficient one file commits. From a
>>>>>> read of the summary, it's great to see that we're thinking along the
>>>>>> same lines about how to tackle this fundamental area!
>>>>>>
>>>>>> Here is our proposal:
>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hey y'all!
>>>>>>>
>>>>>>> We (Yi Fang, Steven Wu and myself) wanted to share some
>>>>>>> of the thoughts we had on how one-file commits could work in
>>>>>>> Iceberg. This is pretty
>>>>>>> much just a high level overview of the concepts we think we need and
>>>>>>> how Iceberg would behave.
>>>>>>> We haven't gone very far into the actual implementation and changes
>>>>>>> that would need to occur in the
>>>>>>> SDK to make this happen.
>>>>>>>
>>>>>>> The high level summary is:
>>>>>>>
>>>>>>> Manifest Lists are out
>>>>>>> Root Manifests take their place
>>>>>>>   A Root manifest can have data manifests, delete manifests,
>>>>>>> manifest delete vectors, data delete vectors and data files
>>>>>>>   Manifest delete vectors allow for modifying a manifest without
>>>>>>> deleting it entirely
>>>>>>>   Data files let you append without writing an intermediary manifest
>>>>>>>   Having child data and delete manifests lets you still scale
>>>>>>>
>>>>>>> Please take a look if you like,
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>>>>
>>>>>>> I'm excited to see what other proposals and ideas are floating
>>>>>>> around the community,
>>>>>>> Russ
>>>>>>>
>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]> wrote:
>>>>>>>
>>>>>>>> Very excited about the idea!
>>>>>>>>
>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I'm very interested in this initiative. Micah Kornfield and I
>>>>>>>>> presented
>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> on
>>>>>>>>> high-throughput ingestion for Iceberg tables at the 2024 Iceberg
>>>>>>>>> Summit, which leveraged Google infrastructure like Colossus for
>>>>>>>>> efficient appends.
>>>>>>>>>
>>>>>>>>> This new proposal is particularly exciting because it offers
>>>>>>>>> significant advancements in commit latency and metadata storage
>>>>>>>>> footprint. Furthermore, a consistent manifest structure promises to
>>>>>>>>> simplify the design and codebase, which is a major benefit.
>>>>>>>>>
>>>>>>>>> A related idea I've been exploring is having a loose affinity
>>>>>>>>> between data and delete manifests. While the current separation of
>>>>>>>>> data and delete manifests in Iceberg is valuable for avoiding data
>>>>>>>>> file rewrites (and stats updates) when deletes change, it does
>>>>>>>>> necessitate a join
>>>>>>>>> operation during reads. I'd be keen to discuss approaches that could
>>>>>>>>> potentially reduce this read-side cost while retaining the benefits of
>>>>>>>>> separate manifests.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Anoop
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> I am new to the Iceberg community but would love to participate
>>>>>>>>>> in these discussions to reduce the number of file writes, especially
>>>>>>>>>> for small writes/commits.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>> -Jagdeep
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> We have been hitting all the metadata problems you mentioned,
>>>>>>>>>>> Ryan. I’m on-board to help however I can to improve this area.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>>
>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am interested in this idea and looking forward to
>>>>>>>>>>> collaboration.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Huang-Hsiang
>>>>>>>>>>>
>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am interested in contributing to this effort.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Namratha
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm interested in
>>>>>>>>>>>> helping out here! I've been working on a proposal in this area and
>>>>>>>>>>>> it would be great to collaborate with different folks and exchange
>>>>>>>>>>>> ideas here, since I think a lot of people are interested in solving
>>>>>>>>>>>> this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect
>>>>>>>>>>>>> those of us that are interested in the idea of changing Iceberg’s
>>>>>>>>>>>>> metadata in v4 so that in most cases committing a change only
>>>>>>>>>>>>> requires writing one additional metadata file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current Iceberg metadata structure requires writing at
>>>>>>>>>>>>> least one manifest and a new manifest list to produce a new
>>>>>>>>>>>>> snapshot. The goal of this work is to allow more flexibility by
>>>>>>>>>>>>> allowing the manifest list layer to store data and delete files.
>>>>>>>>>>>>> As a result, only one file write would be needed before committing
>>>>>>>>>>>>> the new snapshot. In addition, this work will also try to explore:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Avoiding small manifests that must be read in parallel
>>>>>>>>>>>>>    and later compacted (metadata maintenance changes)
>>>>>>>>>>>>>    - Extending metadata skipping to use aggregated column ranges
>>>>>>>>>>>>>    that are compatible with geospatial data (manifest metadata)
>>>>>>>>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>>>>>>>>    (metadata DVs)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
>>>>>>>>
>>>>>>>
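
The root-manifest idea summarized in the thread above (a root that can hold
child manifests, manifest delete vectors and data files directly, plus a DV
that encodes only the newly removed positions) may be easier to follow with a
toy model. What follows is a minimal in-memory sketch, not the design from
either proposal and not an Iceberg API: every name in it (EntryKind,
RootEntry, RootManifest, live_data_files, dv_diff) and the file names are
hypothetical, manifest DVs are modeled as plain sets of entry positions, and
delete manifests are carried as entries but not applied.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List, Tuple


class EntryKind(Enum):
    # Hypothetical entry kinds a root manifest could hold.
    DATA_MANIFEST = "data_manifest"
    DELETE_MANIFEST = "delete_manifest"
    DATA_FILE = "data_file"


@dataclass(frozen=True)
class RootEntry:
    kind: EntryKind
    path: str


@dataclass(frozen=True)
class RootManifest:
    """Toy root manifest: one object stands in for the one file written per commit."""
    entries: Tuple[RootEntry, ...] = ()
    # Assumption: a manifest DV is the set of soft-deleted entry positions
    # in a leaf manifest, keyed by that manifest's path.
    manifest_dvs: Dict[str, FrozenSet[int]] = field(default_factory=dict)

    def append_data_file(self, path: str) -> "RootManifest":
        # Append without writing an intermediary manifest.
        new_entry = RootEntry(EntryKind.DATA_FILE, path)
        return RootManifest(self.entries + (new_entry,), dict(self.manifest_dvs))

    def soft_delete(self, manifest_path: str, positions: Iterable[int]) -> "RootManifest":
        # Modify a leaf manifest logically, without rewriting it.
        merged = dict(self.manifest_dvs)
        current = merged.get(manifest_path, frozenset())
        merged[manifest_path] = current | frozenset(positions)
        return RootManifest(self.entries, merged)


def live_data_files(root: RootManifest, leaf_contents: Dict[str, List[str]]) -> List[str]:
    """Resolve live data files; leaf_contents maps manifest path -> data file paths."""
    live: List[str] = []
    for entry in root.entries:
        if entry.kind is EntryKind.DATA_FILE:
            live.append(entry.path)
        elif entry.kind is EntryKind.DATA_MANIFEST:
            deleted = root.manifest_dvs.get(entry.path, frozenset())
            for pos, data_file in enumerate(leaf_contents[entry.path]):
                if pos not in deleted:
                    live.append(data_file)
        # DELETE_MANIFEST entries are ignored in this sketch.
    return live


def dv_diff(old: RootManifest, new: RootManifest) -> Dict[str, FrozenSet[int]]:
    """Positions removed in `new` that were not already removed in `old`."""
    diff: Dict[str, FrozenSet[int]] = {}
    for path, positions in new.manifest_dvs.items():
        added = positions - old.manifest_dvs.get(path, frozenset())
        if added:
            diff[path] = added
    return diff


# Usage: three commits, each producing one new root.
leaf = {"m1.avro": ["a.parquet", "b.parquet", "c.parquet"]}
v1 = RootManifest((RootEntry(EntryKind.DATA_MANIFEST, "m1.avro"),))
v2 = v1.append_data_file("d.parquet")   # append straight into the root
v3 = v2.soft_delete("m1.avro", [1])     # drop b.parquet via a manifest DV
print(live_data_files(v3, leaf))
print(dv_diff(v2, v3))
```

In this model a reader comparing two roots can infer the file-level change
from dv_diff alone, which is the property the thread asks for. How the DVs
would actually be encoded (inline, external, compressed) is left open in the
discussion and is not modeled here.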
