Hey folks,

I was out for a while, but I've set up a sync for tomorrow at 9am PST. For this discussion, I think it would be great to focus on the manifest DV representation, factoring in the analyses of bitmap representation storage footprints, and on the entry structure, considering how we want to approach change detection. If there are other topics that people want to highlight, please bring those up as well!

I also recognize that this is fairly short notice, so please reach out to me if this time is difficult to work with. Next week is the Thanksgiving holiday here, and since people will be travelling or out, I figured I'd try to schedule before then.

Thanks,
Amogh Jahagirdar

On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <[email protected]> wrote:

> Hey folks,
>
> Sorry for the delay, here's the recording link
> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view>
> from last week's discussion.
>
> Thanks,
> Amogh Jahagirdar
>
> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <[email protected]> wrote:
>
>> Same here.
>> Please record if you can.
>> Thanks, Peter
>>
>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong <[email protected]> wrote:
>>
>>> Hey Amogh,
>>>
>>> Thanks for the write-up. Unfortunately, I won’t be able to attend. Will it be recorded? Thanks!
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Tue, Oct 7, 2025 at 20:36 Amogh Jahagirdar <[email protected]> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I've set up time this Friday at 9am PST for another sync on single file commits. In terms of what would be great to focus on for the discussion:
>>>>
>>>> 1. Whether it makes sense to eliminate the tuple and instead represent the tuple via lower/upper boundaries. As a reminder, one of the goals is to avoid tying a partition spec to a manifest; in the root we can have a mix of files spanning different partition specs, and even in leaf manifests avoiding this coupling can enable more desirable clustering of metadata.
>>>> In the vast majority of cases, we could leverage the property that a file is effectively partitioned if the lower/upper for a given field is equal. The nuance here is with the particular case of identity partitioned string/binary columns which can be truncated in stats.
>>>> One approach is to require that writers must not produce truncated stats for identity partitioned columns. It's also important to keep in mind that all of this is just for the purpose of reconstructing the partition tuple, which is only required during equality delete matching. Another area we need to cover as part of this is exact bounds on stats. There are other options here as well, such as making all new equality deletes in V4 global and instead matching based on bounds, or keeping the tuple but having each tuple effectively be based on a union schema of all partition specs. I am adding a separate appendix section outlining the span of options here and the different tradeoffs.
>>>> Once we get this to a more conclusive state, I'll move a summarized version to the main doc.
>>>>
>>>> 2. @[email protected] <[email protected]> has updated the doc with a section
>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn>
>>>> on how we can do change detection from the root in a variety of write scenarios. I've done a review on it, and it covers the cases I would expect. It'd be good for folks to take a look and give feedback before we discuss. Thank you Steven for adding that section and all the diagrams.
>>>>
>>>> Thanks,
>>>> Amogh Jahagirdar
>>>>
>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>
>>>>> Hey folks, just following up from the discussion last Friday with a summary and some next steps:
>>>>>
>>>>> 1.) For the various change detection cases, we concluded it's best just to go through those in an offline manner on the doc since it's hard to verify all that correctness in a large meeting setting.
>>>>> 2.) We mostly discussed eliminating the partition tuple.
>>>>> On the original proposal, I was mostly aiming for the ability to reconstruct the tuple from the stats for the purpose of equality delete matching (a file is partitioned if the lower and upper bounds are equal). There's some nuance in how we need to handle identity partition values since for string/binary they cannot be truncated. Another potential option is to treat all equality deletes as effectively global and narrow their application based on the stats values. This may require defining tight bounds. I'm still collecting my thoughts on this one.
>>>>>
>>>>> Thanks folks! Please also let me know if any of the following links are inaccessible for any reason.
>>>>>
>>>>> Meeting recording link:
>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view
>>>>>
>>>>> Meeting summary:
>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
>>>>>
>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>
>>>>>> Update: I moved the discussion time to this Friday at 9 am PST since I found out that quite a few folks involved in the proposals will be out next week, and I also know some folks will be out the week after that.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh J
>>>>>>
>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>
>>>>>>> Hey folks, sorry for the late follow up here,
>>>>>>>
>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing the recording link of the previous discussion! I've set up another sync for next Tuesday 09/16 at 9am PST. This time I've set it up from my corporate email so we can get recordings and transcriptions (and I've made sure to keep the meeting invite open so we don't have to manually let people in).
>>>>>>>
>>>>>>> In terms of next steps, these are the areas which I think would be good to focus on for establishing consensus:
>>>>>>>
>>>>>>> 1. How do we model the manifest entry structure so that changes to manifest DVs can be obtained easily from the root? There are a few options here; the most promising approach is to keep an additional DV which encodes the diff in additional positions which have been removed from a leaf manifest.
>>>>>>>
>>>>>>> 2. Modeling partition transforms via expressions and establishing a unified table ID space so that we can simplify how partition tuples may be represented via stats and also have a way in the future to store stats on any derived column. I have a short proposal
>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0>
>>>>>>> for this that probably still needs some tightening up on the expression modeling itself (and some prototyping) but the general idea for establishing a unified table ID space is covered. All feedback welcome!
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Amogh Jahagirdar
>>>>>>>
>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Amogh. Looks like the recording for last week's sync is available on Youtube. Here's the link,
>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey folks,
>>>>>>>>>
>>>>>>>>> Just following up on this to give the community an update as to where we're at and my proposed next steps.
>>>>>>>>>
>>>>>>>>> I've been editing and merging the contents from our proposal into the proposal
>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
>>>>>>>>> from Russell and others. For any future comments on docs, please comment on the linked proposal. I've also marked it on our doc in red text so it's clear to redirect to the other proposal as a source of truth for comments.
>>>>>>>>>
>>>>>>>>> In terms of next steps,
>>>>>>>>>
>>>>>>>>> 1. An important design decision point is around inline manifest DVs, external manifest DVs or enabling both. I'm working on measuring different approaches for representing the compressed DV representation since that will inform how many entries can reasonably fit in a small root manifest; from that we can derive implications on different write patterns and determine the right approach for storing these manifest DVs.
>>>>>>>>>
>>>>>>>>> 2. Another key point is around determining if/how we can reasonably enable V4 to represent changes in the root manifest so that readers can effectively just infer file level changes from the root.
>>>>>>>>>
>>>>>>>>> 3. One of the aspects of the proposal is getting away from the partition tuple requirement in the root, which currently forces us to keep an association between a partition spec and a manifest. These aspects can be modeled as essentially column stats which gives a lot of flexibility into the organization of the manifest. There are important details around field ID spaces here which tie into how the stats are structured.
>>>>>>>>> What we're proposing here is to have a unified expression ID space that could also benefit us for storing things like virtual columns down the line. I go into this in the proposal but I'm working on separating the appropriate parts so that the original proposal can mostly just focus on the organization of the content metadata tree and not how we want to solve this particular ID space problem.
>>>>>>>>>
>>>>>>>>> 4. I'm planning on scheduling a recurring community sync starting next Tuesday at 9am PST, every 2 weeks. If I get feedback from folks that this time will never work, I can certainly adjust. For some reason, I don't have the ability to add to the Iceberg Dev calendar, so I'll figure that out and update the thread when the event is scheduled.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>
>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I think this is a great way forward, starting out with this much parallel development shows that we have a lot of consensus already :)
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey folks, just following up on this. It looks like our proposal and the proposal that @Russell Spitzer <[email protected]> shared are pretty aligned. I was just chatting with Russell about this, and we think it'd be best to combine both proposals and have a singular large effort on this.
>>>>>>>>>>> I can also set up a focused community discussion (similar to what we're doing on the other V4 proposals) on this starting sometime next week just to get things moving, if that works for people.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Russell,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I) have also been working on a proposal for an adaptive metadata tree structure as part of enabling more efficient one file commits. From a read of the summary, it's great to see that we're thinking along the same lines about how to tackle this fundamental area!
>>>>>>>>>>>>
>>>>>>>>>>>> Here is our proposal:
>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey y'all!
>>>>>>>>>>>>>
>>>>>>>>>>>>> We (Yi Fang, Steven Wu and myself) wanted to share some of the thoughts we had on how one-file commits could work in Iceberg. This is pretty much just a high level overview of the concepts we think we need and how Iceberg would behave. We haven't gone very far into the actual implementation and changes that would need to occur in the SDK to make this happen.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The high level summary is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Manifest Lists are out
>>>>>>>>>>>>> Root Manifests take their place
>>>>>>>>>>>>> A Root manifest can have data manifests, delete manifests, manifest delete vectors, data delete vectors and data files
>>>>>>>>>>>>> Manifest delete vectors allow for modifying a manifest without deleting it entirely
>>>>>>>>>>>>> Data files let you append without writing an intermediary manifest
>>>>>>>>>>>>> Having child data and delete manifests lets you still scale
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please take a look if you like,
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm excited to see what other proposals and ideas are floating around the community,
>>>>>>>>>>>>> Russ
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Very excited about the idea!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm very interested in this initiative. Micah Kornfield and I presented
>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>
>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit, which leveraged Google infrastructure like Colossus for efficient appends.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This new proposal is particularly exciting because it offers significant advancements in commit latency and metadata storage footprint. Furthermore, a consistent manifest structure promises to simplify the design and codebase, which is a major benefit.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A related idea I've been exploring is having a loose affinity between data and delete manifests. While the current separation of data and delete manifests in Iceberg is valuable for avoiding data file rewrites (and stats updates) when deletes change, it does necessitate a join operation during reads. I'd be keen to discuss approaches that could potentially reduce this read-side cost while retaining the benefits of separate manifests.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Anoop
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>> -Jagdeep
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have been hitting all the metadata problems you mentioned, Ryan. I’m on-board to help however I can to improve this area.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am interested in this idea and looking forward to collaboration.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Huang-Hsiang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am interested in contributing to this effort.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Namratha
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping out here! I've been working on a proposal in this area and it would be great to collaborate with different folks and exchange ideas here, since I think a lot of people are interested in solving this problem.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect those of us that are interested in the idea of changing Iceberg’s metadata in v4 so that in most cases committing a change only requires writing one additional metadata file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure requires writing at least one manifest and a new manifest list to produce a new snapshot.
>>>>>>>>>>>>>>>>>>> The goal of this work is to allow more flexibility by allowing the manifest list layer to store data and delete files. As a result, only one file write would be needed before committing the new snapshot. In addition, this work will also try to explore:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that must be read in parallel and later compacted (metadata maintenance changes)
>>>>>>>>>>>>>>>>>>> - Extending metadata skipping to use aggregated column ranges that are compatible with geospatial data (manifest metadata)
>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid rewriting existing manifests (metadata DVs)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> John Zhuge
