Hey all, here are the meeting recording <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing> and the generated meeting summary <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>. Thanks all for attending yesterday!

On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar <[email protected]> wrote:

> Hey folks,
>
> I was out for some time, but set up a sync for tomorrow at 9am PST. For
> this discussion, I do think it would be great to focus on the manifest DV
> representation, factoring in analyses on bitmap representation storage
> footprints, and the entry structure considering how we want to approach
> change detection. If there are other topics that people want to highlight,
> please do bring those up as well!
>
> I also recognize that this is fairly short-notice scheduling, so please do
> reach out to me if this time is difficult to work with; next week is the
> Thanksgiving holiday here, and since people would be travelling/out I
> figured I'd try to schedule before then.
>
> Thanks,
> Amogh Jahagirdar
>
> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <[email protected]> wrote:
>
>> Hey folks,
>>
>> Sorry for the delay, here's the recording link
>> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view>
>> from last week's discussion.
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <[email protected]> wrote:
>>
>>> Same here.
>>> Please record if you can.
>>> Thanks, Peter
>>>
>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong <[email protected]> wrote:
>>>
>>>> Hey Amogh,
>>>>
>>>> Thanks for the write-up. Unfortunately, I won’t be able to attend. Will
>>>> it be recorded? Thanks!
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Tue, Oct 7, 2025 at 20:36, Amogh Jahagirdar <[email protected]> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I've set up time this Friday at 9am PST for another sync on single file
>>>>> commits. In terms of what would be great to focus on for the discussion:
>>>>>
>>>>> 1. Whether it makes sense or not to eliminate the tuple, and instead
>>>>> represent the tuple via lower/upper boundaries.
>>>>> As a reminder, one of the goals is to avoid tying a partition spec to a
>>>>> manifest; in the root we can have a mix of files spanning different
>>>>> partition specs, and even in leaf manifests avoiding this coupling can
>>>>> enable more desirable clustering of metadata.
>>>>> In the vast majority of cases, we could leverage the property that a
>>>>> file is effectively partitioned if the lower and upper bounds for a given
>>>>> field are equal. The nuance here is with the particular case of identity
>>>>> partitioned string/binary columns which can be truncated in stats. One
>>>>> approach is to require that writers must not produce truncated stats for
>>>>> identity partitioned columns. It's also important to keep in mind that all
>>>>> of this is just for the purpose of reconstructing the partition tuple,
>>>>> which is only required during equality delete matching. Another area we
>>>>> need to cover as part of this is exact bounds on stats. There are other
>>>>> options here as well, such as making all new equality deletes in V4 be
>>>>> global and instead match based on bounds, or keeping the tuple but each
>>>>> tuple is effectively based off a union schema of all partition specs. I am
>>>>> adding a separate appendix section outlining the span of options here and
>>>>> the different tradeoffs.
>>>>> Once we get this to a more conclusive state, I'll move a summarized
>>>>> version to the main doc.
>>>>>
>>>>> 2. @[email protected] <[email protected]> has updated the doc
>>>>> with a section
>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn>
>>>>> on how we can do change detection from the root in a variety of write
>>>>> scenarios. I've done a review on it, and it covers the cases I would
>>>>> expect. It'd be good for folks to take a look and please give feedback
>>>>> before we discuss. Thank you Steven for adding that section and all the
>>>>> diagrams.
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>
>>>>>> Hey folks, just following up from the discussion last Friday with a
>>>>>> summary and some next steps:
>>>>>>
>>>>>> 1.) For the various change detection cases, we concluded it's best
>>>>>> just to go through those in an offline manner on the doc since it's hard
>>>>>> to verify all that correctness in a large meeting setting.
>>>>>> 2.) We mostly discussed eliminating the partition tuple. On the
>>>>>> original proposal, I was mostly aiming for the ability to reconstruct
>>>>>> the tuple from the stats for the purpose of equality delete matching (a
>>>>>> file is partitioned if the lower and upper bounds are equal). There's
>>>>>> some nuance in how we need to handle identity partition values since for
>>>>>> string/binary they cannot be truncated. Another potential option is to
>>>>>> treat all equality deletes as effectively global and narrow their
>>>>>> application based on the stats values. This may require defining tight
>>>>>> bounds. I'm still collecting my thoughts on this one.
>>>>>>
>>>>>> Thanks folks! Please also let me know if any of the following links
>>>>>> are inaccessible for any reason.
>>>>>>
>>>>>> Meeting recording link:
>>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view
>>>>>>
>>>>>> Meeting summary:
>>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
>>>>>>
>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>
>>>>>>> Update: I moved the discussion time to this Friday at 9 am PST since
>>>>>>> I found out that quite a few folks involved in the proposals will be out
>>>>>>> next week, and I also know some folks will be out the week after that.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amogh J
>>>>>>>
>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey folks, sorry for the late follow-up here,
>>>>>>>>
>>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing the
>>>>>>>> recording link of the previous discussion! I've set up another sync for
>>>>>>>> next Tuesday 09/16 at 9am PST. This time I've set it up from my
>>>>>>>> corporate email so we can get recordings and transcriptions (and I've
>>>>>>>> made sure to keep the meeting invite open so we don't have to manually
>>>>>>>> let people in).
>>>>>>>>
>>>>>>>> In terms of next steps, these are the areas which I think would be good
>>>>>>>> to focus on for establishing consensus:
>>>>>>>>
>>>>>>>> 1. How do we model the manifest entry structure so that changes to
>>>>>>>> manifest DVs can be obtained easily from the root? There are a few
>>>>>>>> options here; the most promising approach is to keep an additional DV
>>>>>>>> which encodes the diff in additional positions which have been removed
>>>>>>>> from a leaf manifest.
>>>>>>>>
>>>>>>>> 2. Modeling partition transforms via expressions and establishing a
>>>>>>>> unified table ID space so that we can simplify how partition tuples
>>>>>>>> may be represented via stats and also have a way in the future to store
>>>>>>>> stats on any derived column. I have a short proposal
>>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0>
>>>>>>>> for this that probably still needs some tightening up on the expression
>>>>>>>> modeling itself (and some prototyping) but the general idea for
>>>>>>>> establishing a unified table ID space is covered. All feedback welcome!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Amogh Jahagirdar
>>>>>>>>
>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Amogh.
>>>>>>>>> Looks like the recording for last week's sync is
>>>>>>>>> available on YouTube. Here's the link,
>>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kevin Liu
>>>>>>>>>
>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey folks,
>>>>>>>>>>
>>>>>>>>>> Just following up on this to give the community an update on where
>>>>>>>>>> we're at and my proposed next steps.
>>>>>>>>>>
>>>>>>>>>> I've been editing and merging the contents from our proposal into
>>>>>>>>>> the proposal
>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
>>>>>>>>>> from Russell and others. For any future comments on docs, please
>>>>>>>>>> comment on the linked proposal. I've also marked it on our doc in red
>>>>>>>>>> text so it's clear to redirect to the other proposal as a source of
>>>>>>>>>> truth for comments.
>>>>>>>>>>
>>>>>>>>>> In terms of next steps,
>>>>>>>>>>
>>>>>>>>>> 1. An important design decision point is around inline manifest
>>>>>>>>>> DVs, external manifest DVs or enabling both. I'm working on
>>>>>>>>>> measuring different approaches for representing the compressed DV
>>>>>>>>>> representation since that will inform how many entries can
>>>>>>>>>> reasonably fit in a small root manifest; from that we can derive
>>>>>>>>>> implications on different write patterns and determine the right
>>>>>>>>>> approach for storing these manifest DVs.
>>>>>>>>>>
>>>>>>>>>> 2. Another key point is around determining if/how we can
>>>>>>>>>> reasonably enable V4 to represent changes in the root manifest so
>>>>>>>>>> that readers can effectively just infer file level changes from the
>>>>>>>>>> root.
>>>>>>>>>>
>>>>>>>>>> 3. One of the aspects of the proposal is getting away from the
>>>>>>>>>> partition tuple requirement in the root, which currently requires us
>>>>>>>>>> to have an association between a partition spec and a manifest. These
>>>>>>>>>> aspects can be modeled as essentially column stats, which gives a lot
>>>>>>>>>> of flexibility in the organization of the manifest. There are
>>>>>>>>>> important details around field ID spaces here which tie into how the
>>>>>>>>>> stats are structured. What we're proposing here is to have a unified
>>>>>>>>>> expression ID space that could also benefit us for storing things like
>>>>>>>>>> virtual columns down the line. I go into this in the proposal but I'm
>>>>>>>>>> working on separating the appropriate parts so that the original
>>>>>>>>>> proposal can mostly just focus on the organization of the content
>>>>>>>>>> metadata tree and not how we want to solve this particular ID space
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> 4. I'm planning on scheduling a recurring community sync starting
>>>>>>>>>> next Tuesday at 9am PST, every 2 weeks. If I get feedback from folks
>>>>>>>>>> that this time will never work, I can certainly adjust. For some
>>>>>>>>>> reason, I don't have the ability to add to the Iceberg Dev calendar,
>>>>>>>>>> so I'll figure that out and update the thread when the event is
>>>>>>>>>> scheduled.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think this is a great way forward, starting out with this much
>>>>>>>>>>> parallel development shows that we have a lot of consensus already :)
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey folks, just following up on this. It looks like our
>>>>>>>>>>>> proposal and the proposal that @Russell Spitzer
>>>>>>>>>>>> <[email protected]> shared are pretty aligned. I was
>>>>>>>>>>>> just chatting with Russell about this, and we think it'd be best
>>>>>>>>>>>> to combine both proposals and have a singular large effort on this.
>>>>>>>>>>>> I can also set up a focused community discussion (similar to what
>>>>>>>>>>>> we're doing on the other V4 proposals) on this starting sometime
>>>>>>>>>>>> next week just to get things moving, if that works for people.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Russell,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop
>>>>>>>>>>>>> and I) have also been working on a proposal for an adaptive
>>>>>>>>>>>>> metadata tree structure as part of enabling more efficient one
>>>>>>>>>>>>> file commits. From a read of the summary, it's great to see that
>>>>>>>>>>>>> we're thinking along the same lines about how to tackle this
>>>>>>>>>>>>> fundamental area!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is our proposal:
>>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey y'all!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We (Yi Fang, Steven Wu and myself) wanted to share some
>>>>>>>>>>>>>> of the thoughts we had on how one-file commits could work in
>>>>>>>>>>>>>> Iceberg. This is pretty much just a high level overview of the
>>>>>>>>>>>>>> concepts we think we need and how Iceberg would behave.
>>>>>>>>>>>>>> We haven't gone very far into the actual implementation and
>>>>>>>>>>>>>> changes that would need to occur in the SDK to make this happen.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The high level summary is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Manifest Lists are out
>>>>>>>>>>>>>> - Root Manifests take their place
>>>>>>>>>>>>>> - A Root manifest can have data manifests, delete manifests,
>>>>>>>>>>>>>> manifest delete vectors, data delete vectors and data files
>>>>>>>>>>>>>> - Manifest delete vectors allow for modifying a manifest
>>>>>>>>>>>>>> without deleting it entirely
>>>>>>>>>>>>>> - Data files let you append without writing an intermediary
>>>>>>>>>>>>>> manifest
>>>>>>>>>>>>>> - Having child data and delete manifests lets you still scale
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please take a look if you like,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm excited to see what other proposals and ideas are
>>>>>>>>>>>>>> floating around the community,
>>>>>>>>>>>>>> Russ
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Very excited about the idea!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm very interested in this initiative. Micah Kornfield and
>>>>>>>>>>>>>>>> I presented
>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>
>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg tables at the 2024
>>>>>>>>>>>>>>>> Iceberg Summit, which leveraged Google infrastructure like
>>>>>>>>>>>>>>>> Colossus for efficient appends.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This new proposal is particularly exciting because it
>>>>>>>>>>>>>>>> offers significant advancements in commit latency and metadata
>>>>>>>>>>>>>>>> storage footprint. Furthermore, a consistent manifest structure
>>>>>>>>>>>>>>>> promises to simplify the design and codebase, which is a major
>>>>>>>>>>>>>>>> benefit.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A related idea I've been exploring is having a loose
>>>>>>>>>>>>>>>> affinity between data and delete manifests. While the current
>>>>>>>>>>>>>>>> separation of data and delete manifests in Iceberg is valuable
>>>>>>>>>>>>>>>> for avoiding data file rewrites (and stats updates) when deletes
>>>>>>>>>>>>>>>> change, it does necessitate a join operation during reads. I'd
>>>>>>>>>>>>>>>> be keen to discuss approaches that could potentially reduce this
>>>>>>>>>>>>>>>> read-side cost while retaining the benefits of separate
>>>>>>>>>>>>>>>> manifests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Anoop
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am new to the Iceberg community but would love to
>>>>>>>>>>>>>>>>> participate in these discussions to reduce the number of file
>>>>>>>>>>>>>>>>> writes, especially for small writes/commits.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>> -Jagdeep
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have been hitting all the metadata problems you
>>>>>>>>>>>>>>>>>> mentioned, Ryan. I’m on-board to help however I can to
>>>>>>>>>>>>>>>>>> improve this area.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng
>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am interested in this idea and looking forward to
>>>>>>>>>>>>>>>>>> collaboration.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Huang-Hsiang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am interested in contributing to this effort.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Namratha
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm interested
>>>>>>>>>>>>>>>>>>> in helping out here!
>>>>>>>>>>>>>>>>>>> I've been working on a proposal in this area and it
>>>>>>>>>>>>>>>>>>> would be great to collaborate with different folks and
>>>>>>>>>>>>>>>>>>> exchange ideas here, since I think a lot of people are
>>>>>>>>>>>>>>>>>>> interested in solving this problem.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to
>>>>>>>>>>>>>>>>>>>> connect those of us that are interested in the idea of
>>>>>>>>>>>>>>>>>>>> changing Iceberg’s metadata in v4 so that in most cases
>>>>>>>>>>>>>>>>>>>> committing a change only requires writing one additional
>>>>>>>>>>>>>>>>>>>> metadata file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure requires writing
>>>>>>>>>>>>>>>>>>>> at least one manifest and a new manifest list to produce a
>>>>>>>>>>>>>>>>>>>> new snapshot. The goal of this work is to allow more
>>>>>>>>>>>>>>>>>>>> flexibility by allowing the manifest list layer to store
>>>>>>>>>>>>>>>>>>>> data and delete files. As a result, only one file write
>>>>>>>>>>>>>>>>>>>> would be needed before committing the new snapshot.
>>>>>>>>>>>>>>>>>>>> In addition, this work will also try to explore:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that must be read in
>>>>>>>>>>>>>>>>>>>> parallel and later compacted (metadata maintenance changes)
>>>>>>>>>>>>>>>>>>>> - Extending metadata skipping to use aggregated column
>>>>>>>>>>>>>>>>>>>> ranges that are compatible with geospatial data
>>>>>>>>>>>>>>>>>>>> (manifest metadata)
>>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid rewriting existing
>>>>>>>>>>>>>>>>>>>> manifests (metadata DVs)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> John Zhuge
