My takeaway from the conversation is also that we don't need row-level column updates; manifest DVs can be used for row-level updates instead. Basically, a file (manifest or data) can be updated via (1) a delete vector plus the updated rows in a new file, or (2) a column file overlay. Depending on the percentage of modified rows, engines can choose which way to go.
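To make that choice concrete, here is a minimal sketch of how an engine might pick between the two paths. The planner class and the 30% cutoff are illustrative assumptions, not anything from the proposal; a real engine would also weigh file sizes, stats, and workload:

```java
// Illustrative only: neither this class nor the threshold comes from the
// proposal; it just shows the shape of the decision.
enum UpdateStrategy { DV_PLUS_NEW_FILE, COLUMN_FILE_OVERLAY }

final class UpdatePlanner {
  // Hypothetical cutoff: above this fraction of modified rows, a column
  // file overlay is assumed to beat masking rows with a delete vector.
  private static final double OVERLAY_THRESHOLD = 0.30;

  static UpdateStrategy choose(long modifiedRows, long totalRows) {
    double fraction = (double) modifiedRows / totalRows;
    // (1) few rows changed: mask them with a DV and write the updated rows
    //     to a new file; (2) many rows changed: overlay the column(s).
    return fraction < OVERLAY_THRESHOLD
        ? UpdateStrategy.DV_PLUS_NEW_FILE
        : UpdateStrategy.COLUMN_FILE_OVERLAY;
  }
}
```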
On Tue, Mar 3, 2026 at 6:24 AM Gábor Kaszab <[email protected]> wrote:

> Thanks for the summary, Micah! I tried to watch the recording linked to the calendar event, but apparently I don't have permission to do so. Not sure about others.

> So if I'm not mistaken, one way to reduce the write cost of an UPDATE for colocated DVs is to use column updates. As I see it, there was some agreement that row-level partial column updates aren't desired, and we aim for at least file-level column updates. This is very useful information for the other conversation <https://lists.apache.org/thread/w90rqyhmh6pb0yxp0bqzgzk1y1rotyny> going on for the column update proposal. We can bring this up on the column update sync tomorrow, but I'm wondering if the consensus on avoiding row-level column updates is something we can incorporate into the column update proposal too, or if it's something still up for debate.

> Best Regards,
> Gabor

> On Wed, Feb 25, 2026 at 22:30 Micah Kornfield <[email protected]> wrote:

>> Just wanted to summarize my main takeaways of Monday's sync.

>> The approach will always colocate DVs with the data files (i.e. every data file row in a manifest has an optional DV reference). This implies that there is not a separate "deletion manifest". Rather, in V4 all manifests are "combined", with data files and DVs colocated.

>> Write amplification is avoided in two ways:
>> 1. For small updates we will need to carry through metadata statistics (and other relevant data file fields) in memory (rescanning these is likely too expensive). Once updates are available, they will be written out to a new manifest (either root or leaf), using metadata DVs to remove the old rows.
>> 2. For larger updates we will only carry through the DV update parts in memory and use column-level updates to replace existing DVs (this would require rescanning the DV columns for any updated manifest to merge with the updated DVs in memory, and then writing out the column update). The consensus on the call is that we didn't want to support partial column updates (a.k.a. merge-on-read column updates).

>> The idea is that engines would decide which path to follow based on the number of affected files.

>> To help understand the implications of the new proposal, I put together a quick spreadsheet [1] to analyze trade-offs between separate deletion manifests and the new approach under scenarios 1 and 2. This represents the worst-case scenario, where file updates are uniformly distributed across a single update operation. It does not account for repeated writes (e.g. ongoing compaction). My main takeaway is that keeping at most 1 affiliated DV separate might still help (akin to a merge-on-read column update), but maybe not enough relative to other parts of the system (e.g. the churn on data files) to justify the complexity.

>> Hope this is helpful.

>> Micah

>> [1] https://docs.google.com/spreadsheets/d/1klZQxV7ST2C-p9LTMmai_5rtFiyupj6jSLRPRkdI-u8/edit?gid=0#gid=0

>> On Thu, Feb 19, 2026 at 3:52 PM Amogh Jahagirdar <[email protected]> wrote:

>>> Hey folks, I've set up an additional initial discussion on DVs for Monday. This topic is fairly complex and there is also now a free calendar slot. I think it'd be helpful for us to first make sure we're all on the same page in terms of what the approach proposed by Anton earlier in the thread means, and the high-level mechanics. I should also have more to share on the doc about how the entry structure and change detection could look in this approach. Then on Thursday we can get into more details and targeted points of discussion on this topic.

>>> Thanks,
>>> Amogh Jahagirdar

>>> On Tue, Feb 17, 2026 at 9:27 PM Amogh Jahagirdar <[email protected]> wrote:

>>>> Thanks Steven! I've set up some time next Thursday for the community to discuss this. We're also looking at how the content entry would look in a combined DV with potential column updates for DV changes, and how change detection could look in this approach. I should have more to share on this by the time of the community discussion next week.
>>>> We should also consider potential root churn and memory consumption stemming from expected root entry inflation due to a combined data file + DV entry with possible column updates for certain DV workloads; though at least for memory consumption of stats being held after planning, that arguably is an implementation problem for certain integrations.

>>>> Thanks,
>>>> Amogh Jahagirdar

>>>> On Fri, Feb 13, 2026 at 10:58 AM Steven Wu <[email protected]> wrote:

>>>>> I wrote up some analysis with back-of-the-envelope calculations about the column update approach for DV colocation. It mainly concerns the 2nd use case: deleting a large number of rows from a small number of files.

>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.gvdulzy486n7

>>>>> On Wed, Feb 4, 2026 at 1:02 AM Péter Váry <[email protected]> wrote:

>>>>>> I fully agree with Anton and Steven that we need benchmarks before choosing any direction.

>>>>>> I ran some preliminary column-stitching benchmarks last summer:

>>>>>> - Results are available in the doc: https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>> - Code is here: https://github.com/apache/iceberg/pull/13306

>>>>>> I've summarized the most relevant results at the end of this email. They show roughly a 10% slowdown on the read path with column stitching in similar scenarios when using local SSDs. I expect that in real deployments the metadata read cost will mostly be driven by blob I/O (assuming no caching). If blob access becomes the dominant factor in read latency, multithreaded fetching should be able to absorb the overhead introduced by column stitching, resulting in latency similar to the single-file layout (unless IO is already the bottleneck).

>>>>>> We should definitely rerun the benchmarks once we have a clearer understanding of the intended usage patterns.
>>>>>> Thanks,
>>>>>> Peter

>>>>>> The relevant(ish) results are for 100 columns, with 2 families of 50 columns each, and local reads:

>>>>>> The base is:
>>>>>> MultiThreadedParquetBenchmark.read 100 0 false ss 20 3.739 ± 0.096 s/op

>>>>>> The read for single threaded:
>>>>>> MultiThreadedParquetBenchmark.read 100 2 false ss 20 4.036 ± 0.082 s/op

>>>>>> The read for multi threaded:
>>>>>> MultiThreadedParquetBenchmark.read 100 2 true ss 20 4.063 ± 0.080 s/op

>>>>>> On Tue, Feb 3, 2026 at 23:27 Steven Wu <[email protected]> wrote:

>>>>>>> I agree with Anton in this <https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o/edit?disco=AAAByzDx21w> comment thread that we probably need to run benchmarks for a few common scenarios to guide this decision. We need to write down detailed plans for those scenarios and what we are measuring. Also, ideally we want to measure using the V4 metadata structure (like Parquet manifest file, column stats structs, adaptive tree). There are PoC PRs available for column stats, Parquet manifest, and root manifest. It would probably be tricky to piece them together to run the benchmark considering the PoC status. We also need the column stitching capability on the read path to test the column file approach.

>>>>>>> On Tue, Feb 3, 2026 at 1:53 PM Anoop Johnson <[email protected]> wrote:

>>>>>>>> I'm in favor of co-located DV metadata with column file override and not doing affiliated/unaffiliated delete manifests. This is conceptually similar to strictly affiliated delete manifests with positional joins, and will halve the number of I/Os when there is no DV column override. It is simpler to implement and will speed up reads.

>>>>>>>> Unaffiliated DV manifests are flexible for writers. They reduce the chance of physical conflicts when there are concurrent large/random deletes that change DVs on different files in the same manifest. But the flexibility comes at a read-time cost. If the number of unaffiliated DVs exceeds a threshold, it could cause driver OOMs or require a distributed join to pair up DVs with data files. With colocated metadata, manifest DVs can reduce the chance of conflicts up to a certain write size.

>>>>>>>> I assume we will still support unaffiliated manifests for equality deletes, but perhaps we can restrict it to just equality deletes.

>>>>>>>> -Anoop

>>>>>>>> On Mon, Feb 2, 2026 at 4:27 PM Anton Okolnychyi <[email protected]> wrote:

>>>>>>>>> I added the approach with column files to the doc.

>>>>>>>>> To sum up, separate data and delete manifests with affinity would perform somewhat on par with co-located DV metadata (a.k.a. direct assignment) if we add support for column files when we need to replace most or all DVs (use case 1). That said, the support for direct assignment with in-line metadata DVs can help us avoid unaffiliated delete manifests when we need to replace a few DVs (use case 2).

>>>>>>>>> So the key question is whether we want to allow unaffiliated delete manifests with DVs... If we don't, then we would likely want to have co-located DV metadata, and we must support efficient column updates so as not to regress compared to V2 and V3 for large MERGE jobs that modify a small set of records for most files.

>>>>>>>>> On Mon, Feb 2, 2026 at 13:20 Anton Okolnychyi <[email protected]> wrote:

>>>>>>>>>> Anoop, correct, if we keep data and delete manifests separate, there is a better way to combine the entries and we should NOT rely on the referenced data file path. Reconciling by implicit position will reduce the size of the DV entry (no need to store the referenced data file path) and will improve the planning performance (no equals/hashCode on the path).

>>>>>>>>>> Steven, I agree. Most notes in the doc pre-date discussions we had on column updates. You are right: given that we are gravitating towards a native way to handle column updates, it seems logical to use the same approach for replacing DVs, since they're essentially column updates. Let me add one more approach to the doc based on what Anurag and Peter have so far.

>>>>>>>>>> On Sun, Feb 1, 2026 at 20:59 Steven Wu <[email protected]> wrote:

>>>>>>>>>>> Anton, thanks for raising this. I agree this deserves another look. I added a comment in your doc that we can potentially apply the column update proposal for data file updates to the manifest file updates as well, to colocate the data DVs and data manifest files. Data DVs can be a separate column in the data manifest file and updated separately in a column file. This is the same as the coalesced positional join that Anoop mentioned.

>>>>>>>>>>> On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]> wrote:

>>>>>>>>>>>> Thank you for raising this, Anton. I had a similar observation while prototyping <https://github.com/apache/iceberg/pull/14533> the adaptive metadata tree. The overhead of doing a path-based hash join of a data manifest with the affiliated delete manifest is high: my estimate was that the join adds about 5-10% overhead. The hash table build/probe alone takes about 5 ms for manifests with 25K entries. There are engines that can do vectorized hash joins that can lower this, but the overhead and complexity of a SIMD-friendly hash join is non-trivial.

>>>>>>>>>>>> An alternative to relying on the external file feature in Parquet is to make affiliated manifests order-preserving: i.e., DVs in an affiliated delete manifest must appear in the same position as the corresponding data file in the data manifest the delete manifest is affiliated to. If a data file does not have a DV, the DV manifest must store a NULL. This would allow us to do positional joins, which are much faster. If we wanted, we could even have multiple affiliated DV manifests for a data manifest and the reader would do a COALESCED positional join (i.e. pick the first non-null value as the DV). It puts the sorting responsibility on the writers, but it might be a reasonable tradeoff.
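For illustration, a sketch of the COALESCED positional join described above: every affiliated DV manifest is positionally aligned with the data manifest (NULL where a file has no DV), so the reader takes the first non-null DV per position rather than building a hash table. The types and names here are made up for the example:

```java
import java.util.List;

// Sketch only: byte[] stands in for a serialized DV; each element of
// dvManifests is positionally aligned with dataFiles, ordered newest-first,
// and holds null where a data file has no DV.
final class CoalescedPositionalJoin {
  static byte[][] resolveDvs(List<String> dataFiles, List<byte[][]> dvManifests) {
    byte[][] resolved = new byte[dataFiles.size()][];
    for (int pos = 0; pos < dataFiles.size(); pos++) {
      for (byte[][] dvManifest : dvManifests) {
        if (dvManifest[pos] != null) {  // first non-null value wins
          resolved[pos] = dvManifest[pos];
          break;
        }
      }
    }
    return resolved;  // resolved[pos] pairs with dataFiles.get(pos)
  }
}
```

Unlike the path-based hash join, this is a single pass with no hash table build or probe, and no referenced data file paths stored in the DV entries.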
>>>>>>>>>>>> Also, the options don't necessarily have to be mutually exclusive. We could still allow affiliated DVs to be "folded" into the data manifest (e.g. by background optimization jobs or the writer itself). That might be the optimal choice for read-heavy tables because it will halve the number of I/Os readers have to make.

>>>>>>>>>>>> Best,
>>>>>>>>>>>> Anoop

>>>>>>>>>>>> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi <[email protected]> wrote:

>>>>>>>>>>>>> I had a chance to catch up on some of the V4 discussions. Given that we are getting rid of the manifest list and switching to Parquet, I wanted to re-evaluate the possibility of direct DV assignment that we discarded in V3 to avoid regressions. I have put together my thoughts in a doc [1].

>>>>>>>>>>>>> TL;DR:

>>>>>>>>>>>>> - I think the current V4 proposal that keeps data and delete manifests separate but introduces affinity is a solid choice for cases when we need to replace DVs in many / most files. I outlined an approach with column-split Parquet files, but it doesn't improve the performance and takes a dependency on a portion of the Parquet spec that is not really implemented.
>>>>>>>>>>>>> - Pushing unaffiliated DVs directly into the root to replace a small set of DVs is going to be fast on write, but does require resolving where those DVs apply at read time. Using inline metadata DVs with column-split Parquet files is a little more promising in this case, as it allows us to avoid unaffiliated DVs. That said, it again relies on something Parquet doesn't implement right now, requires changing maintenance operations, and yields minimal benefits.

>>>>>>>>>>>>> All in all, the V4 proposal seems like a strict improvement over V3, but I insist that we reconsider usage of the referenced data file path when resolving DVs to data files.

>>>>>>>>>>>>> [1] - https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o

>>>>>>>>>>>>> - Anton

>>>>>>>>>>>>> On Sat, Nov 22, 2025 at 13:37 Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>> Hey all,

>>>>>>>>>>>>>> Here is the meeting recording <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing> and generated meeting summary <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>. Thanks all for attending yesterday!

>>>>>>>>>>>>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>> Hey folks,

>>>>>>>>>>>>>>> I was out for some time, but set up a sync for tomorrow at 9am PST. For this discussion, I do think it would be great to focus on the manifest DV representation, factoring in analyses on bitmap representation storage footprints, and the entry structure considering how we want to approach change detection. If there are other topics that people want to highlight, please do bring those up as well!

>>>>>>>>>>>>>>> I also recognize that this is a bit short-term scheduling, so please do reach out to me if this time is difficult to work with; next week is the Thanksgiving holidays here, and since people would be travelling/out I figured I'd try to schedule before then.

>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>> Hey folks,

>>>>>>>>>>>>>>>> Sorry for the delay, here's the recording link <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view> from last week's discussion.

>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <[email protected]> wrote:

>>>>>>>>>>>>>>>>> Same here.
>>>>>>>>>>>>>>>>> Please record if you can.
>>>>>>>>>>>>>>>>> Thanks, Peter

>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 17:39 Fokko Driesprong <[email protected]> wrote:

>>>>>>>>>>>>>>>>>> Hey Amogh,

>>>>>>>>>>>>>>>>>> Thanks for the write-up. Unfortunately, I won't be able to attend. Will it be recorded? Thanks!

>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>> Fokko

>>>>>>>>>>>>>>>>>> On Tue, Oct 7, 2025 at 20:36 Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>> Hey all,

>>>>>>>>>>>>>>>>>>> I've set up time this Friday at 9am PST for another sync on single file commits. In terms of what would be great to focus on for the discussion:

>>>>>>>>>>>>>>>>>>> 1. Whether it makes sense or not to eliminate the tuple, and instead represent the tuple via lower/upper boundaries. As a reminder, one of the goals is to avoid tying a partition spec to a manifest; in the root we can have a mix of files spanning different partition specs, and even in leaf manifests avoiding this coupling can enable more desirable clustering of metadata.
>>>>>>>>>>>>>>>>>>> In the vast majority of cases, we could leverage the property that a file is effectively partitioned if the lower/upper for a given field is equal. The nuance here is with the particular case of identity-partitioned string/binary columns, which can be truncated in stats. One approach is to require that writers must not produce truncated stats for identity-partitioned columns. It's also important to keep in mind that all of this is just for the purpose of reconstructing the partition tuple, which is only required during equality delete matching. Another area we need to cover as part of this is exact bounds on stats. There are other options here as well, such as making all new equality deletes in V4 be global and instead match based on bounds, or keeping the tuple but having each tuple effectively based off a union schema of all partition specs. I am adding a separate appendix section outlining the span of options here and the different tradeoffs.
>>>>>>>>>>>>>>>>>>> Once we get this to a more conclusive state, I'll move a summarized version to the main doc.

>>>>>>>>>>>>>>>>>>> 2. @[email protected] <[email protected]> has updated the doc with a section <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn> on how we can do change detection from the root in a variety of write scenarios. I've done a review of it, and it covers the cases I would expect. It'd be good for folks to take a look and please give feedback before we discuss. Thank you Steven for adding that section and all the diagrams.

>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
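A small sketch of the bound-equality property from point 1 above: a field can be treated as identity-partitioned within a file exactly when its exact (untruncated) lower and upper bounds coincide. The ColumnStat record and helper here are hypothetical, for illustration only:

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical stat holder; bounds are assumed exact (not truncated).
record ColumnStat(ByteBuffer lowerBound, ByteBuffer upperBound) {}

final class PartitionFromStats {
  // Recovers an identity partition value iff the field is constant in the
  // file, i.e. its lower and upper bounds are equal.
  static Optional<ByteBuffer> identityValue(ColumnStat stat) {
    if (stat.lowerBound() != null && stat.lowerBound().equals(stat.upperBound())) {
      return Optional.of(stat.lowerBound());  // constant within the file
    }
    return Optional.empty();  // not constant, or bounds missing/truncated
  }
}
```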
>>>>>>>>>>>>>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>> Hey folks, just following up from the discussion last Friday with a summary and some next steps:

>>>>>>>>>>>>>>>>>>>> 1.) For the various change detection cases, we concluded it's best just to go through those in an offline manner on the doc, since it's hard to verify all that correctness in a large meeting setting.
>>>>>>>>>>>>>>>>>>>> 2.) We mostly discussed eliminating the partition tuple. In the original proposal, I was mostly aiming for the ability to reconstruct the tuple from the stats for the purpose of equality delete matching (a file is partitioned if the lower and upper bounds are equal); there's some nuance in how we need to handle identity partition values, since for string/binary they cannot be truncated. Another potential option is to treat all equality deletes as effectively global and narrow their application based on the stats values. This may require defining tight bounds. I'm still collecting my thoughts on this one.

>>>>>>>>>>>>>>>>>>>> Thanks folks! Please also let me know if any of the following links are inaccessible for any reason.

>>>>>>>>>>>>>>>>>>>> Meeting recording link: https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view

>>>>>>>>>>>>>>>>>>>> Meeting summary: https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU

>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>> Update: I moved the discussion time to this Friday at 9 am PST since I found out that quite a few folks involved in the proposals will be out next week, and I know some folks will also be out the week after that.

>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Amogh J

>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>> Hey folks, sorry for the late follow-up here,

>>>>>>>>>>>>>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing the recording link of the previous discussion! I've set up another sync for next Tuesday 09/16 at 9am PST. This time I've set it up from my corporate email so we can get recordings and transcriptions (and I've made sure to keep the meeting invite open so we don't have to manually let people in).

>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, areas which I think would be good to focus on for establishing consensus:

>>>>>>>>>>>>>>>>>>>>>> 1. How do we model the manifest entry structure so that changes to manifest DVs can be obtained easily from the root? There are a few options here; the most promising approach is to keep an additional DV which encodes, as a diff, the positions that have been removed from a leaf manifest.

>>>>>>>>>>>>>>>>>>>>>> 2. Modeling partition transforms via expressions and establishing a unified table ID space, so that we can simplify how partition tuples may be represented via stats and also have a way in the future to store stats on any derived column. I have a short proposal <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0> for this that probably still needs some tightening up on the expression modeling itself (and some prototyping), but the general idea for establishing a unified table ID space is covered. All feedback welcome!

>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>> Thanks Amogh. Looks like the recording for last week's sync is available on YouTube. Here's the link: https://www.youtube.com/watch?v=uWm-p--8oVQ

>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Kevin Liu

>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>> Hey folks,

>>>>>>>>>>>>>>>>>>>>>>>> Just following up on this to give the community an update on where we're at and my proposed next steps.

>>>>>>>>>>>>>>>>>>>>>>>> I've been editing and merging the contents from our proposal into the proposal <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw> from Russell and others. For any future comments on docs, please comment on the linked proposal. I've also marked it on our doc in red text so it's clear to redirect to the other proposal as a source of truth for comments.

>>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps:

>>>>>>>>>>>>>>>>>>>>>>>> 1. An important design decision point is around inline manifest DVs, external manifest DVs, or enabling both. I'm working on measuring different approaches for representing the compressed DV representation, since that will inform how many entries can reasonably fit in a small root manifest; from that we can derive implications on different write patterns and determine the right approach for storing these manifest DVs.

>>>>>>>>>>>>>>>>>>>>>>>> 2. Another key point is around determining if/how we can reasonably enable V4 to represent changes in the root manifest, so that readers can effectively just infer file-level changes from the root.

>>>>>>>>>>>>>>>>>>>>>>>> 3. One of the aspects of the proposal is getting away from the partition tuple requirement in the root, which currently forces us to associate a partition spec with a manifest. These aspects can be modeled as essentially column stats, which gives a lot of flexibility in the organization of the manifest. There are important details around field ID spaces here which tie into how the stats are structured. What we're proposing here is to have a unified expression ID space that could also benefit us for storing things like virtual columns down the line. I go into this in the proposal, but I'm working on separating the appropriate parts so that the original proposal can mostly just focus on the organization of the content metadata tree and not how we want to solve this particular ID space problem.

>>>>>>>>>>>>>>>>>>>>>>>> 4. I'm planning on scheduling a recurring community sync starting next Tuesday at 9am PST, every 2 weeks. If I get feedback from folks that this time will never work, I can certainly adjust. For some reason, I don't have the ability to add to the Iceberg Dev calendar, so I'll figure that out and update the thread when the event is scheduled.

>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>> I think this is a great way forward; starting out with this much parallel development shows that we have a lot of consensus already :)

>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up on this. It looks like our proposal and the proposal that @Russell Spitzer <[email protected]> shared are pretty aligned. I was just chatting with Russell about this, and we think it'd be best to combine both proposals and have a singular large effort on this. I can also set up a focused community discussion (similar to what we're doing on the other V4 proposals) on this starting sometime next week just to get things moving, if that works for people.

>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey Russell,

>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I) have also been working on a proposal for an adaptive metadata tree structure as part of enabling more efficient one-file commits. From a read of the summary, it's great to see that we're thinking along the same lines about how to tackle this fundamental area!

>>>>>>>>>>>>>>>>>>>>>>>>>>> Here is our proposal: https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0

>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey y'all!

>>>>>>>>>>>>>>>>>>>>>>>>>>>> We (Yi Fang, Steven Wu and myself) wanted to share some of the thoughts we had on how one-file commits could work in Iceberg. This is pretty much just a high-level overview of the concepts we think we need and how Iceberg would behave. We haven't gone very far into the actual implementation and changes that would need to occur in the SDK to make this happen.

>>>>>>>>>>>>>>>>>>>>>>>>>>>> The high-level summary is:

>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Manifest lists are out
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Root manifests take their place
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A root manifest can have data manifests, delete manifests, manifest delete vectors, data delete vectors and data files
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Manifest delete vectors allow for modifying a manifest without deleting it entirely
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Data files let you append without writing an intermediary manifest
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Having child data and delete manifests lets you still scale

>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please take a look if you like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0

>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm excited to see what other proposals and ideas are floating around the community,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Russ
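As a point of reference for the manifest-delete-vector item in the summary above, a minimal sketch of how one might be applied on the read path: entries are soft-deleted by ordinal position, so a reader keeps only positions absent from the bitmap. This uses a plain 32-bit roaring bitmap (org.roaringbitmap) for brevity; the generic entry handling is illustrative, not an Iceberg API:

```java
import java.util.ArrayList;
import java.util.List;
import org.roaringbitmap.RoaringBitmap;

// Sketch: filter a manifest's entries through its manifest DV without
// rewriting the manifest file itself.
final class ManifestDvFilter {
  static <E> List<E> liveEntries(List<E> entries, RoaringBitmap manifestDv) {
    List<E> live = new ArrayList<>(entries.size());
    for (int pos = 0; pos < entries.size(); pos++) {
      if (!manifestDv.contains(pos)) {  // position not soft-deleted
        live.add(entries.get(pos));
      }
    }
    return live;
  }
}
```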
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very excited about the idea!

>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. Micah Kornfield and I presented <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> on high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit, which leveraged Google infrastructure like Colossus for efficient appends.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting because it offers significant advancements in commit latency and metadata storage footprint. Furthermore, a consistent manifest structure promises to simplify the design and codebase, which is a major benefit.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related idea I've been exploring is having a loose affinity between data and delete manifests. While the current separation of data and delete manifests in Iceberg is valuable for avoiding data file rewrites (and stats updates) when deletes change, it does necessitate a join operation during reads. I'd be keen to discuss approaches that could potentially reduce this read-side cost while retaining the benefits of separate manifests.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Anoop

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jagdeep

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We have been hitting all the metadata problems you mentioned, Ryan. I'm on board to help however I can to improve this area.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in this idea and looking forward to collaboration.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Huang-Hsiang

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to this effort.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Namratha

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off, Ryan, I'm interested in helping out here! I've been working on a proposal in this area and it would be great to collaborate with different folks and exchange ideas here, since I think a lot of people are interested in solving this problem.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]> wrote:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Like Russell's recent note, I'm starting a thread to connect those of us that are interested in the idea of changing Iceberg's metadata in v4 so that in most cases committing a change only requires writing one additional metadata file.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits*

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure requires writing at least one manifest and a new manifest list to produce a new snapshot. The goal of this work is to allow more flexibility by allowing the manifest list layer to store data and delete files. As a result, only one file write would be needed before committing the new snapshot. In addition, this work will also try to explore:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that must be read in parallel and later compacted (metadata maintenance changes)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Extending metadata skipping to use aggregated column ranges that are compatible with geospatial data (manifest metadata)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid rewriting existing manifests (metadata DVs)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you're interested in these problems, please reply!

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan

>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> John Zhuge
