Very excited about the idea!

On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <anoop.k.john...@gmail.com> wrote:
> I'm very interested in this initiative. Micah Kornfield and I presented
> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> on
> high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit,
> which leveraged Google infrastructure like Colossus for efficient appends.
>
> This new proposal is particularly exciting because it offers significant
> advancements in commit latency and metadata storage footprint. Furthermore,
> a consistent manifest structure promises to simplify the design and
> codebase, which is a major benefit.
>
> A related idea I've been exploring is having a loose affinity between data
> and delete manifests. While the current separation of data and delete
> manifests in Iceberg is valuable for avoiding data file rewrites (and stats
> updates) when deletes change, it does necessitate a join operation during
> reads. I'd be keen to discuss approaches that could potentially reduce this
> read-side cost while retaining the benefits of separate manifests.
>
> Best,
> Anoop
>
> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <sidhujagde...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I am new to the Iceberg community but would love to participate in these
>> discussions to reduce the number of file writes, especially for small
>> writes/commits.
>>
>> Thank you!
>> -Jagdeep
>>
>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>> <amantriprag...@apple.com.invalid> wrote:
>>
>>> We have been hitting all the metadata problems you mentioned, Ryan. I’m
>>> on-board to help however I can to improve this area.
>>>
>>> ~ Anurag Mantripragada
>>>
>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <hua...@apple.com.INVALID>
>>> wrote:
>>>
>>> I am interested in this idea and looking forward to collaboration.
>>>
>>> Thanks,
>>> Huang-Hsiang
>>>
>>> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I am interested in contributing to this effort.
>>>
>>> Thanks,
>>> Namratha
>>>
>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>>>> here! I've been working on a proposal in this area and it would be great to
>>>> collaborate with different folks and exchange ideas here, since I think a
>>>> lot of people are interested in solving this problem.
>>>>
>>>> Thanks,
>>>> Amogh Jahagirdar
>>>>
>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>>> us that are interested in the idea of changing Iceberg’s metadata in v4 so
>>>>> that in most cases committing a change only requires writing one
>>>>> additional metadata file.
>>>>>
>>>>> *Idea: One-file commits*
>>>>>
>>>>> The current Iceberg metadata structure requires writing at least one
>>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>>> this work is to allow more flexibility by allowing the manifest list layer
>>>>> to store data and delete files. As a result, only one file write would be
>>>>> needed before committing the new snapshot. In addition, this work will
>>>>> also try to explore:
>>>>>
>>>>>    - Avoiding small manifests that must be read in parallel and later
>>>>>    compacted (metadata maintenance changes)
>>>>>    - Extend metadata skipping to use aggregated column ranges that
>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>    (metadata DVs)
>>>>>
>>>>> If you’re interested in these problems, please reply!
>>>>>
>>>>> Ryan
>>>>>
>>>>
>>>

--
John Zhuge