I'm very interested in this initiative. Micah Kornfield and I presented on
high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit
<https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>. That work
leveraged Google infrastructure like Colossus for efficient appends.

This new proposal is particularly exciting because it promises lower commit
latency and a smaller metadata storage footprint. Furthermore, a consistent
manifest structure promises to simplify the design and codebase, which is a
major benefit.

A related idea I've been exploring is having a loose affinity between data
and delete manifests. While the current separation of data and delete
manifests in Iceberg is valuable for avoiding data file rewrites (and stats
updates) when deletes change, it does necessitate a join operation during
reads. I'd be keen to discuss approaches that could potentially reduce this
read-side cost while retaining the benefits of separate manifests.
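
As a rough illustration of the read-side join mentioned above, here is a
minimal sketch in Python. It is not Iceberg's actual API: the record types,
the plan_scan function, and the matching rule (same partition, and a delete
sequence number at least as large as the data file's) are simplifying
assumptions made only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified manifest entries (not Iceberg's real classes).
@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class DeleteFile:
    path: str
    partition: str
    sequence_number: int


def plan_scan(data_files, delete_files):
    """Pair each data file with the delete files that may apply to it.

    Assumed rule for this sketch: a delete file applies when it shares the
    data file's partition and its sequence number is >= the data file's.
    """
    # Build side of the join: index delete entries by partition.
    deletes_by_partition = defaultdict(list)
    for delete in delete_files:
        deletes_by_partition[delete.partition].append(delete)

    # Probe side: look up candidate deletes for every data file.
    tasks = []
    for data in data_files:
        candidates = deletes_by_partition.get(data.partition, [])
        applicable = [
            d for d in candidates if d.sequence_number >= data.sequence_number
        ]
        tasks.append((data, applicable))
    return tasks


if __name__ == "__main__":
    data = [
        DataFile("a.parquet", "p=1", 1),
        DataFile("b.parquet", "p=1", 3),
        DataFile("c.parquet", "p=2", 1),
    ]
    deletes = [DeleteFile("d1.parquet", "p=1", 2)]
    for data_file, applicable in plan_scan(data, deletes):
        print(data_file.path, [d.path for d in applicable])
```

Every scan has to load and index all candidate delete entries before it can
plan any data file. If related data and delete entries were loosely
co-located, that lookup could stay local to the manifest being read.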

Best,
Anoop



On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <sidhujagde...@gmail.com>
wrote:

> Hi everyone,
>
> I am new to the Iceberg community but would love to participate in these
> discussions to reduce the number of file writes, especially for small
> writes/commits.
>
> Thank you!
> -Jagdeep
>
> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
> <amantriprag...@apple.com.invalid> wrote:
>
>> We have been hitting all the metadata problems you mentioned, Ryan. I’m
>> on-board to help however I can to improve this area.
>>
>>
>> ~ Anurag Mantripragada
>>
>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <hua...@apple.com.INVALID>
>> wrote:
>>
>> I am interested in this idea and looking forward to collaboration.
>>
>> Thanks,
>> Huang-Hsiang
>>
>> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
>>
>> Hello,
>>
>> I am interested in contributing to this effort.
>>
>> Thanks,
>> Namratha
>>
>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com>
>> wrote:
>>
>>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>>> here! I've been working on a proposal in this area and it would be great to
>>> collaborate with different folks and exchange ideas here, since I think a
>>> lot of people are interested in solving this problem.
>>>
>>> Thanks,
>>> Amogh Jahagirdar
>>>
>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>> us that are interested in the idea of changing Iceberg’s metadata in v4 so
>>>> that in most cases committing a change only requires writing one additional
>>>> metadata file.
>>>>
>>>> *Idea: One-file commits*
>>>>
>>>> The current Iceberg metadata structure requires writing at least one
>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>> this work is to allow more flexibility by allowing the manifest list layer
>>>> to store data and delete files. As a result, only one file write would be
>>>> needed before committing the new snapshot. In addition, this work will also
>>>> try to explore:
>>>>
>>>>    - Avoiding small manifests that must be read in parallel and later
>>>>    compacted (metadata maintenance changes)
>>>>    - Extending metadata skipping to use aggregated column ranges that are
>>>>    compatible with geospatial data (manifest metadata)
>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>    (metadata DVs)
>>>>
>>>> If you’re interested in these problems, please reply!
>>>>
>>>> Ryan
>>>>
>>>
>>
>>
