Very excited about the idea!

On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <anoop.k.john...@gmail.com>
wrote:

> I'm very interested in this initiative. Micah Kornfield and I presented
> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> on
> high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit,
> which leveraged Google infrastructure like Colossus for efficient appends.
>
> This new proposal is particularly exciting because it offers significant
> advancements in commit latency and metadata storage footprint. Furthermore,
> a consistent manifest structure promises to simplify the design and
> codebase, which is a major benefit.
>
> A related idea I've been exploring is having a loose affinity between data
> and delete manifests. While the current separation of data and delete
> manifests in Iceberg is valuable for avoiding data file rewrites (and stats
> updates) when deletes change, it does necessitate a join operation during
> reads. I'd be keen to discuss approaches that could potentially reduce this
> read-side cost while retaining the benefits of separate manifests.
>
> Best,
> Anoop
>
>
>
> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <sidhujagde...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I am new to the Iceberg community but would love to participate in these
>> discussions to reduce the number of file writes, especially for small
>> writes/commits.
>>
>> Thank you!
>> -Jagdeep
>>
>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>> <amantriprag...@apple.com.invalid> wrote:
>>
>>> We have been hitting all the metadata problems you mentioned, Ryan. I’m
>>> on board to help however I can to improve this area.
>>>
>>>
>>> ~ Anurag Mantripragada
>>>
>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <hua...@apple.com.INVALID>
>>> wrote:
>>>
>>> I am interested in this idea and looking forward to collaboration.
>>>
>>> Thanks,
>>> Huang-Hsiang
>>>
>>> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I am interested in contributing to this effort.
>>>
>>> Thanks,
>>> Namratha
>>>
>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>>>> here! I've been working on a proposal in this area and it would be great to
>>>> collaborate with different folks and exchange ideas here, since I think a
>>>> lot of people are interested in solving this problem.
>>>>
>>>> Thanks,
>>>> Amogh Jahagirdar
>>>>
>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>>> us that are interested in the idea of changing Iceberg’s metadata in v4 so
>>>>> that in most cases committing a change only requires writing one additional
>>>>> metadata file.
>>>>>
>>>>> *Idea: One-file commits*
>>>>>
>>>>> The current Iceberg metadata structure requires writing at least one
>>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>>> this work is to allow more flexibility by allowing the manifest list layer
>>>>> to store data and delete files. As a result, only one file write would be
>>>>> needed before committing the new snapshot. In addition, this work will also
>>>>> try to explore:
>>>>>
>>>>>    - Avoiding small manifests that must be read in parallel and later
>>>>>    compacted (metadata maintenance changes)
>>>>>    - Extending metadata skipping to use aggregated column ranges that
>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>    (metadata DVs)
>>>>>
>>>>> If you’re interested in these problems, please reply!
>>>>>
>>>>> Ryan
>>>>>
>>>>
>>>
>>>

-- 
John Zhuge
