Count me in!
Do we plan to store these files in a columnar format as well?

On Fri, May 30, 2025, 04:00 Prashant Singh <prashant010...@gmail.com> wrote:

> I am also super excited about the idea ! I would love to contribute.
>
> On Thu, May 29, 2025 at 6:54 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>>> BTW, does it make sense to take metadata json file into consideration as
>>> well? Currently it is just a large json string containing all snapshots.
>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>> explore incremental semantics on it together with manifest list files to
>>> reduce the commit overhead.
>>
>>
>> For the metadata.json file, the REST APIs already provide incremental-style
>> updates via a variety of table update requests. The community is also
>> working on lifting the requirement for a physical metadata.json file in
>> storage, in which case the REST catalog no longer has to deal with file IO.
>> The metadata could live in a key-value store, an RDBMS, or even just in
>> memory.
>>
>> Yufei
>>
>>
>> On Thu, May 29, 2025 at 6:35 PM Gang Wu <ust...@gmail.com> wrote:
>>
>>> This is a long-awaited discussion!
>>>
>>> BTW, does it make sense to take metadata json file into consideration as
>>> well? Currently it is just a large json string containing all snapshots.
>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>> explore incremental semantics on it together with manifest list files to
>>> reduce the commit overhead.
>>>
>>> Best,
>>> Gang
>>>
>>> On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> This will be great for users. Metadata can self-adapt: start with a
>>>> single compacted file, and as the table grows, the metadata can adapt to a
>>>> tree or linked structure.
>>>>
>>>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> I’m also super excited about this idea
>>>>>
>>>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping
>>>>>> out here! I've been working on a proposal in this area and it would be
>>>>>> great to collaborate with different folks and exchange ideas here, since I
>>>>>> think a lot of people are interested in solving this problem.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Like Russell’s recent note, I’m starting a thread to connect those
>>>>>>> of us that are interested in the idea of changing Iceberg’s metadata in v4
>>>>>>> so that in most cases committing a change only requires writing one
>>>>>>> additional metadata file.
>>>>>>>
>>>>>>> *Idea: One-file commits*
>>>>>>>
>>>>>>> The current Iceberg metadata structure requires writing at least one
>>>>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>>>>> this work is to allow more flexibility by allowing the manifest list layer
>>>>>>> to store data and delete files. As a result, only one file write would be
>>>>>>> needed before committing the new snapshot. In addition, this work will also
>>>>>>> try to explore:
>>>>>>>
>>>>>>>    - Avoiding small manifests that must be read in parallel and
>>>>>>>    later compacted (metadata maintenance changes)
>>>>>>>    - Extending metadata skipping to use aggregated column ranges that
>>>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>>    (metadata DVs)
>>>>>>>
>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>
