Count me in! Do we plan to store these files in a columnar format as well?

On Fri, May 30, 2025, 04:00 Prashant Singh <prashant010...@gmail.com> wrote:
> I am also super excited about the idea! I would love to contribute.
>
> On Thu, May 29, 2025 at 6:54 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>>> BTW, does it make sense to take metadata json file into consideration as
>>> well? Currently it is just a large json string containing all snapshots.
>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>> explore incremental semantics on it together with manifest list files to
>>> reduce the commit overhead.
>>
>> For the metadata.json file, the REST APIs already provide incremental-style
>> updates via a variety of table update requests. The community is also
>> working on lifting the requirement for a mandatory physical metadata.json
>> file in storage, in which case the REST catalog doesn't have to deal with
>> file IO anymore. Metadata.json could live in a key-value store, an RDBMS,
>> or even just in memory.
>>
>> Yufei
>>
>> On Thu, May 29, 2025 at 6:35 PM Gang Wu <ust...@gmail.com> wrote:
>>
>>> This is a long-awaited discussion!
>>>
>>> BTW, does it make sense to take metadata json file into consideration as
>>> well? Currently it is just a large json string containing all snapshots.
>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>> explore incremental semantics on it together with manifest list files to
>>> reduce the commit overhead.
>>>
>>> Best,
>>> Gang
>>>
>>> On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> This will be great for users. Metadata can self-adapt: start with a
>>>> single compacted file. As the table grows in size, the metadata can
>>>> adapt to a tree or linked structure.
>>>>
>>>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer
>>>> <russell.spit...@gmail.com> wrote:
>>>>
>>>>> I’m also super excited about this idea
>>>>>
>>>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping
>>>>>> out here! I've been working on a proposal in this area and it would
>>>>>> be great to collaborate with different folks and exchange ideas here,
>>>>>> since I think a lot of people are interested in solving this problem.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Like Russell’s recent note, I’m starting a thread to connect those
>>>>>>> of us that are interested in the idea of changing Iceberg’s metadata
>>>>>>> in v4 so that in most cases committing a change only requires
>>>>>>> writing one additional metadata file.
>>>>>>>
>>>>>>> *Idea: One-file commits*
>>>>>>>
>>>>>>> The current Iceberg metadata structure requires writing at least one
>>>>>>> manifest and a new manifest list to produce a new snapshot. The goal
>>>>>>> of this work is to allow more flexibility by allowing the manifest
>>>>>>> list layer to store data and delete files. As a result, only one
>>>>>>> file write would be needed before committing the new snapshot. In
>>>>>>> addition, this work will also try to explore:
>>>>>>>
>>>>>>> - Avoiding small manifests that must be read in parallel and
>>>>>>>   later compacted (metadata maintenance changes)
>>>>>>> - Extending metadata skipping to use aggregated column ranges that
>>>>>>>   are compatible with geospatial data (manifest metadata)
>>>>>>> - Using soft deletes to avoid rewriting existing manifests
>>>>>>>   (metadata DVs)
>>>>>>>
>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>
>>>>>>> Ryan