Thanks a lot to everyone. I have been selected for GSoC 2026 with the Apache Software Foundation as the organization and Apache Iceberg as the project. I will be working on this idea with Russell Spitzer as my mentor and will keep the community posted.
On Tue, Mar 31, 2026 at 7:40 PM Varun Lakhyani <[email protected]> wrote:

> Thanks a lot - Russell Spitzer has kindly agreed to mentor this project.
>
> I would also like to confirm whether any formal steps are required for
> mentor registration on the GSoC/ASF side for such vetted ideas. If so, I
> would appreciate any guidance to ensure everything is in order before the
> deadline.
>
> On Tue, Mar 31, 2026 at 6:13 PM Varun Lakhyani <[email protected]>
> wrote:
>
>> Thanks a lot to everyone for vetting this proposal.
>>
>> Today is the deadline for submitting proposals and mine is ready, but I
>> have learned that all proposals must have accepted mentors before
>> April 1 (tomorrow).
>> I am looking for a community member whom I can list as mentor and, if
>> possible, I would need someone to register as a mentor for the ASF
>> organization and approve my proposal.
>> I have applied using the email address: [email protected]
>> Proposal that I will be submitting:
>> https://docs.google.com/document/d/1ZEgzQj1cxt1fQLXh7auZE7E1xCDGkTNt7dNpA0PaG7U/edit?usp=sharing
>>
>> I have already reached out to Russell Spitzer and Peter Vary directly,
>> but given the tight timeline I wanted to flag this to the broader community
>> as well.
>> Thanks, and apologies for the last-minute request.
>>
>> On Wed, Mar 18, 2026 at 12:19 AM Varun Lakhyani <
>> [email protected]> wrote:
>>
>>> Hey All,
>>>
>>> I previously started a discussion on making Spark readers work in
>>> parallel (asynchronously), which is beneficial in cases with large numbers
>>> of small files such as compaction, and I have worked on a POC, high-level
>>> design, implementation, and benchmarking for various scenarios. I presented
>>> my approach and benchmarking results in the Iceberg Spark sync; the
>>> recording may be available in the Iceberg Spark Community Sync Notes [0].
>>>
>>> I am planning to submit this work as a GSoC 2026 proposal based on this
>>> idea and was advised to seek formal community vetting on the dev mailing
>>> list.
>>>
>>> Previous DISCUSS thread:
>>> https://lists.apache.org/thread/b5jrlyv61lmw867kksw05sot2tro5ybn
>>>
>>> Issue:
>>> https://github.com/apache/iceberg/issues/15287
>>>
>>> Prototype implementation:
>>> https://github.com/apache/iceberg/pull/15341
>>>
>>> Design document and benchmarking details:
>>> https://docs.google.com/document/d/17vBz5t-gSDdmB0S40MYRceyvmcBSzw9Gii-FcU97Lds/edit?usp=sharing
>>>
>>> Initial benchmarking shows noticeable improvements for workloads
>>> involving many small files, particularly when IO latency is present
>>> (details in the design document).
>>>
>>> Any feedback (+1 / concerns / suggestions) would be appreciated.
>>> I am specifically looking for community consensus on whether this is a
>>> viable direction for Iceberg before formalizing the GSoC proposal. The GSoC
>>> 2026 proposal deadline is March 31 - early feedback would be especially
>>> appreciated.
>>>
>>> [0] Iceberg Spark Community Sync Notes:
>>> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?usp=sharing
>>> --
>>> Lakhyani Varun
>>> Indian Institute of Technology Roorkee
