Thanks a lot to everyone for vetting this proposal.

Today is the deadline for submitting proposals, and mine is ready, but I have learned that all proposals must have accepted mentors before 1 April (tomorrow). I am looking for a community member whom I can list as mentor, and if possible I would need someone to register as a mentor for the ASF organization and approve my proposal.

I have applied using the email ID: [email protected]

Proposal that I will be submitting:
https://docs.google.com/document/d/1ZEgzQj1cxt1fQLXh7auZE7E1xCDGkTNt7dNpA0PaG7U/edit?usp=sharing
I have already reached out to Russell Spitzer and Peter Vary directly, but given the tight timeline I wanted to flag this to the broader community as well.

Thanks, and apologies for the last-minute request.

On Wed, Mar 18, 2026 at 12:19 AM Varun Lakhyani <[email protected]> wrote:

> Hey All,
>
> I previously started a discussion on making Spark readers work in parallel
> (asynchronously), which is beneficial in cases with large numbers of small
> files such as compaction, and I have worked on a POC, high-level design,
> implementation, and benchmarking for various scenarios. I presented my
> approach and benchmarking results in the Iceberg Spark sync; the recording
> may be available in the Iceberg Spark Community Sync Notes [0].
>
> I am planning to submit this work as a GSoC 2026 proposal based on this
> idea and was advised to seek formal community vetting on the dev mailing
> list.
>
> Previous DISCUSS thread:
> https://lists.apache.org/thread/b5jrlyv61lmw867kksw05sot2tro5ybn
>
> Issue:
> https://github.com/apache/iceberg/issues/15287
>
> Prototype implementation:
> https://github.com/apache/iceberg/pull/15341
>
> Design document and benchmarking details:
> https://docs.google.com/document/d/17vBz5t-gSDdmB0S40MYRceyvmcBSzw9Gii-FcU97Lds/edit?usp=sharing
>
> Initial benchmarking shows noticeable improvements for workloads involving
> many small files, particularly when IO latency is present (details in the
> design document).
>
> Any feedback (+1 / concerns / suggestions) would be appreciated.
> I am specifically looking for community consensus on whether this is a
> viable direction for Iceberg before formalizing the GSoC proposal. The GSoC
> 2026 proposal deadline is March 31 - early feedback would be especially
> appreciated.
>
> [0] Iceberg Spark Community Sync Notes:
> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?usp=sharing
>
> --
> Lakhyani Varun
> Indian Institute of Technology Roorkee
