Thanks a lot to everyone! I have been selected for GSoC 2026 with the Apache
Software Foundation as the organization and Apache Iceberg as the project.
I will be working on this idea with Russell Spitzer as my mentor and will
keep the community posted.

On Tue, Mar 31, 2026 at 7:40 PM Varun Lakhyani <[email protected]>
wrote:

> Thanks a lot. Russell Spitzer has kindly agreed to mentor this project.
>
> I would also like to confirm whether there are any formal steps required for
> mentor registration on the GSoC/ASF side for such vetted ideas. If so, I
> would appreciate any guidance to ensure everything is in order before the
> deadline.
>
> On Tue, Mar 31, 2026 at 6:13 PM Varun Lakhyani <[email protected]>
> wrote:
>
>> Thanks a lot to everyone for vetting this proposal.
>>
>> Today is the deadline for submitting proposals. I have mine ready, but I
>> learned that all proposals must have an accepted mentor before April 1
>> (tomorrow).
>> I am looking for a community member whom I can list as a mentor. If
>> possible, I would need someone to register as a mentor for the ASF
>> organization and approve my proposal.
>> I have applied using email id: [email protected]
>> Proposal that I will be submitting:
>> https://docs.google.com/document/d/1ZEgzQj1cxt1fQLXh7auZE7E1xCDGkTNt7dNpA0PaG7U/edit?usp=sharing
>>
>> I have already reached out to Russell Spitzer and Peter Vary directly,
>> but given the tight timeline I wanted to flag this to the broader community
>> as well.
>> Thanks, and apologies for the last-minute request.
>>
>> On Wed, Mar 18, 2026 at 12:19 AM Varun Lakhyani <
>> [email protected]> wrote:
>>
>>> Hey All,
>>>
>>> I previously started a discussion on making Spark readers work in
>>> parallel (asynchronously), which benefits workloads with large numbers
>>> of small files, such as compaction. I have since worked on a POC, a
>>> high-level design, an implementation, and benchmarks for various
>>> scenarios. I presented my approach and benchmarking results in the
>>> Iceberg Spark sync; the recording may be available in the Iceberg Spark
>>> Community Sync Notes [0].
>>>
>>> I am planning to submit a GSoC 2026 proposal based on this idea and was
>>> advised to seek formal community vetting on the dev mailing list.
>>>
>>> Previous DISCUSS thread:
>>> https://lists.apache.org/thread/b5jrlyv61lmw867kksw05sot2tro5ybn
>>>
>>> Issue:
>>> https://github.com/apache/iceberg/issues/15287
>>>
>>> Prototype implementation:
>>> https://github.com/apache/iceberg/pull/15341
>>>
>>> Design document and benchmarking details:
>>> https://docs.google.com/document/d/17vBz5t-gSDdmB0S40MYRceyvmcBSzw9Gii-FcU97Lds/edit?usp=sharing
>>>
>>> Initial benchmarking shows noticeable improvements for workloads
>>> involving many small files, particularly when IO latency is present
>>> (details in the design document).
>>>
>>> Any feedback (+1 / concerns / suggestions) would be appreciated.
>>> I am specifically looking for community consensus on whether this is a
>>> viable direction for Iceberg before formalizing the GSoC proposal. The GSoC
>>> 2026 proposal deadline is March 31, so early feedback would be especially
>>> appreciated.
>>>
>>> [0] Iceberg Spark Community Sync Notes:
>>> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?usp=sharing
>>> --
>>> Lakhyani Varun
>>> Indian Institute of Technology Roorkee
>>>
>>>
