Yes of course, what do I need to do?

On Tue, Mar 31, 2026 at 7:35 AM Varun Lakhyani <[email protected]> wrote:
> Hello Russell
>
> Today's the deadline for submitting proposals and I have mine ready, but I
> got to know that before 1st April (tomorrow) all proposals must have
> accepted mentors.
> Can I mention you as a potential mentor? If possible, I would need you
> to register as a mentor for the ASF organization and approve my proposal.
> I have applied using email id: [email protected]
> Attached is my proposal that I will be submitting.
>
> Thanks, and apologies for the last-minute request
>
> On Sat, Mar 21, 2026 at 1:08 PM Varun Lakhyani <[email protected]> wrote:
>
>> Thanks a lot Russell,
>> Voting thread got a good response.
>> I am already working on the final proposal to submit, will share it soon.
>>
>> On Sat, Mar 21, 2026 at 1:48 AM Russell Spitzer <[email protected]> wrote:
>>
>>> That seems about right (I actually thought it may be even worse). I
>>> will try my best to get to that thread but it's a very busy week for me as
>>> Iceberg summit is just 1 work week away for me
>>>
>>> On Fri, Mar 20, 2026 at 2:21 PM Varun Lakhyani <[email protected]> wrote:
>>>
>>>> Hi Russell,
>>>>
>>>> I benchmarked it against AWS S3 as source and destination to get
>>>> natural IO overhead for cloud instead of manually adding it.
>>>> AWS S3 (1000 files - 14.6 Kb each):
>>>>
>>>>    - Sync time: 219.694 s
>>>>    - Async time: 51.853 s
>>>>    - Improvement: 76.4%
>>>>
>>>> I think this might help you to get a better overview.
>>>> I would really appreciate any feedback on
>>>> https://lists.apache.org/thread/rvbwmcbrlr3syd1movflw3vmprm27nmz
>>>>
>>>> On Wed, Mar 18, 2026 at 12:39 AM Varun Lakhyani <[email protected]> wrote:
>>>>
>>>>> Hi Russell,
>>>>>
>>>>> Thanks again for the discussion and feedback during the Spark sync
>>>>> call.
>>>>> I have raised a DISCUSS thread on the dev mailing list for formal GSoC
>>>>> idea vetting for the Spark readers parallel execution work.
>>>>> I would really appreciate it if you could take a look when you get
>>>>> time and share any feedback.
>>>>>
>>>>> Vetting Discussion thread:
>>>>> https://lists.apache.org/thread/rvbwmcbrlr3syd1movflw3vmprm27nmz
>>>>>
>>>>> Further, I will check Comet's reader code path, and I am thinking of
>>>>> the next step as going through parallel iterable in the Iceberg codebase
>>>>> and making required changes for this use case (if any).
>>>>> Thanks
>>>>>
>>>>> On Wed, Feb 25, 2026 at 12:21 AM Russell Spitzer <[email protected]> wrote:
>>>>>
>>>>>> You can always ping me but you should keep up with the dev mailing
>>>>>> list thread, and add an item to the Spark Iceberg Community meetup. You
>>>>>> should be able to find it on the dev calendar
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 24, 2026 at 11:19 AM Varun Lakhyani <[email protected]> wrote:
>>>>>>
>>>>>>> Hello Russell,
>>>>>>>
>>>>>>> Apologies for contacting you again personally. Please take a look at
>>>>>>> this whenever you are available.
>>>>>>> I have completed the high-level design/POC up to a certain level, along
>>>>>>> with some benchmarking by creating a benchmark file similar to other
>>>>>>> Iceberg services. (Benchmarking numbers seem good to me as of now.)
>>>>>>>
>>>>>>> Please take a look at these if you can:
>>>>>>> PR with code changes <https://github.com/apache/iceberg/pull/15341>
>>>>>>> | Issue raised <https://github.com/apache/iceberg/issues/15287>
>>>>>>> | Reference documents showing detailed approach and benchmarks
>>>>>>> <https://docs.google.com/document/d/17vBz5t-gSDdmB0S40MYRceyvmcBSzw9Gii-FcU97Lds/edit?usp=sharing>
>>>>>>>
>>>>>>>
>>>>>>> It would help if you could give me further direction on this. I am
>>>>>>> keeping the dev mailing thread updated
>>>>>>> <https://lists.apache.org/thread/b5jrlyv61lmw867kksw05sot2tro5ybn>
>>>>>>> with these, but I would need a formal vetting for this idea to get my
>>>>>>> proposal considered.
>>>>>>> Roughly what would be an appropriate date for me to post for
>>>>>>> vetting? Proposal submission starts on 16th March.
>>>>>>>
>>>>>>> I would be happy to include anything else, try out a different
>>>>>>> approach, or include any suggestions that you might have.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Feb 12, 2026 at 5:21 AM Russell Spitzer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Looks good to me
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 11, 2026 at 8:43 AM Varun Lakhyani <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Whenever you get a chance, please take a look at this if you can.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Feb 11, 2026 at 2:00 AM Varun Lakhyani <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Russell,
>>>>>>>>>> I reviewed both the ideas in detail and I think I'll be able to
>>>>>>>>>> work on the 2nd task: making Spark readers run tasks in parallel.
>>>>>>>>>>
>>>>>>>>>> I went through BaseReader.java, which seems to be foundational
>>>>>>>>>> for all readers; each of them has its own implementation,
>>>>>>>>>> specifically of the open() function.
>>>>>>>>>> I will prepare a formal proposal by 25th February, including
>>>>>>>>>> estimated designs and task-time distribution. The official proposal
>>>>>>>>>> submission window is 16th March - 31st March.
>>>>>>>>>> As of now, I have raised this issue/feature request on GitHub and
>>>>>>>>>> I am thinking of raising this discussion on the Iceberg dev mailing
>>>>>>>>>> list to get it vetted, as it has to be approved on the project's
>>>>>>>>>> mailing list for ASF to consider the proposal.
>>>>>>>>>>
>>>>>>>>>> I have attached a draft of the discussion that I will raise.
>>>>>>>>>> Please let me know your thoughts and if you think there should be any
>>>>>>>>>> changes.
>>>>>>>>>> Also, can I contact you for any specific clarifications or
>>>>>>>>>> anything bothering me related to this project, or is there a more
>>>>>>>>>> appropriate point of contact for this?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 10, 2026 at 4:11 AM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I definitely think this is medium (if not smaller). I don't
>>>>>>>>>>> think we have any problem with that
>>>>>>>>>>>
>>>>>>>>>>> Another task idea, and this is probably more of a medium to large
>>>>>>>>>>>
>>>>>>>>>>> Make our Spark readers optionally function asynchronously for
>>>>>>>>>>> tasks with many small files. The general thought there is that when
>>>>>>>>>>> we have say 1000 4kb files, we currently open them one at a time in
>>>>>>>>>>> order. This is slow and bad. We should instead try to open some
>>>>>>>>>>> number of those data files in parallel then stitch them together
>>>>>>>>>>> into a buffer or iterator for downstream processing. This is a bit
>>>>>>>>>>> of a bigger refactor but would dramatically help a lot of use cases
>>>>>>>>>>> in Iceberg including cleanup of small files in compaction.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 9, 2026 at 3:46 PM Varun Lakhyani <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, Russell, I am very much interested in working on tasks or
>>>>>>>>>>>> ideas that the community already wants, especially if reviewers
>>>>>>>>>>>> can be pre-identified.
>>>>>>>>>>>>
>>>>>>>>>>>> I looked into the Spark defaults issue you mentioned,
>>>>>>>>>>>> specifically
>>>>>>>>>>>> org/apache/iceberg/spark/sql/TestSparkDefaultValues.java,
>>>>>>>>>>>> including testCreateTableWithDefaultsUnsupported() and
>>>>>>>>>>>> testAlterTableAddColumnWithDefaultUnsupported().
>>>>>>>>>>>> From my initial analysis:
>>>>>>>>>>>>
>>>>>>>>>>>>    - The *ALTER TABLE* path passes Spark’s validation stage
>>>>>>>>>>>>    and fails at the Iceberg layer. This seems addressable by
>>>>>>>>>>>>    converting the Spark literal into an Iceberg literal for the
>>>>>>>>>>>>    data types Iceberg supports.
>>>>>>>>>>>>    - The *CREATE TABLE* path fails earlier during Spark
>>>>>>>>>>>>    analysis. This appears to be due to Spark catalog capability
>>>>>>>>>>>>    checks, and declaring ACCEPT_ANY_SCHEMA for the Iceberg catalog
>>>>>>>>>>>>    should allow defaults to pass Spark validation, after which
>>>>>>>>>>>>    similar Spark to Iceberg literal handling can be applied during
>>>>>>>>>>>>    schema creation.
>>>>>>>>>>>>
>>>>>>>>>>>> These are rough conclusions from a first pass. I plan to take a
>>>>>>>>>>>> deeper look at the end to end flow and implementation details to
>>>>>>>>>>>> ensure the approach is correct and aligns well with Iceberg’s
>>>>>>>>>>>> design.
>>>>>>>>>>>>
>>>>>>>>>>>> ASF’s GSoC 2026 ideas list mentions two common project sizes:
>>>>>>>>>>>> 175 hours (medium) and 350 hours (large). From my understanding,
>>>>>>>>>>>> this work could reasonably fit into the 175-hour category.
>>>>>>>>>>>>
>>>>>>>>>>>> I’d really appreciate your advice on what would be best:
>>>>>>>>>>>>
>>>>>>>>>>>>    - Whether it makes sense to propose this Spark defaults
>>>>>>>>>>>>    work as a GSoC idea and get it vetted on the dev mailing list,
>>>>>>>>>>>>    or
>>>>>>>>>>>>    - Whether you’d recommend proposing a different idea for
>>>>>>>>>>>>    GSoC and doing this particular work independently before the
>>>>>>>>>>>>    coding period.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for taking the time. I really appreciate your
>>>>>>>>>>>> guidance.
>>>>>>>>>>>> Looking forward to your response.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 9, 2026 at 10:54 PM Russell Spitzer <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I think we are always interested, but we tend to be stretched
>>>>>>>>>>>>> thin on reviewing resources at the moment. If you are interested
>>>>>>>>>>>>> I would try to find something that folks are already very
>>>>>>>>>>>>> interested in and have some reviewers pre-selected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At the moment we are very focused on finishing up the V4 spec
>>>>>>>>>>>>> which is a pretty huge undertaking but probably isn't good for a
>>>>>>>>>>>>> first project. If you have time I think one rather contained
>>>>>>>>>>>>> project could be making Spark Defaults work when creating an
>>>>>>>>>>>>> Iceberg table or using an Alter table statement. Currently we
>>>>>>>>>>>>> just error out
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 9, 2026 at 11:12 AM Varun Lakhyani <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Russell,
>>>>>>>>>>>>>> I am Varun Lakhyani, a final-year undergraduate student at
>>>>>>>>>>>>>> IIT Roorkee, India.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've been actively understanding and contributing to Apache
>>>>>>>>>>>>>> Iceberg for some time. So far, I have five merged PRs
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/commits/main/?author=varun-lakhyani>,
>>>>>>>>>>>>>> one of which you reviewed, and one open PR involving a core
>>>>>>>>>>>>>> API module change
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/15252>. I have also
>>>>>>>>>>>>>> started a discussion on the Iceberg dev mailing list
>>>>>>>>>>>>>> <https://lists.apache.org/thread/nmt8glsctsqrshx7fxc0ljtxp8h8jh6p>
>>>>>>>>>>>>>> related to this open PR to get broader review and feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am interested in participating in Google Summer of Code
>>>>>>>>>>>>>> 2026 under the Apache Software Foundation, working with Apache
>>>>>>>>>>>>>> Iceberg. I noticed Iceberg isn't currently listed in the GSoC
>>>>>>>>>>>>>> ideas list
>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/COMDEV/GSoC+2026+Ideas+list>.
>>>>>>>>>>>>>> ASF documentation
>>>>>>>>>>>>>> <https://community.apache.org/gsoc/#students-read-this>
>>>>>>>>>>>>>> mentions that contributors can propose new ideas for existing
>>>>>>>>>>>>>> Apache projects, provided those ideas are vetted on the
>>>>>>>>>>>>>> project’s dev mailing list.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given your experience with ASF projects and Apache Iceberg, I
>>>>>>>>>>>>>> wanted to seek your guidance on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Whether Iceberg would generally be open to GSoC
>>>>>>>>>>>>>>    participation if there is a well-scoped and project-aligned
>>>>>>>>>>>>>>    idea.
>>>>>>>>>>>>>>    - Whether there are particular areas in Iceberg where a
>>>>>>>>>>>>>>    GSoC-sized project could realistically make sense and be
>>>>>>>>>>>>>>    useful to the community.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’d really appreciate any direction or suggestions you may
>>>>>>>>>>>>>> have.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Lakhyani Varun
>>>>>>>>>>>>>> Indian Institute of Technology Roorkee
>>>>>>>>>>>>>> Github <https://github.com/varun-lakhyani> | LinkedIn
>>>>>>>>>>>>>> <https://www.linkedin.com/in/varun-lakhyani-154a35250/> |
>>>>>>>>>>>>>> Codeforces <https://codeforces.com/profile/progskipper> |
>>>>>>>>>>>>>> Codechef <https://www.codechef.com/users/v_k_18>
>>>>>>>>>>>>>> Contact: +91 96246 46174
>>>>>>>>>>>>>>
