AjayBoddeda4 commented on issue #297:
URL: https://github.com/apache/wayang/issues/297#issuecomment-4131212704

   Hi, I am Ajay Boddeda, a GSoC 2026 applicant working on the DataFrames API 
proposal for Apache Wayang.
   This TODO is directly relevant to my proposal. The fields parameter in
JavaCSVTableSource represents a column projection, which is exactly what the
select() operation in the DataFrame API needs. Currently every column is read
even when only a subset is needed, which is inefficient.
   When the DataFrame API executes df.select('name', 'age'), it should push
the projection down to the source so that only the required columns are read
from the CSV file. Honoring the ImmutableIntList fields in the source would
enable this projection pushdown optimization.
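
   To make the idea concrete, here is a rough, dependency-free sketch of what
applying the projection while reading could look like. It is not the actual
JavaCSVTableSource code: the class ProjectionSketch and the methods projectLine
and readProjected are hypothetical names, a plain int[] stands in for
ImmutableIntList, and the comma split assumes there are no quoted fields.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ProjectionSketch {

    // Hypothetical helper: keep only the columns whose indices are in `fields`.
    // int[] is a stand-in for Calcite's ImmutableIntList (assumption for this sketch).
    public static String[] projectLine(String line, int[] fields) {
        // Naive split: assumes no quoted separators; -1 keeps trailing empty columns.
        String[] all = line.split(",", -1);
        String[] out = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = all[fields[i]];
        }
        return out;
    }

    // Hypothetical reader: applies the projection to every non-empty line.
    public static List<String[]> readProjected(BufferedReader reader, int[] fields) {
        return reader.lines()
                .filter(line -> !line.isEmpty())
                .map(line -> projectLine(line, fields))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String csv = "alice,30,berlin\nbob,25,paris\n";
        int[] fields = {0, 1}; // e.g. the indices of 'name' and 'age'
        List<String[]> rows =
                readProjected(new BufferedReader(new StringReader(csv)), fields);
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

   With a row-oriented text format like CSV, each line still has to be scanned
in full, so the saving in a sketch like this comes from not materializing or
converting the columns that were not selected.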
   This is a key performance feature for the DataFrame API, and I would love
to investigate it further as part of my GSoC project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

