AjayBoddeda4 commented on issue #297:
URL: https://github.com/apache/wayang/issues/297#issuecomment-4131212704
Hi, I am Ajay Boddeda, a GSoC 2026 applicant working on the DataFrames API
proposal for Apache Wayang.
This TODO is directly relevant to my proposal. The fields parameter in
JavaCSVTableSource represents a column projection, which is exactly what the
select() operation in the DataFrame API needs. Currently all columns are
read even when only a subset is needed, which is inefficient.
When the DataFrame API executes df.select('name', 'age'), it should push the
field projection down to the source so that only the required columns are
parsed and materialized, instead of reading every column and discarding the
rest afterwards. Honoring the ImmutableIntList fields parameter in the source
would enable this projection pushdown optimization.
This is a key performance feature for the DataFrame API and I would love to
investigate this further as part of my GSoC project.
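
To make the idea concrete, here is a minimal sketch of source-level projection in plain Java. It is not Wayang code: the class ProjectionSketch and the method readProjected are hypothetical names, a plain int[] stands in for the ImmutableIntList fields parameter, and the comma split is naive (no quoted fields). Each line still has to be read in full, so the saving is in what gets kept and passed downstream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionSketch {

    // Hypothetical helper, not part of Wayang: keeps only the columns whose
    // zero-based indices appear in `fields`, in the order given.
    public static List<String[]> readProjected(List<String> lines, int[] fields) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Naive split: assumes no quoted fields or embedded commas.
            String[] all = line.split(",", -1);
            String[] projected = new String[fields.length];
            for (int i = 0; i < fields.length; i++) {
                projected[i] = all[fields[i]];
            }
            rows.add(projected);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Columns: name, age, city. Project name and age only.
        List<String> lines = Arrays.asList("alice,30,berlin", "bob,25,paris");
        for (String[] row : readProjected(lines, new int[] {0, 1})) {
            System.out.println(String.join(",", row));
        }
    }
}
```

In this sketch, df.select('name', 'age') over a name, age, city file would correspond to fields = {0, 1}. How the DataFrame API would translate column names into indices is left open here.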
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]