Marc Garcia created ARROW-17766:
-----------------------------------

Summary: Add option to only load specific columns from csv
Key: ARROW-17766
URL: https://issues.apache.org/jira/browse/ARROW-17766
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Marc Garcia
I may be missing something, but after reading the pyarrow documentation in detail, I can't find an option equivalent to `pandas.read_csv(..., usecols=['col_to_load_1', 'col_to_load_2'])`.

This would be useful, for example, when loading CSV files with many columns of which only a few are needed. Having to load all of the data in the CSV seems like a significant waste of resources.

I guess a parameter `load_columns` could be added to `pyarrow.csv.ReadOptions`.
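To make the requested behaviour concrete, here is a minimal sketch using only Python's standard `csv` module: read a CSV with a header row and keep only the named columns, similar in spirit to pandas' `usecols`. The helper `load_columns` is hypothetical and only borrows the name of the proposed parameter; it is not a pyarrow API, and it keeps every value as a string (no type inference).

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def load_columns(f: TextIO, columns: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: return only the requested columns of a CSV.

    Assumes the first row is a header. Values are kept as strings.
    """
    wanted = list(columns)
    reader = csv.reader(f)
    header = next(reader, [])  # empty file -> empty header
    missing = [c for c in wanted if c not in header]
    if missing:
        raise KeyError(f"columns not in CSV header: {missing}")
    # Position of each requested column within the header row.
    indices = [header.index(c) for c in wanted]
    result: Dict[str, List[str]] = {c: [] for c in wanted}
    for row in reader:
        # csv.reader still splits every field of the row; only the
        # requested fields are stored, the others are discarded.
        for name, i in zip(wanted, indices):
            result[name].append(row[i])
    return result


if __name__ == "__main__":
    data = io.StringIO("col_to_load_1,unused,col_to_load_2\n1,2,3\n4,5,6\n")
    print(load_columns(data, ["col_to_load_1", "col_to_load_2"]))
```

Note that this sketch only saves memory for the discarded columns: every field is still tokenized. A native reader option could additionally skip converting, and possibly parsing, the unneeded columns, which is where most of the savings described above would come from.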