Anthony Abate created ARROW-6830:
------------------------------------
Summary: Question / Feature Request - Select Subset of Columns in read_arrow
Key: ARROW-6830
URL: https://issues.apache.org/jira/browse/ARROW-6830
Project: Apache Arrow
Issue Type: New Feature
Components: C++, R
Reporter: Anthony Abate
*Note:* Not sure if this is a limitation of the R library or the underlying
C++ code:
I have a ~30 GB Arrow file with almost 1000 columns. It contains 12,000 record
batches with varying row counts.
1. Is it possible to use *read_arrow* to select a subset of columns (similar to
how *read_feather* has a col_select =... argument)?
2. Or is it possible using *RecordBatchFileReader* to filter columns?
The only thing I seem to be able to do (please confirm if this is my only
option) is loop over all record batches, select a single column at a time, and
manually construct the data I need to pull out, i.e., something like the following:
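data_rbfr <- arrow::RecordBatchFileReader("arrowfile")
# Loop over every record batch and pull out one column at a time.
# (Assumes get_batch() and column() take 0-based indices.)
for (i in seq_len(data_rbfr$num_record_batches) - 1L) {
  batch <- data_rbfr$get_batch(i)
  col4 <- batch$column(4)
  col5 <- batch$column(7)
  # ... append col4 / col5 to the result being built up ...
}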