jonkeane commented on a change in pull request #10888:
URL: https://github.com/apache/arrow/pull/10888#discussion_r697660485
##########
File path: r/R/dataset-scan.R
##########
@@ -85,9 +86,30 @@ Scanner$create <- function(dataset,
# To handle mutate() on Table/RecordBatch, we need to collect(as_data_frame=FALSE) now
dataset <- dplyr::collect(dataset, as_data_frame = FALSE)
}
+
+ proj <- c(dataset$selected_columns, dataset$temp_columns)
+
+ if (!is.null(projection)) {
+ if (is.character(projection)) {
+ proj <- proj[projection]
+ } else if (is_list_of(projection, "Expression")) {
+ # TODO: need to check and see if there are any Expressions that are simply
+ # field refs in projections, but are richer expressions in proj?
+ proj <- projection
Review comment:
This TODO is for cases like the following:
```
ds %>%
  filter(int > 7) %>%
  select(int, dbl, lgl) %>%
  mutate(int_plus = int + 1) %>%
  Scanner$create(projection = list(
    int = Expression$field_ref("int"),
    # int_plus only exists via the mutate() above, so it is not a real field
    int_plus = Expression$field_ref("int_plus"),
    dbl_minus = Expression$create(
      "subtract_checked",
      Expression$field_ref("dbl"),
      Expression$scalar(1)
    )
  ))
```
Selecting `int` with `Expression$field_ref()` is fine, but `int_plus`, which
was added earlier with `mutate()`, is not a field that can be referenced at
this point. We could pull the expression that creates `int_plus` out of
`dataset$temp_columns` and substitute it into the `projection` list, though
that feels hacky and like opening up a 🗑️ of 🐍s.
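
To make the substitution idea concrete, here is a rough sketch in base R. It deliberately avoids the arrow API: bare symbols stand in for `Expression$field_ref()` and calls stand in for richer Expressions, and `resolve_temp_refs()` is a hypothetical helper name, not something that exists in the package.

```r
# Stand-ins only: symbols play the role of field refs, calls play the role of
# richer Expressions. None of these are arrow objects.
temp_columns <- list(int_plus = quote(int + 1))

projection <- list(
  int = quote(int),
  int_plus = quote(int_plus),
  dbl_minus = quote(dbl - 1)
)

# Hypothetical helper: if a projection entry is a bare reference to a column
# that mutate() created, swap in the expression that defines that column.
# Everything else is passed through unchanged.
resolve_temp_refs <- function(projection, temp_columns) {
  lapply(projection, function(expr) {
    if (is.symbol(expr) && as.character(expr) %in% names(temp_columns)) {
      temp_columns[[as.character(expr)]]
    } else {
      expr
    }
  })
}

resolved <- resolve_temp_refs(projection, temp_columns)
```

Note that this only handles top-level references. A projection entry that uses `int_plus` inside a larger expression would need a recursive walk over the expression tree, which is part of why this feels hacky.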