nealrichardson commented on a change in pull request #10888: URL: https://github.com/apache/arrow/pull/10888#discussion_r697434894
########## File path: r/R/dplyr-filter.R ########## @@ -71,8 +71,14 @@ filter.Dataset <- filter.ArrowTabular <- filter.arrow_dplyr_query set_filters <- function(.data, expressions) { if (length(expressions)) { - # expressions is a list of Expressions. AND them together and set them on .data - new_filter <- Reduce("&", expressions) + if (is.list(expressions)) { + # expressions is a list of Expressions. AND them together and set them on .data + new_filter <- Reduce("&", expressions) + } else { + # expressions is an expression already Review comment: Should we assert that it is an expression? ########## File path: r/R/dataset-scan.R ########## @@ -85,9 +85,26 @@ Scanner$create <- function(dataset, # To handle mutate() on Table/RecordBatch, we need to collect(as_data_frame=FALSE) now dataset <- dplyr::collect(dataset, as_data_frame = FALSE) } + + proj <- c(dataset$selected_columns, dataset$temp_columns) + if (!is.null(projection)) { + # grab all of the projection elements that are characters to select columns + # TODO: should we check names and make sure we only project once per? Review comment: why? ########## File path: r/R/dataset-scan.R ########## @@ -85,9 +85,26 @@ Scanner$create <- function(dataset, # To handle mutate() on Table/RecordBatch, we need to collect(as_data_frame=FALSE) now dataset <- dplyr::collect(dataset, as_data_frame = FALSE) } + + proj <- c(dataset$selected_columns, dataset$temp_columns) + if (!is.null(projection)) { Review comment: Can you update the docstrings above to explain what the accepted inputs and behaviors are? (I think it's not 100% up to date before this change too, but I want to be clear about what the expected inputs are (and I'm curious whether that will sound sensible when you write it out).) ########## File path: r/tests/testthat/test-dataset.R ########## @@ -1071,6 +1071,60 @@ test_that("Scanner$ToRecordBatchReader()", { ) }) +test_that("Scanner$create() filter/projection pushdown", { + ds <- open_dataset(dataset_dir, partitioning = "part") + + # the standard to compare all Scanner$create()s against + scan_one <- ds %>% + filter(int > 7 & dbl < 57) %>% + select(int, dbl, lgl) %>% + mutate(int_plus = int + 1, dbl_minus = dbl - 1) %>% + Scanner$create() + + # add a column in projection + scan_two <- ds %>% + filter(int > 7 & dbl < 57) %>% + select(int, dbl, lgl) %>% + mutate(int_plus = int + 1) %>% + Scanner$create(projection = list( + "int", "dbl", "lgl", "int_plus", Review comment: A combination of bare strings and named expressions? That seems odd. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org