nealrichardson commented on a change in pull request #10269: URL: https://github.com/apache/arrow/pull/10269#discussion_r632571798
########## File path: r/R/record-batch.R ########## @@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) { out <- RecordBatch__from_arrays(schema, arrays) return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]]))) } - + + # If any arrays are length 1, recycle them + arr_lens <- map_int(arrays, length) Review comment: FWIW there is a base R function `lengths()` that does this (though I don't recall what version it was added in) ########## File path: r/R/record-batch.R ########## @@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) { out <- RecordBatch__from_arrays(schema, arrays) return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]]))) } - + + # If any arrays are length 1, recycle them + arr_lens <- map_int(arrays, length) + if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) { + max_array_len <- max(arr_lens) + arrays <- modify2( + arrays, + arr_lens == 1, + ~if (.y) repeat_value_as_array(.x, max_array_len) else .x + ) Review comment: Call me 👴 but I personally find ``` arrays[arr_lens == 1] <- lapply(arrays[arr_lens == 1], repeat_value_as_array, max_array_len) ``` easier to read than `modify2(...)`. ########## File path: r/R/scalar.R ########## @@ -33,7 +33,7 @@ Scalar <- R6Class("Scalar", public = list( ToString = function() Scalar__ToString(self), as_vector = function() Scalar__as_vector(self), - as_array = function() MakeArrayFromScalar(self), + as_array = function() MakeArrayFromScalar(self, 1L), Review comment: If you pass this through, maybe you don't need `repeat_value_as_array` ```suggestion as_array = function(length = 1L) MakeArrayFromScalar(self, as.integer(length)), ``` ########## File path: r/tests/testthat/test-RecordBatch.R ########## @@ -415,14 +414,42 @@ test_that("record_batch() handles null type (ARROW-7064)", { expect_equivalent(batch$schema, schema(a = int32(), n = null())) }) -test_that("record_batch() scalar recycling", { - skip("Not implemented (ARROW-11705)") +test_that("record_batch() scalar recycling with vectors", { expect_data_frame( record_batch(a = 1:10, b = 5), tibble::tibble(a = 1:10, b = 5) ) }) +test_that("record_batch() scalar recycling with Scalars, Arrays, and ChunkedArrays", { + + expect_data_frame( + record_batch(a = Array$create(1:10), b = Scalar$create(5)), + tibble::tibble(a = 1:10, b = 5) + ) + + expect_data_frame( + record_batch(a = Array$create(1:10), b = Array$create(5)), + tibble::tibble(a = 1:10, b = 5) + ) + + expect_data_frame( + record_batch(a = Array$create(1:10), b = ChunkedArray$create(5)), + tibble::tibble(a = 1:10, b = 5) + ) + +}) + +test_that("record_batch() no recycling with tibbles", { Review comment: Why does this error, and should it? ########## File path: r/R/util.R ########## @@ -110,3 +110,21 @@ handle_embedded_nul_error <- function(e) { } stop(e) } + +#' Take an object of length 1 and repeat it. +#' +#' @param object Object to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray` +#' @param n Number of repetitions +#' +#' @return `Array` of length `n` +#' +#' @keywords internal +repeat_value_as_array <- function(object, n) { + if (length(object) != 1) { Review comment: If you're checking length again here (unnecessary IMO since you're only calling this in the case where you've validated length == 1 already), you could simplify your `modify2()` wrapper and just `map` over all `arrays`, and in here only do the recycling if length == 1. ########## File path: r/R/record-batch.R ########## @@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) { out <- RecordBatch__from_arrays(schema, arrays) return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]]))) } - + + # If any arrays are length 1, recycle them Review comment: This code block seems to be repeated in both the Table and RecordBatch code, so it might be better factored out as a helper function. I also wonder whether the logic would actually be simpler in C++ because you could do it at a later point where you know exactly that you have a vector of Arrays and don't have to worry about whether it is an R vector, a Scalar, an Array, etc. See the `check_consistent_array_size` function for example in `r/src`--you could drop in around there and instead of erroring if you don't have consistent lengths, handle the recycling case. (Also ok to (1) ignore this suggestion or (2) defer to a followup) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org