jonkeane commented on a change in pull request #9972:
URL: https://github.com/apache/arrow/pull/9972#discussion_r612498334
##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -1778,3 +1778,60 @@ test_that("Collecting zero columns from a dataset doesn't return entire dataset"
     c(32, 0)
   )
 })
+
+# see https://issues.apache.org/jira/browse/ARROW-12315
+test_that("Max partitions fails with non-integer values and less than required partitions values", {
+  skip_if_not_available("parquet")
+  tmp <- tempfile()
+
+  # this example needs 3 partitions
+
+  # max_partitions = chr => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = "foobar")
+  )

Review comment:
We should assert what each of these errors contains. We don't need to match the full message, but let's make sure that each one errors with something useful about partitions.

##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -1778,3 +1778,60 @@ test_that("Collecting zero columns from a dataset doesn't return entire dataset"
     c(32, 0)
   )
 })
+
+# see https://issues.apache.org/jira/browse/ARROW-12315
+test_that("Max partitions fails with non-integer values and less than required partitions values", {
+  skip_if_not_available("parquet")
+  tmp <- tempfile()
+
+  # this example needs 3 partitions
+
+  # max_partitions = chr => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = "foobar")
+  )
+
+  # max_partitions < 3 => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = -3)
+  )
+
+  # max_partitions < 3 => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = 1)
+  )

Review comment:
We especially want to make sure that this error is clear and actionable.

##########
File path: r/R/dataset-write.R
##########
@@ -60,8 +62,13 @@ write_dataset <- function(dataset,
                           format = c("parquet", "feather", "arrow", "ipc"),
                           partitioning = dplyr::group_vars(dataset),
                           basename_template = paste0("part-{i}.",
                                                      as.character(format)),
-                          hive_style = TRUE,
+                          hive_style = TRUE, max_partitions = 1024L,

Review comment:
Minor: in the .R code, we should follow the existing style here and put each argument on its own line.

##########
File path: r/R/dataset-write.R
##########
@@ -60,8 +62,13 @@ write_dataset <- function(dataset,
                           format = c("parquet", "feather", "arrow", "ipc"),
                           partitioning = dplyr::group_vars(dataset),
                           basename_template = paste0("part-{i}.", as.character(format)),
-                          hive_style = TRUE,
+                          hive_style = TRUE, max_partitions = 1024L,
                           ...) {
+  stopifnot(
+    max_partitions == round(max_partitions, 0),
+    max_partitions == abs(max_partitions),
+    !is.null(max_partitions)
+  )

Review comment:
Have you tried leaving this check off to see what errors the C++ code returns? If those errors are reasonable, we should use them instead of writing our own here.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
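The review asks for errors that say something useful about partitions, and for tests that assert on the error text rather than only checking that some error occurred. If the C++ errors turn out not to be good enough, an up-front check along these lines is one option. This is a minimal sketch in base R only: `validate_max_partitions()` and `error_message()` are hypothetical helpers, not part of arrow or testthat, and the message wording is illustrative. Unlike the `stopifnot()` in the diff, it checks for NULL and non-numeric input before doing any arithmetic, and it rejects 0.

```r
# Hypothetical helper, not part of the arrow package: validate a
# `max_partitions` value before any data are written, with messages
# that name the argument and say what a valid value looks like.
validate_max_partitions <- function(max_partitions) {
  # Type/shape checks come first, so the numeric comparisons below
  # never see NULL, character, NA, Inf, or a vector of length != 1.
  if (is.null(max_partitions) || length(max_partitions) != 1 ||
      !is.numeric(max_partitions) || !is.finite(max_partitions)) {
    stop("`max_partitions` must be a single positive integer.", call. = FALSE)
  }
  # Reject fractions, zero, negatives, and values too large for an R integer.
  if (max_partitions != round(max_partitions) || max_partitions < 1 ||
      max_partitions > .Machine$integer.max) {
    stop("`max_partitions` must be a positive integer, not ",
         max_partitions, ".", call. = FALSE)
  }
  as.integer(max_partitions)
}

# Hypothetical base-R stand-in for testthat::expect_error(expr, regexp):
# return the error message raised by `expr`, or NA if it ran cleanly.
error_message <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

error_message(validate_max_partitions("foobar"))
# returns "`max_partitions` must be a single positive integer."
error_message(validate_max_partitions(-3))
# returns "`max_partitions` must be a positive integer, not -3."
```

In the test file itself, the equivalent assertion would be `expect_error(..., "max_partitions")`, since testthat's `expect_error()` matches its `regexp` argument against the error message. Note that `max_partitions = 1` passes this check: whether 1 is too small depends on how many partitions the data produce (3 for `cyl` in `mtcars`), so that case can only be reported at write time, which is exactly where the question about the C++ error applies.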