jonkeane commented on a change in pull request #9972:
URL: https://github.com/apache/arrow/pull/9972#discussion_r612498334



##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -1778,3 +1778,60 @@ test_that("Collecting zero columns from a dataset doesn't return entire dataset"
     c(32, 0)
   )
 })
+
+# see https://issues.apache.org/jira/browse/ARROW-12315
+test_that("Max partitions fails with non-integer values and less than required partitions values", {
+  skip_if_not_available("parquet")
+  tmp <- tempfile()
+
+  # this example needs 3 partitions
+
+  # max_partitions = chr => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = "foobar")
+  )

Review comment:
       We should assert what each of these errors contains. We don't need to 
match the full message, but let's make sure each one fails with something 
useful about partitions.
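
As a rough sketch of what that could look like: `expect_error()` takes a regular expression as its second argument and fails if the error message does not match it. The `write_stub()` function and both message strings below are hypothetical stand-ins so the example runs without arrow; the real `write_dataset()` messages may be worded differently.

```r
library(testthat)

# Hypothetical stand-in for the write path. The message wording is an
# assumption, not what arrow actually emits.
write_stub <- function(max_partitions, needed = 3L) {
  if (!is.numeric(max_partitions)) {
    stop("max_partitions must be a number")
  }
  if (max_partitions < needed) {
    stop(
      "Would write ", needed, " partitions, exceeding max_partitions = ",
      max_partitions
    )
  }
  invisible(TRUE)
}

# Match only a fragment of the message, not the full text
expect_error(write_stub("foobar"), "max_partitions")
expect_error(write_stub(1), "partitions")
```

Matching a short fragment keeps the tests from breaking when the exact wording changes, while still catching an unrelated error.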

##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -1778,3 +1778,60 @@ test_that("Collecting zero columns from a dataset doesn't return entire dataset"
     c(32, 0)
   )
 })
+
+# see https://issues.apache.org/jira/browse/ARROW-12315
+test_that("Max partitions fails with non-integer values and less than required partitions values", {
+  skip_if_not_available("parquet")
+  tmp <- tempfile()
+
+  # this example needs 3 partitions
+
+  # max_partitions = chr => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = "foobar")
+  )
+
+  # max_partitions < 3 => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = -3)
+  )
+
+  # max_partitions < 3 => error
+  expect_error(
+    mtcars %>%
+      group_by(cyl) %>%
+      write_dataset(tmp, format = "parquet", max_partitions = 1)
+  )

Review comment:
       We especially want to make sure that this error is clear and actionable.

##########
File path: r/R/dataset-write.R
##########
@@ -60,8 +62,13 @@ write_dataset <- function(dataset,
                           format = c("parquet", "feather", "arrow", "ipc"),
                           partitioning = dplyr::group_vars(dataset),
                           basename_template = paste0("part-{i}.", as.character(format)),
-                          hive_style = TRUE,
+                          hive_style = TRUE, max_partitions = 1024L,

Review comment:
       Minor: in the .R code, we should follow the existing style here, with 
each argument on its own line.
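
For illustration, the signature laid out that way might look like the sketch below. The function name `write_dataset_sketch` is hypothetical, the `path` argument is assumed from the way the tests call `write_dataset(tmp, ...)`, and the body is a placeholder.

```r
# Hypothetical name; only the argument layout matters here.
write_dataset_sketch <- function(dataset,
                                 path, # assumed second argument
                                 format = c("parquet", "feather", "arrow", "ipc"),
                                 partitioning = dplyr::group_vars(dataset),
                                 basename_template = paste0("part-{i}.", as.character(format)),
                                 hive_style = TRUE,
                                 max_partitions = 1024L,
                                 ...) {
  # Placeholder body: the real function writes the dataset to `path`
  invisible(NULL)
}
```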

##########
File path: r/R/dataset-write.R
##########
@@ -60,8 +62,13 @@ write_dataset <- function(dataset,
                           format = c("parquet", "feather", "arrow", "ipc"),
                           partitioning = dplyr::group_vars(dataset),
                           basename_template = paste0("part-{i}.", as.character(format)),
-                          hive_style = TRUE,
+                          hive_style = TRUE, max_partitions = 1024L,
                           ...) {
+  stopifnot(
+    max_partitions == round(max_partitions, 0),
+    max_partitions == abs(max_partitions),
+    !is.null(max_partitions)
+  )

Review comment:
       Have you tried leaving this check out to see what errors the C++ code 
returns? If those errors are reasonable, we should use them instead of 
writing our own here.
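
One way to inspect what comes back, as a minimal sketch: wrap the call in `tryCatch()` and return the message. The `error_message()` helper is hypothetical and uses base R only, so it can wrap any expression, including a `write_dataset()` call with the `stopifnot()` block removed.

```r
# Hypothetical helper: evaluate an expression and return its error message,
# or NA if it completes without error.
error_message <- function(expr) {
  tryCatch(
    {
      force(expr)
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
}

# Base-R example; a write_dataset() call could be passed in the same way
error_message(stop("boom"))
```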



