nealrichardson commented on a change in pull request #9566:
URL: https://github.com/apache/arrow/pull/9566#discussion_r582406620
##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -194,6 +194,38 @@ test_that("Hive partitioning", {
)
})
+test_that("open_dataset fails when partitioning is passed as schema", {
Review comment:
I'd call this something about "input validation"
##########
File path: r/R/dataset.R
##########
@@ -64,6 +64,9 @@ open_dataset <- function(sources,
partitioning = hive_partition(),
unify_schemas = NULL,
...) {
+ if (isFALSE(inherits(schema, "Schema")) && isFALSE(is.null(schema))) {
+ stop("'schema' must be a Schema object", call. = FALSE)
+ }
Review comment:
I think the better place for this check is lower down, in
DatasetFactory$Finish. Add `assert_is(schema, "Schema")` before L30 in
dataset-factory.R
The reasoning is that since we're trying to prevent a segfault in the C++
code, we should put the validation as close to the place where we get into the
C++ code. That way, if someone were to be using the lower-level interface to
the C++ library, they're still protected from these cases.
##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -194,6 +194,38 @@ test_that("Hive partitioning", {
)
})
+test_that("open_dataset fails when partitioning is passed as schema", {
+ expect_error(
+ open_dataset(hive_dir, hive_partition(other = utf8(), group = uint8()))
+ )
+
+ # skipping schema
+ ds <- open_dataset(hive_dir, , hive_partition(other = utf8(), group =
uint8()))
Review comment:
I don't think these next test cases are necessary, you're just testing
how R matches function arguments, and they would test without your change.
##########
File path: r/R/dataset.R
##########
@@ -64,6 +64,9 @@ open_dataset <- function(sources,
partitioning = hive_partition(),
unify_schemas = NULL,
...) {
+ if (isFALSE(inherits(schema, "Schema")) && isFALSE(is.null(schema))) {
+ stop("'schema' must be a Schema object", call. = FALSE)
+ }
Review comment:
By the same reasoning, we should `assert_is(schema, "Schema")` on L80 as
well because there is a different C++ entry point on L85.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]