paleolimbot commented on a change in pull request #12152:
URL: https://github.com/apache/arrow/pull/12152#discussion_r784824022
##########
File path: r/R/util.R
##########
@@ -209,5 +209,5 @@ handle_csv_read_error <- function(e, schema) {
))
}
- abort(e)
+ stop(e)
Review comment:
Is there a reason for using `stop()` rather than `abort()`? Most calls
to `stop()` that I've seen in arrow use `stop("msg", call. = FALSE)`. Is that
worth doing here?
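For reference, a minimal base-R sketch (not taken from the PR; `rethrow` is a hypothetical stand-in for the handler) of how `stop()` behaves when it is handed a condition object instead of a message. As far as I know, the condition is re-signalled with its original message and call, and any extra argument such as `call. = FALSE` is ignored with a warning:

``` r
# Hypothetical stand-in for the handler under review; not arrow's actual code.
rethrow <- function(e) {
  stop(e) # re-signals the caught condition object unchanged
}

# Catch an error, re-raise it via rethrow(), then capture the final condition.
res <- tryCatch(
  tryCatch(
    stop("original message", call. = FALSE),
    error = function(e) rethrow(e)
  ),
  error = function(e) e
)

msg_preserved <- identical(conditionMessage(res), "original message")
call_still_null <- is.null(conditionCall(res))

# Passing `call. = FALSE` alongside a condition object triggers a warning
# ("additional arguments ignored in stop()"); record whether it fires.
warned <- FALSE
try(
  withCallingHandlers(
    stop(simpleError("boom"), call. = FALSE),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  ),
  silent = TRUE
)
```

If that holds, `call. = FALSE` would only matter where the error is created from a message string, not where an existing condition `e` is re-raised.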
##########
File path: r/tests/testthat/test-dataset-csv.R
##########
@@ -280,13 +280,48 @@ test_that("Error if no format specified and files are not parquet", {
)
})
-test_that("Column names inferred from schema for headerless CSVs (ARROW-14063)", {
- headerless_csv_dir <- make_temp_dir()
+test_that("Column names can be inferred from schema", {
+
tbl <- df1[, c("int", "dbl")]
+
+ # Data containing a header row
+ header_csv_dir <- make_temp_dir()
+ write.table(tbl, file.path(header_csv_dir, "file1.csv"), sep = ",", row.names = FALSE)
+
+ # First row must be skipped if file has header
+ ds <- open_dataset(
+ header_csv_dir,
+ format = "csv",
+ schema = schema(int = int32(), dbl = float64()),
+ skip_rows = 1
+ )
+ expect_equal(collect(ds), tbl)
+
+ # If first row isn't skipped, supply user-friendly error
+ ds <- open_dataset(
+ header_csv_dir,
+ format = "csv",
+ schema = schema(int = int32(), dbl = float64())
+ )
+
+ expect_error(
+ collect(ds),
+ regexp = paste0("If you have supplied a schema and your data contains a ",
+ "header row, you should supply the argument `skip = 1` to ",
+ "prevent the header being read in as data.")
Review comment:
I believe the style guide would have this be
``` r
paste0(
"thing1 ",
"thing2"
...
)
```
(...but I don't feel strongly about it at all!)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.