[GitHub] [arrow] paleolimbot commented on a change in pull request #12152: ARROW-15123: [R] CSV dataset file header read in as data

GitBox Fri, 14 Jan 2022 05:04:02 -0800


paleolimbot commented on a change in pull request #12152:
URL: https://github.com/apache/arrow/pull/12152#discussion_r784824022




##########
File path: r/R/util.R
##########
@@ -209,5 +209,5 @@ handle_csv_read_error <- function(e, schema) {
     ))
   }
 
-  abort(e)
+  stop(e)

Review comment:
       Is there a reason for using `stop()` rather than `abort()`? Most calls
to `stop()` that I've seen in arrow use `stop("msg", call. = FALSE)`. Is that
worth doing here?
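
For context, a minimal base-R sketch of the two forms of `stop()` in question. The helper names `rethrow()` and `fail_plain()` are hypothetical and are not part of arrow. It assumes only base R behaviour: `stop(e)` re-signals an existing condition object as-is, while `stop("msg", call. = FALSE)` builds a new error with no call attached.

```r
# Hypothetical helper: catch an error and re-raise the same condition object.
# stop(e) with a condition re-signals it, keeping its message and class.
rethrow <- function(expr) {
  tryCatch(expr, error = function(e) stop(e))
}

# Hypothetical helper: raise a new error from a message string,
# leaving the calling function out of the condition.
fail_plain <- function() {
  stop("something went wrong", call. = FALSE)
}

# The re-raised error carries the original message.
msg <- tryCatch(rethrow(stop("original")), error = function(e) conditionMessage(e))

# The message-form error has no call recorded.
err <- tryCatch(fail_plain(), error = function(e) e)
has_call <- !is.null(conditionCall(err))
```

As far as I know, `call.` only affects the message form; when `stop()` is given a condition object, any extra arguments are ignored, so `stop(e, call. = FALSE)` would probably not behave like the message form.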

##########
File path: r/tests/testthat/test-dataset-csv.R
##########
@@ -280,13 +280,48 @@ test_that("Error if no format specified and files are not parquet", {
   )
 })
 
-test_that("Column names inferred from schema for headerless CSVs (ARROW-14063)", {
-  headerless_csv_dir <- make_temp_dir()
+test_that("Column names can be inferred from schema", {
+
   tbl <- df1[, c("int", "dbl")]
+
+  # Data containing a header row
+  header_csv_dir <- make_temp_dir()
+  write.table(tbl, file.path(header_csv_dir, "file1.csv"), sep = ",", row.names = FALSE)
+
+  # First row must be skipped if file has header
+  ds <- open_dataset(
+    header_csv_dir,
+    format = "csv",
+    schema = schema(int = int32(), dbl = float64()),
+    skip_rows = 1
+  )
+  expect_equal(collect(ds), tbl)
+
+  # If first row isn't skipped, supply user-friendly error
+  ds <- open_dataset(
+    header_csv_dir,
+    format = "csv",
+    schema = schema(int = int32(), dbl = float64())
+  )
+
+  expect_error(
+    collect(ds),
+    regexp = paste0("If you have supplied a schema and your data contains a ",
+                    "header row, you should supply the argument `skip = 1` to ",
+                    "prevent the header being read in as data.")

Review comment:
       I believe the style guide would have this be
   
   ``` r
   paste0(
     "thing1 ",
     "thing2"
     ...
   )
   ```
   
   (...but I don't feel strongly about it at all!)



