jonkeane commented on a change in pull request #12277:
URL: https://github.com/apache/arrow/pull/12277#discussion_r794636116



##########
File path: r/R/dataset-format.R
##########
@@ -133,10 +133,36 @@ CsvFileFormat$create <- function(...,
   schema_names <- names(schema)
 
   if (!is.null(schema) & !identical(schema_names, column_names)) {
+    missing_from_schema <- setdiff(column_names, schema_names)
+    missing_from_colnames <- setdiff(schema_names, column_names)
+    message_colnames <- NULL
+    message_schema <- NULL
+    message_order <- NULL
+
+    if (length(missing_from_colnames) > 0) {
+      message_colnames <- paste(
+        oxford_paste(missing_from_colnames, quote_symbol = "`"),
+        "not present in `column_names`"
+      )
+    }

Review comment:
       We don't need to do this as part of this PR, but I've seen this pattern 
a few times now:
   
   ```
   missing_from <- setdiff(set_a, set_b)
   if (length(missing_from) > 0) {
     # construct a message
     # sometimes also abort()
   }
   ```
   
   Maybe the message bits are too specific, and there are too many variants of 
them, for a single helper to cover. But could we do something like a function 
`check_match(x = set_a, y = set_b, x_name = "column_names", y_name = "schema")` 
that would produce messages like "X, Y, and Z not present in `column_names`"?
   
   If you think that's feasible, would you mind creating a Jira ticket that 
links to this code as an improvement we could make?
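
A minimal sketch of what such a helper might look like, to make the idea concrete. Everything here is an assumption for illustration: `check_match()` does not exist in the package, the argument names simply follow the call suggested above, and the local `quote_and_join()` stands in for the package's own `oxford_paste()` so that the snippet runs on its own.

```r
# Hypothetical helper, not part of arrow: returns one message per direction
# in which the two sets of names disagree, or character(0) if they match.
check_match <- function(x, y, x_name = "x", y_name = "y") {
  # Stand-in for oxford_paste(): backtick-quote each value and join the
  # values with an Oxford comma.
  quote_and_join <- function(values) {
    quoted <- paste0("`", values, "`")
    n <- length(quoted)
    if (n == 1) {
      return(quoted)
    }
    if (n == 2) {
      return(paste(quoted, collapse = " and "))
    }
    paste0(paste(quoted[-n], collapse = ", "), ", and ", quoted[n])
  }

  missing_from_x <- setdiff(y, x) # present in y but absent from x
  missing_from_y <- setdiff(x, y) # present in x but absent from y

  messages <- character(0)
  if (length(missing_from_x) > 0) {
    messages <- c(
      messages,
      paste0(quote_and_join(missing_from_x), " not present in `", x_name, "`")
    )
  }
  if (length(missing_from_y) > 0) {
    messages <- c(
      messages,
      paste0(quote_and_join(missing_from_y), " not present in `", y_name, "`")
    )
  }
  messages
}

# Example call with made-up names
check_match(
  x = c("a", "b"),
  y = c("a", "c", "d"),
  x_name = "column_names",
  y_name = "schema"
)
```

Returning the messages instead of calling `abort()` directly would leave it to each call site to decide whether a mismatch is an error, which seems to fit the "sometimes also abort()" cases. Ordering mismatches (the `message_order` case in this diff) are not covered by this sketch.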




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

