dragosmg commented on a change in pull request #12433:
URL: https://github.com/apache/arrow/pull/12433#discussion_r816625715



##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -76,6 +76,64 @@ register_bindings_type_cast <- function() {
   register_binding("as.numeric", function(x) {
     Expression$create("cast", x, options = cast_options(to_type = float64()))
   })
+  register_binding("as.Date", function(x,
+                                       format = NULL,
+                                       tryFormats = "%Y-%m-%d",
+                                       origin = "1970-01-01",
+                                       tz = "UTC") {
+
+    if (call_binding("is.Date", x)) {
+      # base::as.Date() first converts to the desired timezone and then extracts
+      # the date, which is why we need to go through timestamp() first
+      return(x)
+
+    # cast from POSIXct
+    } else if (call_binding("is.POSIXct", x)) {
+      if (tz == "UTC") {
+        interim_x <- build_expr("cast", x, options = cast_options(to_type = timestamp(timezone = tz)))
+      } else {
+        abort("`as.Date()` with a timezone different to 'UTC' is not supported in Arrow")
+      }
+
+    # cast from character
+    } else if (call_binding("is.character", x)) {
+      # this could be improved with tryFormats once strptime returns NA and we
+      # can use coalesce - https://issues.apache.org/jira/browse/ARROW-15659
+      # TODO revisit once https://issues.apache.org/jira/browse/ARROW-15659 is done
+      if (is.null(format)) {
+        if (length(tryFormats) == 1) {
+          format <- tryFormats[1]
+        } else {
+          abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow yet")
+        }
+      }
+      # if x is not an expression (e.g. passed as filter), convert it to one
+      if (!inherits(x, "Expression")) {
+        x <- build_expr("cast", x, options = cast_options(to_type = type(x)))
+      }

Review comment:
       I introduced that step (converting a character `x` to an `Expression`) to deal with the errors we were seeing with `filter()`. In `filter.arrow_dplyr_query()`, evaluating the filter expression with tidy eval gives the following error:
   ```r
   > filters <- lapply(filts, arrow_eval, arrow_mask(.data))
   > filters
   [[1]]
   [1] "Expression arguments must be Expression objects"
   attr(,"class")
   [1] "try-error"
   attr(,"condition")
   <assertError: Expression arguments must be Expression objects>
   ``` 
   
   where `filts` is:
   ```r
   > filts
   <list_of<quosure>>
   
   [[1]]
   <quosure>
   expr: ^ts >= as.Date("2015-05-04")
   env:  0x297eaa768
   ``` 
   
   and the original test is:
   ```r
   # ds is an open dataset
   ds %>%
     filter(ts >= as.Date("2015-05-04")) %>%
     filter(part == 1) %>%
     select(ts) %>%
     collect()
   ```
   
   As far as I understand it, it fails because `"2015-05-04"` is a character vector and not an `Expression` object. Inside the `filter. ...` method:
   ```r
   > b <- arrow_eval(filts[[1]], arrow_mask(.data))
   > b
   [1] "Expression arguments must be Expression objects"
   attr(,"class")
   [1] "try-error"
   attr(,"condition")
   <assertError: Expression arguments must be Expression objects>
   ```
   Hence my approach: if we reach that point in the function body and the character `x` is not yet an `Expression`, convert it to one.
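
   To illustrate the idea without depending on arrow internals, here is a minimal base R sketch. It is only a toy model: `new_expr()`, `as_expr()` and `build_strict()` are hypothetical stand-ins I made up for this example, not arrow functions, and the real `Expression` class behaves differently. It shows that the literal in the filter call is a plain character vector, that a strict builder rejects it, and that wrapping it first avoids the error.

   ```r
   # Toy stand-in for an Expression object (NOT the real arrow class).
   new_expr <- function(fun, ...) {
     structure(list(fun = fun, args = list(...)), class = "Expression")
   }

   # Hypothetical guard: wrap a plain R value unless it is already an Expression.
   as_expr <- function(x) {
     if (inherits(x, "Expression")) x else new_expr("scalar", x)
   }

   # Strict builder that mimics the assertion from the error message above.
   build_strict <- function(fun, ...) {
     args <- list(...)
     ok <- vapply(args, inherits, logical(1), what = "Expression")
     if (!all(ok)) stop("Expression arguments must be Expression objects")
     do.call(new_expr, c(list(fun), args))
   }

   # The literal inside the filter call is a plain character vector.
   filt <- quote(ts >= as.Date("2015-05-04"))
   literal <- filt[[3]][[2]]
   is.character(literal)

   # Passing the literal directly trips the assertion ...
   res <- try(build_strict("strptime", literal), silent = TRUE)
   inherits(res, "try-error")

   # ... while wrapping it first succeeds.
   ok <- build_strict("strptime", as_expr(literal))
   class(ok)
   ```

   The guard is deliberately a no-op for inputs that are already expressions, so it should only affect the literal case that `filter()` runs into.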



