jonkeane commented on code in PR #12589:
URL: https://github.com/apache/arrow/pull/12589#discussion_r864099536
##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -357,6 +357,49 @@ register_bindings_duration <- function() {
     delta <- delta$cast(int64())
     start + delta$cast(duration("s"))
   })
+
+  register_binding("parse_date_time", function(x,
+                                               orders,
+                                               tz = "UTC") {
+
+    supported_orders <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym")
+    unsupported_passed_orders <- setdiff(orders, supported_orders)
+
+    if (length(unsupported_passed_orders) > 0) {
+      arrow_not_supported(
+        paste0(
+          oxford_paste(
+            unsupported_passed_orders
+          ),
+          " `orders`"
+        )
+      )
+    }
+
+    # make all separators (non-letters and non-numbers) into "-"
+    x <- call_binding("gsub", "[^A-Za-z0-9]", "-", x)
+    # collapse multiple separators into a single one
+    x <- call_binding("gsub", "-{2,}", "-", x)
+
+    # TODO figure out how to parse strings that have no separators
+    # https://issues.apache.org/jira/browse/ARROW-16446
+    # we could insert separators at the "likely" positions, but it might be
+    # tricky given the possible combinations between dmy formats + locale

Review Comment:
   One thing we could do here is to construct the formats without a separator, though that has other problems like erroring on ambiguous inputs (which happens in lubridate too...) and might be quite a bit slower if we're parsing through all of them for each row. We should benchmark + explore this when we come back to it.

   Is there something we should do to call attention to this gap? Add to NEWS? or document it somewhere else?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
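
To make the separator-free idea from the review comment concrete, here is a minimal sketch of "construct the formats without a separator and try each one". It uses base R's `as.Date()` as a stand-in and is not the Arrow binding from this PR. The helper names `order_to_format`, `parse_no_sep` and `count_matches` are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: turn a lubridate-style order such as "ymd" into a
# separator-free strptime format such as "%Y%m%d".
order_to_format <- function(order) {
  pieces <- c(y = "%Y", m = "%m", d = "%d")
  paste(pieces[strsplit(order, "")[[1]]], collapse = "")
}

# Try every candidate format in turn and keep the first successful parse
# for each element (a coalesce over the per-format results).
parse_no_sep <- function(x, orders) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in vapply(orders, order_to_format, character(1))) {
    parsed <- as.Date(x, format = fmt)
    fill <- is.na(out) & !is.na(parsed)
    out[fill] <- parsed[fill]
  }
  out
}

# Count how many candidate formats parse each element; a count above 1
# flags an input that is ambiguous between the supplied orders.
count_matches <- function(x, orders) {
  fmts <- vapply(orders, order_to_format, character(1))
  Reduce(`+`, lapply(fmts, function(f) !is.na(as.Date(x, format = f))))
}

x <- c("20220503", "05032022", "01022003")
parse_no_sep(x, c("ymd", "mdy", "dmy"))
count_matches(x, c("ymd", "mdy", "dmy"))
```

In this sketch the first matching order wins, so the result for an ambiguous string depends on the order in which `orders` is supplied. Each element is also parsed once per candidate format, which is the per-row cost the comment suggests benchmarking.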