jonkeane commented on a change in pull request #12707: URL: https://github.com/apache/arrow/pull/12707#discussion_r837685361
########## File path: r/R/dplyr-funcs-datetime.R ########## @@ -300,6 +314,34 @@ register_bindings_duration <- function() { build_expr("cast", x, options = cast_options(to_type = duration(unit = "s"))) }) + register_binding("decimal_date", function(date) { + y <- build_expr("year", date) + start <- call_binding("make_datetime", year = y, tz = "UTC") + sofar <- call_binding("difftime", date, start, units = "secs") + total <- call_binding( + "if_else", + build_expr("is_leap_year", date), + call_binding("as.integer64", 31622400L), # number of seconds in a leap year (366 days) + call_binding("as.integer64", 31536000L) # number of seconds in a regular year (365 days) Review comment: Could these be `Expression$scalar(31622400L)`? ########## File path: r/R/dplyr-funcs-datetime.R ########## @@ -300,6 +314,34 @@ register_bindings_duration <- function() { build_expr("cast", x, options = cast_options(to_type = duration(unit = "s"))) }) + register_binding("decimal_date", function(date) { + y <- build_expr("year", date) + start <- call_binding("make_datetime", year = y, tz = "UTC") + sofar <- call_binding("difftime", date, start, units = "secs") + total <- call_binding( + "if_else", + build_expr("is_leap_year", date), + call_binding("as.integer64", 31622400L), # number of seconds in a leap year (366 days) + call_binding("as.integer64", 31536000L) # number of seconds in a regular year (365 days) + ) + y + sofar$cast(int64()) / total + }) + register_binding("date_decimal", function(decimal, tz = "UTC") { + y <- build_expr("floor", decimal) + + start <- call_binding("make_datetime", year = y, tz = tz) + seconds <- call_binding( + "if_else", + build_expr("is_leap_year", start), + call_binding("as.integer64", 31622400L), # number of seconds in a leap year (366 days) + call_binding("as.integer64", 31536000L) # number of seconds in a regular year (365 days) + ) Review comment: I think this implementation is fine for now, but we might consider raising a jira for a C++ function that would return the number of seconds in a year. That seems like it might be useful elsewhere, and would let us catch some of the very very edge cases that might happen here (e.g. where we have leap seconds). ########## File path: r/R/dplyr-funcs-datetime.R ########## @@ -261,6 +261,20 @@ register_bindings_duration <- function() { time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC"))) } + # if time1 or time2 are timestamps they cannot be expressed in "s" /seconds + # otherwise they cannot be added subtracted with durations + # TODO delete the casting to "us" once + # https://issues.apache.org/jira/browse/ARROW-16060 is solved + if (inherits(time1, "Expression") && + time1$type_id() %in% Type[c("TIMESTAMP")] && time1$type()$unit() != 2L) { + time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp("us"))) + } + + if (inherits(time2, "Expression") && + time2$type_id() %in% Type[c("TIMESTAMP")] && time2$type()$unit() != 2L) { + time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp("us"))) + } + Review comment: This is mostly to ensure that I'm reading this correctly: We no longer actually need these lines of code fro `date_decimal` and `decimal_date`, right? I know it's helpful for if someone has timestamps as seconds, so we can leave it in here, but want to confirm that's a separate thing now. ########## File path: r/R/dplyr-funcs-datetime.R ########## @@ -300,6 +314,34 @@ register_bindings_duration <- function() { build_expr("cast", x, options = cast_options(to_type = duration(unit = "s"))) }) + register_binding("decimal_date", function(date) { + y <- build_expr("year", date) + start <- call_binding("make_datetime", year = y, tz = "UTC") + sofar <- call_binding("difftime", date, start, units = "secs") + total <- call_binding( + "if_else", + build_expr("is_leap_year", date), + call_binding("as.integer64", 31622400L), # number of seconds in a leap year (366 days) + call_binding("as.integer64", 31536000L) # number of seconds in a regular year (365 days) Review comment: It would be nice to do a bit of benchmarking (even micro bench with {bench}) on these implementation details (honestly, even the old implementation of doing strptime more...) I would be curious if any of them are better or worse for performance (my gut says this is quicker than strptime a bunch, but I might be totally wrong about that!) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org