[ https://issues.apache.org/jira/browse/ARROW-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440722#comment-17440722 ]

Dewey Dunnington commented on ARROW-14471:
------------------------------------------

I did a bit of looking into this: lubridate uses a [custom C parser for its order-based datetime parsers|https://github.com/tidyverse/lubridate/blob/main/src/tparse.c#L46-L391]. That said, its functionality can be approximated by {{coalesce(strptime(dt_string, "format1"), strptime(dt_string, "format2"), ...)}}. Is it worth translating the functions with an approximation that handles most of the use cases?

Some testing that might be useful when putting together a PR:

{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

test_dates <- tibble::tibble(
  string_ymd = c("2021-09-10", "2021/09/10", "20210910", "2021 Sep 10", "2021 September 10", NA),
  string_dmy = c("10-09-2021", "10/09/2021", "10092021", "10 Sep 2021", "10 September 2021", NA),
  string_mdy = c("09-10-2021", "09/10/2021", "09102021", "Sep 10 2021", "September 10 2021", NA),
  date = c(rep(as.Date("2021-09-10"), 5), NA),
  date_midnight = c(rep(as.POSIXct("2021-09-10 00:00:00", tz = "UTC"), 5), NA)
)

# the tzone attribute gets dropped by as.POSIXct if the system tz is UTC?
attr(test_dates$date_midnight, "tzone") <- "UTC"

test_datetimes <- tibble::tibble(
  string_ymd_hms = stringr::str_c(test_dates$string_ymd, "01:23:45"),
  string_dmy_hms = stringr::str_c(test_dates$string_dmy, "01:23:45"),
  string_mdy_hms = stringr::str_c(test_dates$string_mdy, "01:23:45"),
  string_ymd_hm = stringr::str_c(test_dates$string_ymd, "01:23"),
  string_dmy_hm = stringr::str_c(test_dates$string_dmy, "01:23"),
  string_mdy_hm = stringr::str_c(test_dates$string_mdy, "01:23"),
  string_ymd_h = stringr::str_c(test_dates$string_ymd, "01"),
  string_dmy_h = stringr::str_c(test_dates$string_dmy, "01"),
  string_mdy_h = stringr::str_c(test_dates$string_mdy, "01"),
  date_second = c(rep(as.POSIXct("2021-09-10 01:23:45", tz = "UTC"), 5), NA),
  date_minute = c(rep(as.POSIXct("2021-09-10 01:23", tz = "UTC"), 5), NA),
  date_hour = c(rep(as.POSIXct("2021-09-10", tz = "UTC") + 60 * 60, 5), NA)
)

# the tzone attributes get dropped by as.POSIXct if the system tz is UTC?
attr(test_datetimes$date_second, "tzone") <- "UTC"
attr(test_datetimes$date_minute, "tzone") <- "UTC"
attr(test_datetimes$date_hour, "tzone") <- "UTC"

# tests with lubridate, R eval
library(testthat, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# date parsers returning Date
expect_identical(ymd(test_dates$string_ymd), test_dates$date)
expect_identical(dmy(test_dates$string_dmy), test_dates$date)
expect_identical(mdy(test_dates$string_mdy), test_dates$date)

# date parsers returning POSIXct when tz is supplied
expect_identical(ymd(test_dates$string_ymd, tz = "UTC"), test_dates$date_midnight)
expect_identical(dmy(test_dates$string_dmy, tz = "UTC"), test_dates$date_midnight)
expect_identical(mdy(test_dates$string_mdy, tz = "UTC"), test_dates$date_midnight)

# date-time parsers with second resolution
expect_identical(
  ymd_hms(test_datetimes$string_ymd_hms, tz = "UTC"),
  test_datetimes$date_second
)
expect_identical(
  dmy_hms(test_datetimes$string_dmy_hms, tz = "UTC"),
  test_datetimes$date_second
)
expect_identical(
  mdy_hms(test_datetimes$string_mdy_hms, tz = "UTC"),
  test_datetimes$date_second
)

# date-time parsers with minute resolution
expect_identical(
  ymd_hm(test_datetimes$string_ymd_hm, tz = "UTC"),
  test_datetimes$date_minute
)
expect_identical(
  dmy_hm(test_datetimes$string_dmy_hm, tz = "UTC"),
  test_datetimes$date_minute
)
expect_identical(
  mdy_hm(test_datetimes$string_mdy_hm, tz = "UTC"),
  test_datetimes$date_minute
)

# date-time parsers with hour resolution
expect_identical(
  ymd_h(test_datetimes$string_ymd_h, tz = "UTC"),
  test_datetimes$date_hour
)
expect_identical(
  dmy_h(test_datetimes$string_dmy_h, tz = "UTC"),
  test_datetimes$date_hour
)
expect_identical(
  mdy_h(test_datetimes$string_mdy_h, tz = "UTC"),
  test_datetimes$date_hour
)
{code}

> [R] Implement lubridate's date/time parsing functions
> -----------------------------------------------------
>
>                 Key: ARROW-14471
>                 URL: https://issues.apache.org/jira/browse/ARROW-14471
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Dewey Dunnington
>            Priority: Major
>             Fix For: 7.0.0
>
>
> Parse dates with year, month, and day components:
> ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()
>
> Parse date-times with year, month, day, hour, minute, and second components:
> ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm()
> mdy_h() ydm_hms() ydm_hm() ydm_h()
>
> Parse periods with hour, minute, and second components:
> ms() hm() hms()

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
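
As a footnote to the comment above, the "coalesce over several strptime() calls" idea can be sketched in plain base R. This is only an illustration of the approximation, not how arrow or lubridate implement anything: {{parse_with_formats()}} is a hypothetical helper name, the format list is an assumption, and month-name formats ({{%b}}, {{%B}}) are left out because they depend on the locale.

```r
# Hypothetical helper (not part of arrow or lubridate): try each strptime()
# format in order and keep the first successful parse for every element.
parse_with_formats <- function(x, formats, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length.
  result <- .POSIXct(rep(NA_real_, length(x)), tz = tz)

  for (fmt in formats) {
    # Only retry elements that are still unparsed and not missing in the input.
    todo <- is.na(result) & !is.na(x)
    if (!any(todo)) break
    result[todo] <- as.POSIXct(strptime(x[todo], fmt, tz = tz))
  }

  result
}

# Assumed candidate formats for a ymd()-like parser (numeric orders only).
ymd_formats <- c("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")

x <- c("2021-09-10", "2021/09/10", "20210910", NA)
out <- parse_with_formats(x, ymd_formats)
```

Order matters in a sketch like this: strptime() ignores trailing input it does not need, so stricter or longer formats should come before looser ones, which is one of the ways the approximation can differ from lubridate's own parser.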