[ https://issues.apache.org/jira/browse/ARROW-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488048#comment-17488048 ]
Dragoș Moldovan-Grünfeld edited comment on ARROW-14471 at 2/7/22, 3:08 PM:
---------------------------------------------------------------------------

[~paleolimbot] I don't think we can rely on {{coalesce()}} to iterate through the various formats supported for {{ymd()}}. That approach only works if parsing with a {{format}} that does not match the data fails, so that {{coalesce()}} can fall through to the next format. Sadly, arrow accepts a wrong format and returns weird timestamps instead of failing:

{code:r}
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(arrow))
suppressPackageStartupMessages(library(lubridate))

df <- tibble(x = c("09-01-01", "09-01-02", "09-01-03"))
df
#> # A tibble: 3 × 1
#>   x
#>   <chr>
#> 1 09-01-01
#> 2 09-01-02
#> 3 09-01-03

# lubridate::ymd()
df %>%
  mutate(y = ymd(x))
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <date>
#> 1 09-01-01 2009-01-01
#> 2 09-01-02 2009-01-02
#> 3 09-01-03 2009-01-03

# %y = short (two-digit) year: correct
df %>%
  record_batch() %>%
  mutate(y = strptime(x, format = "%y-%m-%d", unit = "us")) %>%
  collect()
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <dttm>
#> 1 09-01-01 2009-01-01 00:00:00
#> 2 09-01-02 2009-01-02 00:00:00
#> 3 09-01-03 2009-01-03 00:00:00

# %Y = long (four-digit) year: this should fail in order for us to rely on coalesce()
df %>%
  record_batch() %>%
  mutate(y = strptime(x, format = "%Y-%m-%d", unit = "us")) %>%
  collect()
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <dttm>
#> 1 09-01-01 0008-12-31 23:58:45
#> 2 09-01-02 0009-01-01 23:58:45
#> 3 09-01-03 0009-01-02 23:58:45
{code}

Therefore, my early (and somewhat naive) conclusion would be that we cannot implement the {{arrow::ymd()}} binding as {{coalesce(strptime(x, format1), strptime(x, format2), ...)}}. What do you think?

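To make the {{coalesce()}} concern concrete, here is a minimal base R sketch (no arrow involved). {{parse_first_match()}} is a hypothetical helper, not part of arrow or lubridate; it keeps the first non-NA parse per element, which is what {{coalesce(strptime(x, format1), strptime(x, format2), ...)}} would do. It only illustrates, under base R's {{as.Date()}} semantics, that a lenient {{%Y}} shadows the later formats. Arrow's {{strptime}} kernel may behave differently.

```r
# Hypothetical helper (illustration only): try each format in order and keep
# the first non-NA result per element, like coalesce() over several parses.
parse_first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    parsed <- as.Date(x, format = fmt)
    fill <- is.na(out) & !is.na(parsed)  # only fill elements still unparsed
    out[fill] <- parsed[fill]
  }
  out
}

x <- c("09-01-01", "2009-01-02")

# "%Y" first: base R reads "09" as the year 9 rather than returning NA,
# so the fallback to "%y" is never used for the first element.
long_first <- parse_first_match(x, c("%Y-%m-%d", "%y-%m-%d"))

# "%y" first: the four-digit year does not match "%y-", yields NA,
# and the second format fills it in.
short_first <- parse_first_match(x, c("%y-%m-%d", "%Y-%m-%d"))

print(long_first)
print(short_first)
```

The result depends on the order of the formats, so the approach is only safe when every non-matching format reliably yields a missing value.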
> [R] Implement lubridate's date/time parsing functions
> -----------------------------------------------------
>
> Key: ARROW-14471
> URL: https://issues.apache.org/jira/browse/ARROW-14471
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Parse dates with year, month, and day components:
> ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()
>
> Parse date-times with year, month, and day, hour, minute, and second components:
> ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() mdy_h() ydm_hms() ydm_hm() ydm_h()
>
> Parse periods with hour, minute, and second components:
> ms() hm() hms()

--
This message was sent by Atlassian Jira
(v8.20.1#820001)