[
https://issues.apache.org/jira/browse/ARROW-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560679#comment-17560679
]
Jonathan Keane commented on ARROW-15805:
This is alluded to in the PR comments, but taking a step back and thinking
about the behavior:
{code}
dates_dash_first <- c("2022-01-01", "2022/02/02", "2022/02/02", "2022/02/02",
                      "2022-01-01", "2022-01-01")
dates_slash_first <- c("2022/02/02", "2022-01-01", "2022/02/02", "2022/02/02",
                       "2022-01-01", "2022-01-01")
as.Date(dates_dash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-01-01" NA NA NA "2022-01-01"
#> [6] "2022-01-01"
as.Date(dates_slash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-02-02" NA "2022-02-02" "2022-02-02" NA
#> [6] NA
{code}
Which format is chosen depends on the underlying data and, critically, on the
order that data is in: {{as.Date()}} takes the first entry in {{tryFormats}}
that parses the first non-NA element and then applies that single format to the
whole vector. Given that we can't always guarantee the order of the data we are
processing[1], we should not attempt to implement this behavior right now.
Instead, we should raise an error if someone tries to specify {{tryFormats}},
suggesting that they either use {{lubridate::as_date()}} if they want to
specify multiple formats (and can accept that values in any of the formats are
parsed, rather than becoming NA when they don't match the first format that
works), or pick the one format they want and use that.
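
A minimal sketch of the kind of guard proposed above, in plain R. The function
name {{check_try_formats()}}, the {{NULL}} default, and the message wording are
assumptions made for illustration; this is not the actual arrow binding code.

```r
# Hypothetical guard: error as soon as the caller supplies tryFormats.
# The name, the NULL default, and the message text are illustrative only.
check_try_formats <- function(tryFormats = NULL) {
  if (!is.null(tryFormats)) {
    stop(
      "`tryFormats` is not supported here. ",
      "Pass a single `format`, or consider lubridate::as_date() ",
      "if multiple formats are needed.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Nothing supplied: passes silently.
check_try_formats()

# Supplied: the error is raised; tryCatch() captures its message here.
msg <- tryCatch(
  check_try_formats(c("%Y-%m-%d", "%Y/%m/%d")),
  error = function(e) conditionMessage(e)
)
```

A real binding would also need a way to tell whether the user passed
{{tryFormats}} explicitly, since base R gives that argument a non-NULL default.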
[1] And even if we could, it would take some tricky expression writing to pick
the right format.
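
For contrast with the order-dependent base R behavior, here is a rough base R
sketch of order-independent, per-element parsing: each format is tried in turn,
but only on the elements that are still NA. The helper name
{{parse_any_format()}} is hypothetical, and this is neither lubridate's
implementation nor something arrow does; it only illustrates the semantics
someone would be opting into.

```r
# Hypothetical helper: try each format on the elements not yet parsed.
# The result for an element does not depend on where it sits in the vector.
parse_any_format <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    # With an explicit `format`, as.Date() returns NA for non-matching strings.
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

dates_mixed <- c("2022/02/02", "2022-01-01", "2022/02/02", "2022-01-01")
formats <- c("%Y-%m-%d", "%Y/%m/%d")

parsed <- parse_any_format(dates_mixed, formats)
parsed_reversed <- parse_any_format(rev(dates_mixed), formats)
```

Every element in {{parsed}} is a non-NA date, and reversing the input simply
reverses the output, which is exactly what {{tryFormats}} does not guarantee.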
> [R] Update the as.Date() binding
>
>
> Key: ARROW-15805
> URL: https://issues.apache.org/jira/browse/ARROW-15805
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dragoș Moldovan-Grünfeld
> Priority: Major
> Fix For: 9.0.0
>
>