[jira] [Commented] (ARROW-15805) [R] Update the as.Date() binding

2022-07-04 Thread Jira


[https://issues.apache.org/jira/browse/ARROW-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562111#comment-17562111]

Dragoș Moldovan-Grünfeld commented on ARROW-15805:
--

I think a message advising users to try the dedicated date/time parsing
functionality might be better than pointing them to
{{lubridate::as_date()}}. I have implemented that in
[https://github.com/apache/arrow/pull/13070]. Happy to change it.

> [R] Update the as.Date() binding
> 
>
> Key: ARROW-15805
> URL: https://issues.apache.org/jira/browse/ARROW-15805
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dragoș Moldovan-Grünfeld
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-15805) [R] Update the as.Date() binding

2022-06-30 Thread Jira


[https://issues.apache.org/jira/browse/ARROW-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560978#comment-17560978]

Dragoș Moldovan-Grünfeld commented on ARROW-15805:
--

I totally agree. 

> [R] Update the as.Date() binding
> 
>
> Key: ARROW-15805
> URL: https://issues.apache.org/jira/browse/ARROW-15805
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dragoș Moldovan-Grünfeld
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Commented] (ARROW-15805) [R] Update the as.Date() binding

2022-06-29 Thread Jonathan Keane (Jira)


[https://issues.apache.org/jira/browse/ARROW-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560679#comment-17560679]

Jonathan Keane commented on ARROW-15805:


This is alluded to in the PR comments, but taking a step back and thinking 
about the behavior:

{code}
dates_dash_first <- c("2022-01-01", "2022/02/02", "2022/02/02", "2022/02/02", 
"2022-01-01", "2022-01-01")
dates_slash_first <- c("2022/02/02", "2022-01-01", "2022/02/02", "2022/02/02", 
"2022-01-01", "2022-01-01")

as.Date(dates_dash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-01-01" NA   NA   NA   "2022-01-01"
#> [6] "2022-01-01"

as.Date(dates_slash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-02-02" NA   "2022-02-02" "2022-02-02" NA  
#> [6] NA
{code}

Which format is chosen depends on the underlying data and, critically, on the
order of that data: {{as.Date()}} tries each of the {{tryFormats}} on the first
non-NA element only, then applies the first format that succeeds to the whole
vector. Given that we can't always guarantee the order of the data we are
processing[1], we should not attempt to implement this behavior right now.
Instead, the binding should raise an error if someone specifies {{tryFormats}}.
The error should suggest that they either use {{lubridate::as_date()}} if they
want to specify multiple formats (accepting that, unlike {{as.Date()}}, it won't
return NA for values in formats other than the first one that matches), or pick
the single format they want and use that.


[1] And even if we could, picking the right format would take some tricky
expression writing.




