Dragoș Moldovan-Grünfeld created ARROW-16596:
------------------------------------------------

             Summary: [C++] Add option to control the cutoff between 1900 and 2000 when %y
                 Key: ARROW-16596
                 URL: https://issues.apache.org/jira/browse/ARROW-16596
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, R
    Affects Versions: 8.0.0
            Reporter: Dragoș Moldovan-Grünfeld


When parsing a string with a two-digit year ({{%y}}) to a datetime, it would be useful to be able to control the cutoff between 1900 and 2000. Currently the cutoff is implicitly set to 68: two-digit years up to and including 68 are parsed as 20xx, and 69 and above as 19xx:
{code:r}
library(arrow, warn.conflicts = FALSE)

a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#>   2068-05-17 00:00:00,
#>   1969-05-17 00:00:00
#> ]
{code}
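For comparison, base R's {{strptime()}} help page documents a fixed convention for {{%y}} on input: values 00 to 68 are prefixed with 20 and 69 to 99 with 19, which it describes as the POSIX behaviour. Arrow's current default appears to follow the same rule. A quick check in base R only (no Arrow involved):
{code:r}
# Base R only: as.Date() with a format string parses via strptime(),
# which maps %y values 00-68 to 20xx and 69-99 to 19xx.
x <- c("68-05-17", "69-05-17")
parsed <- as.Date(x, format = "%y-%m-%d")
format(parsed, "%Y-%m-%d")
#> [1] "2068-05-17" "1969-05-17"
{code}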
For example, lubridate exposes this as the {{cutoff_2000}} argument (e.g. in {{fast_strptime()}}), which works as follows:
{code:r}
library(lubridate, warn.conflicts = FALSE)

dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"
{code}
In the {{lubridate::fast_strptime()}} documentation it is described as follows:
{quote}
cutoff_2000
integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are parsed as though starting with 20, otherwise parsed as though starting with 19. Available only for functions relying on lubridate's internal parser.
{quote}
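The documented rule is simple enough to sketch directly. The helper {{resolve_two_digit_year()}} below is hypothetical (it is not part of arrow or lubridate); it applies the {{cutoff_2000}} rule as quoted above, and with the default of 68 it reproduces both Arrow's current behaviour and lubridate's default outputs shown earlier. A second hypothetical helper, {{expand_short_year()}}, assumes every input is exactly in {{yy-mm-dd}} layout and rewrites it with a four-digit year, so that it could be parsed with {{%Y}} instead of {{%y}} as a stopgap until such an option exists:
{code:r}
# Hypothetical helper (not an arrow or lubridate API): map two-digit years
# to four-digit years. Values <= cutoff_2000 become 20yy, the rest 19yy,
# following the cutoff_2000 description quoted from lubridate.
resolve_two_digit_year <- function(yy, cutoff_2000 = 68L) {
  yy <- as.integer(yy)
  stopifnot(all(yy >= 0L & yy <= 99L, na.rm = TRUE))
  ifelse(yy <= cutoff_2000, 2000L + yy, 1900L + yy)
}

# Hypothetical stopgap, assuming every string is exactly "yy-mm-dd":
# expand the year prefix so the result can be parsed with "%Y-%m-%d".
expand_short_year <- function(x, cutoff_2000 = 68L) {
  yy <- substr(x, 1L, 2L)
  rest <- substr(x, 3L, nchar(x))
  paste0(resolve_two_digit_year(yy, cutoff_2000), rest)
}

dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
resolve_two_digit_year(c(68, 69, 55))
#> [1] 2068 1969 2055
resolve_two_digit_year(c(68, 69, 55), cutoff_2000 = 50)
#> [1] 1968 1969 1955
expand_short_year(dates_vector, cutoff_2000 = 70)
#> [1] "2068-05-17" "2069-05-17" "2055-05-17"
{code}
A native option in Arrow's {{strptime}} would avoid this string rewriting and would work for any format containing {{%y}}, not just a fixed leading year.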



--
This message was sent by Atlassian Jira
(v8.20.7#820007)