Dragoș Moldovan-Grünfeld created ARROW-16596:
------------------------------------------------
             Summary: [C++] Add option to control the cutoff between 1900 and 2000 when %y
                 Key: ARROW-16596
                 URL: https://issues.apache.org/jira/browse/ARROW-16596
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, R
    Affects Versions: 8.0.0
            Reporter: Dragoș Moldovan-Grünfeld

When parsing a string with a two-digit year ({{%y}}) to a datetime, it would be great to have control over the cutoff point between 1900 and 2000. Currently the cutoff is implicitly set to 68: two-digit years up to and including 68 are parsed as 20xx, and 69 and above as 19xx:

{code:r}
library(arrow, warn.conflicts = FALSE)

a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#>   2068-05-17 00:00:00,
#>   1969-05-17 00:00:00
#> ]
{code}

For example, lubridate names this argument {{cutoff_2000}} (e.g. in {{fast_strptime()}}). It works as follows:

{code:r}
library(lubridate, warn.conflicts = FALSE)

dates_vector <- c("68-05-17", "69-05-17", "55-05-17")

fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"

fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"

fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"
{code}

The {{lubridate::fast_strptime()}} documentation describes it as follows:

{quote}
cutoff_2000 integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are parsed as though starting with 20, otherwise parsed as though starting with 19. Available only for functions relying on lubridate's internal parser.
{quote}
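A minimal sketch of the requested rule, assuming the new option would follow the same semantics as lubridate's {{cutoff_2000}}. The helper {{expand_two_digit_year()}} is hypothetical (it is not part of arrow or lubridate) and only maps already-extracted two-digit year values to full years; it does not parse date strings.

{code:r}
# Hypothetical helper, not part of arrow or lubridate: expands two-digit
# years (0-99) using a lubridate-style cutoff_2000 rule.
expand_two_digit_year <- function(yy, cutoff_2000 = 68L) {
  yy <- as.integer(yy)
  stopifnot(all(yy >= 0L & yy <= 99L, na.rm = TRUE))
  # Values <= cutoff_2000 become 20yy, all other values become 19yy
  ifelse(yy <= cutoff_2000, 2000L + yy, 1900L + yy)
}

expand_two_digit_year(c(68, 69, 55))
#> [1] 2068 1969 2055
expand_two_digit_year(c(68, 69, 55), cutoff_2000 = 50)
#> [1] 1968 1969 1955
expand_two_digit_year(c(68, 69, 55), cutoff_2000 = 70)
#> [1] 2068 2069 2055
{code}

With the default of 68 the sketch reproduces the current arrow behaviour shown above, and the other two calls match the lubridate results for the same inputs.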