djnavarro commented on code in PR #12154:
URL: https://github.com/apache/arrow/pull/12154#discussion_r926134717
##########
r/R/util.R:
##########
@@ -215,3 +215,138 @@ handle_csv_read_error <- function(e, schema, call) {
is_compressed <- function(compression) {
!identical(compression, "uncompressed")
}
+
+parse_period_unit <- function(x) {
+
+  # the regexp matches against fractional multiples, but (following
+  # lubridate) only integer multiples of a known unit are supported
+ match_info <- regexpr(
+ pattern = " *(?<multiple>[0-9.,]+)? *(?<unit>[^ \t\n]+)",
+ text = x[[1]],
+ perl = TRUE
+ )
+
+ capture_start <- attr(match_info, "capture.start")
+ capture_length <- attr(match_info, "capture.length")
+ capture_end <- capture_start + capture_length - 1L
+
+ str_unit <- substr(x, capture_start[[2]], capture_end[[2]])
+ str_multiple <- substr(x, capture_start[[1]], capture_end[[1]])
+
+ known_units <- c("nanosecond", "microsecond", "millisecond", "second",
+ "minute", "hour", "day", "week", "month", "quarter", "year")
+
+ # match the period unit
+ str_unit_start <- substr(str_unit, 1, 3)
+ unit <- as.integer(pmatch(str_unit_start, known_units)) - 1L
Review Comment:
Yeah, this is bothering me too. Personally I would prefer to be stricter to
avoid the `"3 mickeys"` problem, but if I were to do that it would break
compatibility with lubridate. The substring matching I've implemented here is
an exact mirror of how lubridate handles it (I think). In this instance I felt
that lubridate compatibility was the more important consideration, because
someone might be relying on it, e.g., by using something like `"3 microsecs"`
or some other abbreviation that I can't predict. Unless you feel strongly, I'd
prefer to leave it as is and add a comment explaining the reasoning behind this
design choice, so that we remember why we did it this way.
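
To make the trade-off concrete, here is a standalone sketch (not part of the patch; `match_unit` is a hypothetical wrapper around the same `substr` + `pmatch` logic) showing how prefix matching on the first three characters accepts lubridate-style abbreviations but also admits unknown units whose prefix happens to match:

```r
# Known period units, as in the patch above
known_units <- c("nanosecond", "microsecond", "millisecond", "second",
                 "minute", "hour", "day", "week", "month", "quarter", "year")

# Hypothetical helper mirroring the patch: pmatch() the first three
# characters of the unit string against the known units, returning a
# zero-based unit index (NA if no unique match)
match_unit <- function(str_unit) {
  str_unit_start <- substr(str_unit, 1, 3)
  as.integer(pmatch(str_unit_start, known_units)) - 1L
}

match_unit("microsecs")  # "mic" uniquely prefix-matches "microsecond" -> 1
match_unit("min")        # "min" uniquely matches "minute" -> 4
match_unit("mickeys")    # "mic" also matches -> 1, hence the concern
match_unit("m")          # ambiguous across several units -> NA
```

The last two calls illustrate both sides: `"mickeys"` slips through because only its prefix is inspected, while genuinely ambiguous prefixes like `"m"` still fail with `NA`.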
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]