djnavarro commented on code in PR #12154:
URL: https://github.com/apache/arrow/pull/12154#discussion_r926134717


##########
r/R/util.R:
##########
@@ -215,3 +215,138 @@ handle_csv_read_error <- function(e, schema, call) {
 is_compressed <- function(compression) {
   !identical(compression, "uncompressed")
 }
+
+parse_period_unit <- function(x) {
+
+  # the regexp also matches fractional multiples, but (following lubridate)
+  # only integer multiples of a known unit are supported
+  match_info <- regexpr(
+    pattern = " *(?<multiple>[0-9.,]+)? *(?<unit>[^ \t\n]+)",
+    text = x[[1]],
+    perl = TRUE
+  )
+
+  capture_start <- attr(match_info, "capture.start")
+  capture_length <- attr(match_info, "capture.length")
+  capture_end <- capture_start + capture_length - 1L
+
+  str_unit <- substr(x, capture_start[[2]], capture_end[[2]])
+  str_multiple <- substr(x, capture_start[[1]], capture_end[[1]])
+
+  known_units <- c("nanosecond", "microsecond", "millisecond", "second",
+                   "minute", "hour", "day", "week", "month", "quarter", "year")
+
+  # match the period unit
+  str_unit_start <- substr(str_unit, 1, 3)
+  unit <- as.integer(pmatch(str_unit_start, known_units)) - 1L

Review Comment:
   Yeah, this is bothering me too. Personally I would prefer to be stricter and
avoid the `"3 mickeys"` problem (its first three letters, "mic", partially match
"microsecond"), but doing that would break compatibility with lubridate. As far
as I can tell, the three-character substring matching implemented here mirrors
how lubridate handles units. In this instance I felt that lubridate
compatibility was the more important consideration, because someone might be
relying on it, e.g., by writing `"3 microsecs"` or some other abbreviation that
I can't predict. Unless you feel strongly, I'd prefer to leave it as is and add
a comment explaining the reasoning behind this design choice, so that we
remember why we did it this way.
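To make the trade-off concrete, here is a minimal sketch that isolates the unit-matching step from `parse_period_unit()` in the diff above. `match_unit_lenient()` reproduces the three-character `pmatch()` rule from the PR; `match_unit_strict()` is a hypothetical stricter alternative (exact unit name plus an optional plural "s") that is assumed here for comparison only and is not part of the PR.

```r
# Units recognised by parse_period_unit() in the diff above
known_units <- c("nanosecond", "microsecond", "millisecond", "second",
                 "minute", "hour", "day", "week", "month", "quarter", "year")

# Lenient rule (as in the PR): partially match the first three characters.
# pmatch() returns NA_integer_ for no match or an ambiguous match.
match_unit_lenient <- function(str_unit) {
  known_units[pmatch(substr(str_unit, 1, 3), known_units)]
}

# Hypothetical strict rule: strip one trailing "s", then require an exact
# match against a full unit name.
match_unit_strict <- function(str_unit) {
  singular <- sub("s$", "", str_unit)
  known_units[match(singular, known_units)]
}

match_unit_lenient("mickeys")    # "microsecond": the "3 mickeys" problem
match_unit_lenient("microsecs")  # "microsecond": abbreviations keep working
match_unit_strict("mickeys")     # NA: nonsense is rejected
match_unit_strict("microsecs")   # NA: but so is a plausible abbreviation
match_unit_strict("minutes")     # "minute"
```

The strict variant rejects `"mickeys"`, but it also rejects `"microsecs"`. That is the compatibility cost described in the comment.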



-- 
This is an automated message from the Apache Git Service.