Andy Teucher created ARROW-16007:
------------------------------------

             Summary: [R] binding for grepl has different behaviour with NA compared to R base grepl
                 Key: ARROW-16007
                 URL: https://issues.apache.org/jira/browse/ARROW-16007
             Project: Apache Arrow
          Issue Type: Improvement
    Affects Versions: 7.0.0
            Reporter: Andy Teucher

The arrow binding to {{grepl}} behaves slightly differently from base R {{grepl}}: it returns {{NA}} for {{NA}} inputs, whereas base {{grepl}} returns {{FALSE}} for {{NA}} inputs. arrow's implementation is consistent with {{stringr::str_detect()}}, and both {{str_detect()}} and {{grepl()}} are bound to {{match_substring_regex}} and {{match_substring}} in arrow.

I don't know whether you would want to change this so that the {{grepl}} binding aligns with base {{grepl}}, or simply document the difference.

Reprex:

{code:r}
library(arrow, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(stringr, quietly = TRUE)

alpha_df <- data.frame(alpha = c("alpha", "bet", NA_character_))
alpha_dataset <- InMemoryDataset$create(alpha_df)

mutate(alpha_df, grepl_is_a = grepl("a", alpha), stringr_is_a = str_detect(alpha, "a"))
#>   alpha grepl_is_a stringr_is_a
#> 1 alpha       TRUE         TRUE
#> 2   bet      FALSE        FALSE
#> 3  <NA>      FALSE           NA

mutate(alpha_dataset, grepl_is_a = grepl("a", alpha), stringr_is_a = str_detect(alpha, "a")) |>
  collect()
#>   alpha grepl_is_a stringr_is_a
#> 1 alpha       TRUE         TRUE
#> 2   bet      FALSE        FALSE
#> 3  <NA>         NA           NA

# base R grepl returns FALSE for NA
grepl("a", alpha_df$alpha) # bound to arrow_match_substring_regex
#> [1]  TRUE FALSE FALSE
grepl("a", alpha_df$alpha, fixed = TRUE) # bound to arrow_match_substring
#> [1]  TRUE FALSE FALSE

# stringr::str_detect returns NA for NA
str_detect(alpha_df$alpha, "a")
#> [1]  TRUE FALSE    NA

alpha_array <- Array$create(alpha_df$alpha)

# arrow functions return null for null (NA)
call_function("match_substring_regex", alpha_array, options = list(pattern = "a"))
#> Array
#> <bool>
#> [
#>   true,
#>   false,
#>   null
#> ]
call_function("match_substring", alpha_array, options = list(pattern = "a"))
#> Array
#> <bool>
#> [
#>   true,
#>   false,
#>   null
#> ]
{code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)