paleolimbot commented on PR #13789:
URL: https://github.com/apache/arrow/pull/13789#issuecomment-1214992856
This is very cool! It's the most important type of user-defined function
because it's 100% translatable to Arrow kernels, so it runs in parallel. A
lot of applications will benefit from this!
Have you considered adding a registration step? If you do, you may be able
to simplify some of this. The dream, of course, is to not require
pre-registration at all, which will require an approach much like the one
you've sketched out here (i.e., preprocessing the expression).
<details>
``` r
library(dplyr, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()`
#> for more information.
register_user_binding <- function(name, f, env = rlang::caller_env()) {
  # copy the bindings environment because we don't want to set the parent
  # of the one-and-only official bindings environment
  bindings_env <- as.environment(as.list(arrow:::nse_funcs))
  parent.env(bindings_env) <- env
  environment(f) <- bindings_env

  # register for use in Arrow (non-agg)
  arrow:::register_binding(name, f, update_cache = TRUE)

  # in case this is a recursive function
  arrow:::register_binding(name, f, bindings_env)

  # so that the user can call this function, too (most Arrow bindings accept
  # regular input, too)
  invisible(f)
}

nchar2 <- register_user_binding("nchar2", function(x) {
  1 + nchar(x)
})

record_batch(my_string = "1234") %>%
  mutate(
    var1 = nchar(my_string),
    var2 = nchar2(my_string)
  ) %>%
  collect()
#> # A tibble: 1 × 3
#>   my_string  var1  var2
#>   <chr>     <int> <dbl>
#> 1 1234          4     5
```
<sup>Created on 2022-08-15 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.1)</sup>
</details>
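
For comparison, here is a rough sketch of what "preprocessing the expression" could look like without any registration step. It uses base R only and is not the approach taken in the PR or anything in the arrow package: `inline_user_calls()`, `known`, and `user_env` are hypothetical names made up for illustration. The idea is to walk the expression and replace each call to a user-defined function with that function's body, so that only functions with existing bindings remain. It ignores recursion, default arguments, `...`, empty arguments, and name clashes with local variables in the function body.

```r
# Hypothetical helper (not part of arrow): replace calls to functions defined
# directly in `env` with their bodies, substituting the supplied arguments.
# `known` is an assumed character vector of names that already have bindings
# and should be left alone.
inline_user_calls <- function(expr, known, env) {
  if (!is.call(expr)) {
    return(expr)
  }

  # process the function position and every argument first
  expr <- as.call(lapply(as.list(expr), inline_user_calls, known = known, env = env))

  head <- expr[[1]]
  if (is.symbol(head)) {
    name <- as.character(head)
    # inherits = FALSE: only treat functions defined in `env` itself as
    # user-defined (so base functions such as `+` are never inlined)
    fn <- get0(name, envir = env, mode = "function", inherits = FALSE)
    if (!(name %in% known) && is.function(fn) && !is.primitive(fn)) {
      # match the supplied arguments to the function's formals by name
      matched <- as.list(match.call(fn, expr))[-1]
      # substitute the arguments into the function body
      inlined <- do.call(substitute, list(body(fn), matched))
      # the body may itself call other user-defined functions
      return(inline_user_calls(inlined, known, env))
    }
  }

  expr
}

# Usage: nchar2() lives in a separate environment and is never registered
user_env <- new.env()
user_env$nchar2 <- function(x) 1 + nchar(x)

result <- inline_user_calls(
  quote(nchar2(my_string) * 2),
  known = c("nchar", "+", "*"),
  env = user_env
)
```

After this step `result` only refers to `nchar()`, `+`, and `*`, which is the property a translation layer would need. The registration approach above avoids all of the edge cases listed here, which is part of why it may be the simpler first step.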