thisisnic commented on a change in pull request #11915:
URL: https://github.com/apache/arrow/pull/11915#discussion_r773224132



##########
File path: r/vignettes/developers/bindings.Rmd
##########
@@ -0,0 +1,225 @@
+# Writing Bindings
+
+```{r, include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+library(dplyr, warn.conflicts = FALSE)
+```
+
+When writing bindings between C++ compute functions and R functions, the aim is
+to expose the C++ functionality via the same interface as existing R functions.
+The syntax and functionality should match that of the existing R functions
+(though there are some exceptions) so that users are able to use existing
+tidyverse or base R syntax, whilst taking advantage of the speed and
+functionality of the underlying arrow package.
+
+One of the main ways in which users interact with arrow is via
+[dplyr](https://dplyr.tidyverse.org/) syntax called on Arrow objects.  For
+example, when a user calls `dplyr::mutate()` on an Arrow Tabular,
+Dataset, or arrow data query object, the Arrow implementation of `mutate()` is
+used and, under the hood, translates the dplyr code into Arrow C++ code.
+
+When using `dplyr::mutate()` or `dplyr::filter()`, you may want to use functions
+from other packages.  The example below uses `stringr::str_detect()`.
+
+```{r}
+library(dplyr)
+library(stringr)
+starwars %>%
+  filter(str_detect(name, "Darth"))
+```
+This functionality has also been implemented in Arrow, e.g.:
+
+```{r}
+library(arrow)
+arrow_table(starwars) %>%
+  filter(str_detect(name, "Darth")) %>%
+  collect()
+```
+
+This is possible because a **binding** has been created between the call to the
+stringr function `str_detect()` and the Arrow C++ code, here as a direct mapping
+to `match_substring_regex`.  You can see this for yourself by inspecting the
+arrow data query object without retrieving the results via `collect()`.
+
+
+```{r}
+arrow_table(starwars) %>%
+  filter(str_detect(name, "Darth"))
+```
+
+In the following sections, we'll walk through how to create a binding between
+an R function and an Arrow C++ function.
+
+# Walkthrough
+
+Imagine you are writing the bindings for the C++ function
+[`starts_with()`](https://arrow.apache.org/docs/cpp/compute.html#containment-tests)
+and want to bind it to the (base) R function `startsWith()`.
+
+First, take a look at the docs for both of those functions.
+
+## Examining the R function
+
+Here are the docs for R's `startsWith()` (also available at https://stat.ethz.ch/R-manual/R-devel/library/base/html/startsWith.html):
+
+```{r, echo=FALSE, out.width="50%"}
+knitr::include_graphics("./startswithdocs.png")
+```
+
+It takes two parameters: `x`, the input, and `prefix`, the characters to check
+whether `x` starts with.
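
As a quick illustration of those two parameters, here is a minimal sketch in base R. The character vector is made-up example data and is not taken from the vignette; the variable names are hypothetical too.

```r
# Hypothetical example data (not part of the vignette under review)
x <- c("Darth Vader", "Luke Skywalker", "Darth Maul")

# `x` is the input; `prefix` is the string each element is checked against
starts_with_darth <- startsWith(x, prefix = "Darth")
print(starts_with_darth)

# The comparison is case-sensitive, and startsWith() has no argument
# for changing that
lowercase_match <- startsWith(x, prefix = "darth")
print(lowercase_match)
```

The result is a logical vector with one element per element of `x`, which is the shape a binding to a boolean-returning compute function would need to reproduce.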
+
+## Examining the C++ function
+
+Now, go to
+[the compute function documentation](https://arrow.apache.org/docs/cpp/compute.html#containment-tests)
+and look for the Arrow C++ library's `starts_with()` function:
+
+```{r, echo=FALSE, out.width="100%"}
+knitr::include_graphics("./starts_with_docs.png")
+```
+
+The docs show that `starts_with()` is a unary function, which means that it
+takes a single data input. The data input must be a string-like class, and the
+returned value is boolean, both of which match up to R's `startsWith()`.
+
+There is an options class associated with `starts_with()`, called
+[`MatchSubstringOptions`](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4N5arrow7compute21MatchSubstringOptionsE),
+so let's take a look at that.
+
+```{r, echo=FALSE, out.width="100%"}
+knitr::include_graphics("./matchsubstringoptions.png")
+```
+
+Options classes allow the user to control the behaviour of the function.  In
+this case, there are two possible options which can be supplied, `pattern` and
+`ignore_case`, which are described in the docs shown above.
+
+## Comparing the R and C++ functions
+
+What conclusions can be drawn from what you've seen so far?
+
+Base R's `startsWith()` and Arrow's `starts_with()` operate on equivalent data
+types and return equivalent data types. As there are no options implemented in
+R that Arrow doesn't have, this should be fairly simple to map without a great
+deal of extra work.
+
+As `starts_with()` has an options class associated with it, we'll need to make 
+sure that it's linked up with this in the R code.
+
+In case you're wondering about the difference between arguments in R and options
+in Arrow: in R, arguments to functions can include the actual data to be
+analysed as well as options governing how the function works. In the C++
+compute functions, the arguments are the data to be analysed, and the options
+specify exactly how the function works.
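
To make the distinction concrete, here is a minimal base R sketch. It uses `grepl()` rather than `startsWith()`, because `grepl()` happens to take both a data argument and behaviour-controlling arguments; the example data and variable names are hypothetical.

```r
# Hypothetical example data (not part of the vignette under review)
x <- c("Darth Vader", "Luke Skywalker", "darth maul")

# In R, everything goes into one argument list: `x` is the data, while
# `pattern` and `ignore.case` only control how the matching is done.
# In Arrow, settings of that second kind would live in an options class
# such as MatchSubstringOptions (`pattern`, `ignore_case`).
case_sensitive <- grepl(pattern = "^Darth", x = x)
case_insensitive <- grepl(pattern = "^Darth", x = x, ignore.case = TRUE)

print(case_sensitive)
print(case_insensitive)
```

When writing a binding, the R-side arguments therefore have to be split: the data is passed on as the compute function's argument, and the remaining settings are collected into its options.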

Review comment:
       I'll leave it for now, though may come back to this when next updating



