jorisvandenbossche commented on a change in pull request #11679:
URL: https://github.com/apache/arrow/pull/11679#discussion_r749383061



##########
File path: docs/source/python/integration/python_r.rst
##########
@@ -0,0 +1,237 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Integrating PyArrow with R
+==========================
+
+Arrow supports exchanging data within the same process through the
+:ref:`c-data-interface`.
+
+This can be used to exchange data between Python and R functions and
+methods so that the two languages can interact without paying the cost
+of marshaling and unmarshaling data.
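This is a minimal sketch of why no marshaling is needed. It mirrors the ``ArrowArray`` struct from the Arrow C Data Interface specification using Python's standard ``ctypes`` module, which is an assumption made so the sketch runs without pyarrow or R. The ``producer`` and ``consumer`` names are hypothetical. Only an integer address crosses the boundary, and the consumer reads the same memory the producer filled in.

```python
import ctypes


# Declare the type first so the struct can refer to itself in _fields_.
class ArrowArray(ctypes.Structure):
    pass


# Field order transcribed from the ArrowArray declaration in the
# Arrow C Data Interface specification.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical producer: fills in a struct. Buffers and the release
# callback are left NULL for brevity; a real producer must set them.
producer = ArrowArray(length=3, null_count=0, offset=0, n_buffers=2)
address = ctypes.addressof(producer)

# Hypothetical consumer: receives only the address (round-tripped
# through a string here) and views the same memory without copying it.
consumer = ArrowArray.from_address(int(str(address)))

consumer.null_count = 1     # write through the consumer's view...
print(producer.null_count)  # ...and the producer sees it: 1
```

In the real exchange, pyarrow and the R ``arrow`` package play the producer and consumer roles on actual ``struct ArrowArray`` and ``struct ArrowSchema`` allocations.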
+
+.. note::
+
+    This article assumes that you have a ``Python`` environment
+    with ``pyarrow`` correctly installed and an ``R`` environment with
+    the ``arrow`` package correctly installed.
+
+Invoking R functions from Python
+--------------------------------
+
+Suppose we have a simple R function that receives an Arrow Array and
+adds ``3`` to all its elements:
+
+.. code-block:: R
+
+    library(arrow)
+
+    addthree <- function(arr) {
+        return(arr + 3)
+    }
+
+We could save this function in an ``addthree.R`` file so that we can
+make it available for reuse.
+
+Once ``addthree.R`` is created, we can invoke any of its functions
+from Python using the
+`rpy2 <https://rpy2.github.io/doc/latest/html/index.html>`_ library, which
+embeds an R runtime within the Python interpreter.
+
+``rpy2`` can be installed using ``pip``, like most Python libraries:
+
+.. code-block:: bash
+
+    $ pip install rpy2
+
+The most basic thing we can do with our ``addthree`` function is to
+invoke it from Python with a number and see how it returns the result.
+
+To do so, we can create an ``addthree.py`` file which uses ``rpy2`` to
+import the ``addthree`` function from the ``addthree.R`` file and invoke it:
+
+.. code-block:: python
+
+    import rpy2.robjects as robjects
+
+    # Load the addthree.R file
+    r_source = robjects.r["source"]
+    r_source("addthree.R")
+
+    # Get a reference to the addthree function
+    addthree = robjects.r["addthree"]
+
+    # Invoke the function
+    r = addthree(3)
+
+    # Access the returned value
+    value = r[0]
+    print(value)
+
+Running the ``addthree.py`` file will show how our Python code is able
+to access the ``R`` function and print the expected result:
+
+.. code-block:: bash
+
+    $ python addthree.py
+    6.0
+
+If, instead of passing around basic data types, we want to pass around
+Arrow Arrays, we can do so by exporting the array from Python through the
+C Data interface and importing it back in R.
+
+To import the Arrow Array from the C Data interface, we have to wrap our
+``addthree`` function in another function that does the extra work
+needed to import an Arrow Array in R from the C Data interface.
+
+That work is done by the ``addthree_cdata`` function, which invokes the
+``addthree`` function once the Array is imported.
+
+Our ``addthree.R`` will thus contain both the ``addthree_cdata`` and the
+``addthree`` functions:
+
+.. code-block:: R
+
+    library(arrow)
+
+    addthree_cdata <- function(array_ptr_s, schema_ptr_s) {
+        array_ptr <- as.numeric(array_ptr_s)
+        schema_ptr <- as.numeric(schema_ptr_s)
+
+        a <- Array$import_from_c(array_ptr, schema_ptr)
+
+        return(addthree(a))
+    }
+
+    addthree <- function(arr) {
+        return(arr + 3)
+    }
+
+We can now pass the array and its schema from Python to R through the
+``array_ptr_s`` and ``schema_ptr_s`` arguments, so that R can rebuild
+an ``Array`` from them and then invoke ``addthree`` with the array.
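``addthree_cdata`` receives each pointer as a string and converts it with ``as.numeric``, which yields an R double. A double represents every integer exactly only up to 2**53, so the round trip is lossless as long as the address stays below that bound. That holds for typical user-space addresses on common 64-bit platforms. The sketch below simulates the conversion in Python. It assumes Python's ``float`` behaves like R's IEEE 754 doubles. ``roundtrip_through_double`` and the sample address are hypothetical.

```python
def roundtrip_through_double(ptr: int) -> int:
    """Simulate str(ptr) in Python -> as.numeric() in R -> integer address."""
    as_string = str(ptr)          # what the Python side passes to R
    as_double = float(as_string)  # like as.numeric(): an IEEE 754 double
    return int(as_double)


# A hypothetical user-space address that fits in 48 bits.
typical_ptr = 0x7F3A_12C4_5670
print(roundtrip_through_double(typical_ptr) == typical_ptr)  # True

# Above 2**53 a double can no longer hold every integer exactly.
large = 2**53 + 1
print(roundtrip_through_double(large) == large)  # False
```

This is why passing the addresses as strings, rather than as R integers (which are only 32 bits wide), is sufficient in practice.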
+
+Invoking ``addthree_cdata`` from Python involves building the Array we
+want to pass to ``R``, exporting it to the C Data interface and then
+passing the exported references to the ``R`` function.
+
+Our ``addthree.py`` will thus become:
+
+.. code-block:: python
+
+    # Get a reference to the addthree_cdata R function
+    import rpy2.robjects as robjects
+    r_source = robjects.r["source"]
+    r_source("addthree.R")
+    addthree = robjects.r["addthree_cdata"]
+
+    # Create the pyarrow array we want to pass to R
+    import pyarrow
+    array = pyarrow.array((1, 2, 3))
+
+    # Import the pyarrow module that provides access to the C Data interface
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Allocate the structures into which we will export the Array data
+    # and the Array schema. They are released when we exit the with block.
+    with arrow_c.new("struct ArrowArray*") as c_array, \
+         arrow_c.new("struct ArrowSchema*") as c_schema:
+        # Get the references to the C Data structures.
+        c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+        c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+
+        # Export the Array and its schema to the C Data structures.
+        array._export_to_c(c_array_ptr)
+        array.type._export_to_c(c_schema_ptr)
+
+        # Invoke the R addthree_cdata function, passing the references
+        # to the array and schema C Data structures.
+        # The references are passed as strings because R has no
+        # native 64-bit integer type; R converts each string back
+        # into a number (a double) before importing the array.
+        r_result_array = addthree(str(c_array_ptr), str(c_schema_ptr))

Review comment:
       ```suggestion
           r_result_array = addthree_cdata(str(c_array_ptr), str(c_schema_ptr))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
