jorisvandenbossche commented on code in PR #13687:
URL: https://github.com/apache/arrow/pull/13687#discussion_r965650420


##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,174 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, define a function name, function docs, input types and
+an output type, and pass them along with the implementation to
+:func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow as pa
+   import pyarrow.compute as pc
+
+   function_name = "affine"
+   function_docs = {
+       "summary": "Calculate y = mx + c",
+       "description":
+           "Compute the affine function y = mx + c.\n"
+           "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+       "m": pa.float64(),
+       "x": pa.float64(),
+       "c": pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       # Allocate intermediate and final results from the UDF's memory pool.
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine,
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above), which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using :func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept both scalar and array inputs, in any combination of the
+two. The UDF author is responsible for making sure the implementation handles
+every such combination correctly. It is also often useful to build a UDF on
+top of an existing data processing library, such as NumPy.
+
+Let's consider a scenario where we have a function
+which computes ``y`` from scalar or array inputs
+``m``, ``x`` and ``c`` using NumPy arithmetic operations.
+
+.. code-block:: python
+
+   >>> import numpy as np
+   >>> import pyarrow as pa
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_numpy"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Numpy",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def to_numpy(val):
+   ...     if isinstance(val, pa.Scalar):
+   ...         return val.as_py()
+   ...     else:
+   ...         return np.array(val)
+   ... 
+   >>> def affine_with_numpy(ctx, m, x, c):
+   ...     m = to_numpy(m)
+   ...     x = to_numpy(x)
+   ...     c = to_numpy(c)
+   ...     return pa.array(m * x + c)
+   ... 
+   >>> pc.register_scalar_function(affine_with_numpy,
+   ...                             function_name,
+   ...                             function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.array([10.2, 20.2]), pa.scalar(20.2)])
+   <pyarrow.lib.DoubleArray object at 0x10e38eb20>
+   [
+      123.22,
+      224.21999999999997
+   ]
+
+Note that there is a helper function ``to_numpy`` to handle the conversion of scalar and array inputs
+to the UDf. Also, the final output is returned as a scalr or an array depending on the inputs.

Review Comment:
   ```suggestion
   to the UDF. Also, the final output is returned as a scalar or an array depending on the inputs.
   ```



##########
docs/source/python/api/compute.rst:
##########
@@ -555,3 +555,12 @@ Compute Options
    TrimOptions
    VarianceOptions
    WeekOptions
+
+Custom Functions

Review Comment:
   ```suggestion
   User-Defined Functions
   ```
   
   (to keep the titles consistent)


