[ https://issues.apache.org/jira/browse/ARROW-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490429#comment-17490429 ]

Weston Pace commented on ARROW-15635:
-------------------------------------

Thanks, this is helpful. This might be interesting to
[~jorisvandenbossche] and [~amol-] as well.

> [C++][Python] UDF Integration 
> ------------------------------
>
>                 Key: ARROW-15635
>                 URL: https://issues.apache.org/jira/browse/ARROW-15635
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++, Python
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> The objective is to list the tasks required to provide UDF support for the
> Apache Arrow streaming execution engine. In the first iteration, we will
> focus on supporting Python-based UDFs, i.e., user-defined functions written
> in Python.
> The UDF integration will be carried out as a series of sub-tasks covering
> development and PoCs. Note that this is the first iteration of UDF
> integration and has a limited scope. This ticket will cover the following
> topics:
>  # POC for UDF integration: The objective is to evaluate the existing
> components in the codebase and identify the modifications and new building
> blocks required to integrate UDFs.
>  # The language will be limited to C++/Python. Users can register a Python
> function as a UDF and either use it with an `apply` method on Arrow Tables
> or call it through the arrow::compute API. Note that the C++ API already
> provides a way to register custom functions via the function registry API;
> at the moment, this is not exposed to Python.
>  # Planned features for this ticket are:
>  ## Scalar UDFs: UDFs executed per value (per row)
>  ## Vector UDFs: UDFs executed per batch (a full array or partial array)
>  ## Aggregate UDFs: UDFs associated with an aggregation operation
>  # Integration limitations
>  ## Custom data types that are not supported by NumPy or pandas are not
> supported.
>  ## Complex processing with parallelism within UDFs is not supported.
>  ## Parallel UDFs are not supported in the initial version. However, we are
> documenting what is required, along with a rough sketch for the next phase.
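To make the three planned UDF kinds (scalar, vector, aggregate) and the "register by name, then apply" workflow concrete, here is a minimal, hypothetical sketch in plain Python. `ToyFunctionRegistry`, `apply`, and the registered names (`double`, `minus_batch_min`, `total`) are invented for illustration; they are not the Arrow C++ function registry or any pyarrow API, and plain Python lists stand in for Arrow arrays split into batches.

```python
from itertools import chain
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy model of the three planned UDF kinds. This is NOT the
# Arrow API: plain Python lists stand in for Arrow arrays, and a column is
# a sequence of batches (each batch is a list of values).
Batch = List[float]


class ToyFunctionRegistry:
    """Looks up user-defined functions by name, similar in spirit to a
    compute function registry (names and behavior here are illustrative)."""

    KINDS = ("scalar", "vector", "aggregate")

    def __init__(self) -> None:
        self._functions: Dict[str, Tuple[str, Callable]] = {}

    def register(self, name: str, kind: str, func: Callable) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown UDF kind: {kind!r}")
        if name in self._functions:
            raise KeyError(f"function {name!r} is already registered")
        self._functions[name] = (kind, func)

    def call(self, name: str, batches: Sequence[Batch]):
        kind, func = self._functions[name]
        if kind == "scalar":
            # Scalar UDF: invoked once per value (per row); output keeps the batch layout.
            return [[func(value) for value in batch] for batch in batches]
        if kind == "vector":
            # Vector UDF: invoked once per batch and sees the whole batch at once.
            return [list(func(batch)) for batch in batches]
        # Aggregate UDF: invoked once on all values of the column, returns one value.
        return func(list(chain.from_iterable(batches)))


def apply(table: Dict[str, Sequence[Batch]], column: str, name: str,
          registry: ToyFunctionRegistry):
    """Hypothetical table-level `apply`: runs a registered UDF on one column."""
    return registry.call(name, table[column])


registry = ToyFunctionRegistry()
registry.register("double", "scalar", lambda v: 2 * v)
registry.register("minus_batch_min", "vector", lambda b: [v - min(b) for v in b])
registry.register("total", "aggregate", sum)

table = {"x": [[1, 2], [3, 4, 5]]}
print(apply(table, "x", "double", registry))           # [[2, 4], [6, 8, 10]]
print(apply(table, "x", "minus_batch_min", registry))  # [[0, 1], [0, 1, 2]]
print(apply(table, "x", "total", registry))            # 15
```

Note that `minus_batch_min` gives different results if the same values are split into different batches, while `double` does not; in this sketch, that is what separates per-batch (vector) UDFs from per-value (scalar) UDFs.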



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
