[jira] [Updated] (ARROW-15635) [C++][Python] UDF Integration

Vibhatha Lakmal Abeykoon (Jira) Thu, 10 Feb 2022 06:09:16 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vibhatha Lakmal Abeykoon updated ARROW-15635:
---------------------------------------------
    Description: 
The objective is to list down a set of tasks required to provide UDF support 
for Apache Arrow streaming execution engine. In the first iteration we will be 
focusing on providing support for Python-based UDFs which can support Python 
functions. 

The UDF Integration is going to pan out with a series of sub-tasks associated 
with the development and PoCs. Note that this is going to be the first 
iteration of UDF integrations with a limited scope. This ticket will cover the 
following topics;
 # POC for UDF integration: The objective is to evaluate the existing 
components in the source and evaluate the required modifications and new 
building blocks required to integrate UDFs.
 # The language will be limited to C+{+}/{+}Python users can register Python 
function as a UDF and use it with an `apply` method on Arrow Tables or provide 
a computation API endpoint via arrow::compute API. Note that the C+ API already 
provides a way to register custom functions via the function registry API. At 
the moment this is not exposed to Python. 
 # Planned features for this ticket are;
 ## Scalar UDFs : UDFs executed per value (per row)
 ## Vector UDFs : UDFs executed per batch (a full array or partial array)
 ## Aggregate UDFs : UDFs associated with an aggregation operation
 # Integration limitations
 ## Doesn't support custom data types which doesn't support Numpy or Pandas
 ## Complex processing with parallelism within UDFs are not supported
 ## Parallel UDFs are not supported in the initial version of UDFs. Allthough 
we are documenting what is required and a rough sketch for the next phase. 

  was:
The objective is to list down a set of tasks required to provide UDF support 
for Apache Arrow streaming execution engine. In the first iteration we will be 
focusing on providing support for Python-based UDFs which can support Python 
functions. 

The UDF Integration is going to pan out with a series of sub-tasks associated 
with the development and PoCs. Note that this is going to be the first 
iteration of UDF integrations with a limited scope. This ticket will cover the 
following topics;
 # POC for UDF integration: The objective is to evaluate the existing 
components in the source and evaluate the required modifications and new 
building blocks required to integrate UDFs.
 # The language will be limited to C+{+}/{+}Python users can register Python 
function as a UDF and use it with an `apply` method on Arrow Tables or provide 
a computation API endpoint via arrow::compute API. Note that the C+ API already 
provides a way to register custom functions via the function registry API. At 
the moment this is not exposed to Python. 
 # Planned features for this ticket are;
 ## Scalar UDFs : UDFs executed per value (per row)
 ## Vector UDFs : UDFs executed per batch (a full array or partial array)
 ## Aggregate UDFs : UDFs associated with an aggregation operation
 # Integration limitations
 ## Doesn't support custom data types which doesn't support Numpy or Pandas
 ## Complex processing with parallelism within UDFs are not supported


> [C++][Python] UDF Integration 
> ------------------------------
>
>                 Key: ARROW-15635
>                 URL: https://issues.apache.org/jira/browse/ARROW-15635
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++, Python
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> The objective is to list down a set of tasks required to provide UDF support 
> for Apache Arrow streaming execution engine. In the first iteration we will 
> be focusing on providing support for Python-based UDFs which can support 
> Python functions. 
> The UDF Integration is going to pan out with a series of sub-tasks associated 
> with the development and PoCs. Note that this is going to be the first 
> iteration of UDF integrations with a limited scope. This ticket will cover 
> the following topics;
>  # POC for UDF integration: The objective is to evaluate the existing 
> components in the source and evaluate the required modifications and new 
> building blocks required to integrate UDFs.
>  # The language will be limited to C+{+}/{+}Python users can register Python 
> function as a UDF and use it with an `apply` method on Arrow Tables or 
> provide a computation API endpoint via arrow::compute API. Note that the C+ 
> API already provides a way to register custom functions via the function 
> registry API. At the moment this is not exposed to Python. 
>  # Planned features for this ticket are;
>  ## Scalar UDFs : UDFs executed per value (per row)
>  ## Vector UDFs : UDFs executed per batch (a full array or partial array)
>  ## Aggregate UDFs : UDFs associated with an aggregation operation
>  # Integration limitations
>  ## Doesn't support custom data types which doesn't support Numpy or Pandas
>  ## Complex processing with parallelism within UDFs are not supported
>  ## Parallel UDFs are not supported in the initial version of UDFs. Allthough 
> we are documenting what is required and a rough sketch for the next phase. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (ARROW-15635) [C++][Python] UDF Integration

Reply via email to