[ 
https://issues.apache.org/jira/browse/FLINK-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-16114.
---------------------------
    Resolution: Resolved

> Support Scalar Vectorized Python UDF in PyFlink
> -----------------------------------------------
>
>                 Key: FLINK-16114
>                 URL: https://issues.apache.org/jira/browse/FLINK-16114
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Scalar Python UDF has already been supported in Flink 1.10 
> ([FLIP-58|https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table])
>  and it operates one row at a time. It works in the way that the Java 
> operator serializes one input row to bytes and sends them to the Python 
> worker; the Python worker deserializes the input row and evaluates the Python 
> UDF with it; the result row is serialized and sent back to the Java operator.
> It suffers from the following problems:
>  # High serialization/deserialization overhead
>  # It’s difficult to leverage the popular Python libraries used by data 
> scientists, such as Pandas, Numpy, etc which provide high performance data 
> structure and functions.
> We want to introduce vectorized Python UDF to address this problem. For 
> vectorized Python UDF, a batch of rows are transferred between JVM and Python 
> VM in columnar format. The batch of rows will be converted to a collection of 
> Pandas.Series and given to the vectorized Python UDF which could then 
> leverage the popular Python libraries such as Pandas, Numpy, etc for the 
> Python UDF implementation.
> More details could be found in 
> [FLIP-97.|https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to