[jira] [Updated] (SPARK-21190) SPIP: Vectorized UDFs in Python

Reynold Xin (JIRA) Fri, 23 Jun 2017 00:51:47 -0700

     [ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reynold Xin updated SPARK-21190:
--------------------------------
    Summary: SPIP: Vectorized UDFs in Python  (was: SPIP: Vectorized UDFs for 
Python)

> SPIP: Vectorized UDFs in Python
> -------------------------------
>
>                 Key: SPARK-21190
>                 URL: https://issues.apache.org/jira/browse/SPARK-21190
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>              Labels: SPIP
>
> *Background and Motivation*
>  
> Python is one of the most popular programming languages among Spark users. 
> Spark currently exposes a row-at-a-time interface for defining and executing 
> user-defined functions (UDFs). This introduces high serialization and 
> deserialization overhead, and also makes it difficult to leverage Python 
> libraries that are written in native code. This proposal advocates introducing 
> new APIs to support vectorized UDFs in Python, in which a block of data is 
> transferred to Python in a columnar format for execution.
>  
>  
> *Target Personas*
> Data scientists, data engineers, library developers.
>  
> *Goals*
> ... todo ...
>  
> *Non-Goals*
> - Define block-oriented UDFs in languages other than Python.
> - Define aggregate UDFs.
>  
>  
> *Proposed API Changes*
>  
> ... todo ...
>  
>  
>  
> *Optional Design Sketch*
> The implementation should be fairly straightforward and is not a major concern 
> at this point. I’m more concerned about getting proper feedback on the API 
> design.
>  
>  
> *Optional Rejected Designs*
> See above.
>  
>  
>  
>  
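
The row-at-a-time versus block-at-a-time distinction described under Background and Motivation can be sketched in plain Python. This is only an illustration under stated assumptions: the proposal's API section is still marked as todo, so every name below (plus_one_row, plus_one_block, run_row_at_a_time, run_vectorized, block_size) is hypothetical and none of it is a Spark API. Plain lists stand in for a columnar block, and the number of UDF invocations stands in for per-call serialization overhead.

```python
# Hypothetical illustration only: none of these names are Spark APIs.

def plus_one_row(value):
    """Row-at-a-time UDF: invoked once per input value."""
    return value + 1


def plus_one_block(column):
    """Block-oriented UDF: invoked once per block, returns a block of equal length."""
    return [value + 1 for value in column]


def run_row_at_a_time(udf, column):
    """Call the UDF once per row; return (results, number_of_udf_calls)."""
    results = [udf(value) for value in column]
    return results, len(column)


def run_vectorized(udf, column, block_size):
    """Call the UDF once per block of up to block_size rows; return (results, number_of_udf_calls)."""
    results = []
    calls = 0
    for start in range(0, len(column), block_size):
        # Hand a whole slice of the column to the UDF in one call.
        results.extend(udf(column[start:start + block_size]))
        calls += 1
    return results, calls


data = list(range(10))
row_results, row_calls = run_row_at_a_time(plus_one_row, data)
block_results, block_calls = run_vectorized(plus_one_block, data, block_size=4)
print(row_calls, block_calls)
```

Both paths produce the same values, but the block-oriented path crosses the caller/UDF boundary once per block rather than once per row. In a real setting the block would more likely be passed to a library written in native code, which is the use case the proposal says is hard to serve with a row-at-a-time interface.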



