[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-21190:
--------------------------------
    Summary: SPIP: Vectorized UDFs in Python  (was: SPIP: Vectorized UDFs for Python)

> SPIP: Vectorized UDFs in Python
> -------------------------------
>
>                 Key: SPARK-21190
>                 URL: https://issues.apache.org/jira/browse/SPARK-21190
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>              Labels: SPIP
>
> *Background and Motivation*
>
> Python is one of the most popular programming languages among Spark users.
> Spark currently exposes a row-at-a-time interface for defining and executing
> user-defined functions (UDFs). This introduces high overhead in serialization
> and deserialization, and also makes it difficult to leverage Python libraries
> that are written in native code. This proposal advocates introducing new APIs
> to support vectorized UDFs in Python, in which a block of data is transferred
> over to Python in some column format for execution.
>
>
> *Target Personas*
> Data scientists, data engineers, library developers.
>
> *Goals*
> ... todo ...
>
> *Non-Goals*
> - Define block oriented UDFs in other languages (that are not Python).
> - Define aggregate UDFs
>
>
> *Proposed API Changes*
>
> ... todo ...
>
>
>
> *Optional Design Sketch*
> The implementation should be pretty straightforward and is not a huge concern
> at this point. I’m more concerned about getting proper feedback for API
> design.
>
>
> *Optional Rejected Designs*
> See above.
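
For readers who want a concrete picture of the overhead described under *Background and Motivation*, here is a minimal sketch in plain Python. It is not the proposed API (that section is still "todo") and it does not use Spark at all. All names (plus_one_row, plus_one_block, run_row_at_a_time, run_block, block_size) are hypothetical, and pickle only stands in for whatever serialization sits between the JVM and the Python worker. The sketch counts one serialize/deserialize round trip per row in the first case and one per block of rows in the second.

```python
import pickle


def plus_one_row(x):
    """Row-at-a-time UDF: receives one value, returns one value."""
    return x + 1


def plus_one_block(column):
    """Block-oriented UDF: receives a whole column (a list here) at once."""
    return [x + 1 for x in column]


def run_row_at_a_time(values, udf):
    """Simulate one serialize/deserialize round trip per row."""
    round_trips = 0
    out = []
    for v in values:
        arg = pickle.loads(pickle.dumps(v))          # ship one value over
        res = pickle.loads(pickle.dumps(udf(arg)))   # ship one result back
        round_trips += 1
        out.append(res)
    return out, round_trips


def run_block(values, udf, block_size):
    """Simulate one serialize/deserialize round trip per block of rows."""
    round_trips = 0
    out = []
    for start in range(0, len(values), block_size):
        # Ship a slice of the column over, then ship the result slice back.
        block = pickle.loads(pickle.dumps(values[start:start + block_size]))
        out.extend(pickle.loads(pickle.dumps(udf(block))))
        round_trips += 1
    return out, round_trips


values = list(range(10))
row_result, row_trips = run_row_at_a_time(values, plus_one_row)
block_result, block_trips = run_block(values, plus_one_block, block_size=4)

print(row_result == block_result)  # same answer either way
print(row_trips, block_trips)      # round trips needed by each approach
```

Both paths compute the same column; the difference is how many times data crosses the serialization boundary. In a real vectorized UDF the block would presumably arrive in a column format that native-code libraries can consume directly, rather than as a Python list, but that detail is left open by the proposal.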