Interesting, thanks for the heads up.
On 7/6/15, 7:19 PM, Davies Liu dav...@databricks.com wrote:
Currently, Python UDFs run in a Python instance and are MUCH slower than
Scala ones (from 10 to 100x). There is a JIRA to improve the
performance: https://issues.apache.org/jira/browse/SPARK-8632. After
Hi there,
I’m trying to get a feel for how User Defined Functions from SparkSQL (as
written in Python and registered using the udf function from
pyspark.sql.functions) are run behind the scenes. From trying to grok the source, it
seems that the native Python function is serialized for distribution
Currently, Python UDFs run in a Python instance and are MUCH slower than
Scala ones (from 10 to 100x). There is a JIRA to improve the
performance: https://issues.apache.org/jira/browse/SPARK-8632. After
that, they will still be much slower than Scala ones (because Python
is slower and the overhead for