Interesting, thanks for the heads up.

On 7/6/15, 7:19 PM, "Davies Liu" <dav...@databricks.com> wrote:

>Currently, Python UDFs run in separate Python instances and are MUCH
>slower than Scala ones (from 10x to 100x). There is a JIRA to improve the
>performance: https://issues.apache.org/jira/browse/SPARK-8632. Even after
>that, they will still be much slower than Scala ones (because Python
>is slower and because of the overhead of calling into Python).
>
>On Mon, Jul 6, 2015 at 12:55 PM, Eskilson,Aleksander
><alek.eskil...@cerner.com> wrote:
>> Hi there,
>>
>> I’m trying to get a feel for how User Defined Functions from SparkSQL (as
>> written in Python and registered using the udf function from
>> pyspark.sql.functions) are run behind the scenes. Trying to grok the source,
>> it seems that the native Python function is serialized for distribution to
>> the clusters. In practice, it seems to be able to check for other variables
>> and functions defined elsewhere in the namespace and include those in the
>> function’s serialization.
>>
>> Following all this though, when actually run, are Python interpreter
>> instances on each node brought up to actually run the function against the
>> RDDs, or can the serialized function somehow be run on just the JVM? If
>> bringing up Python instances is the execution model, what is the overhead of
>> PySpark UDFs compared to those registered in Scala?
>>
>> Thanks,
>> Alek
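To make the "overhead for calling Python" point in the reply above a bit more concrete, here is a minimal sketch in plain Python. It is a toy model only, not Spark's actual code: it assumes that data crossing between the JVM and a Python worker is pickled in batches, and it imitates that hop inside a single process. The names add_one, run_in_process, and run_via_round_trip are hypothetical and exist only for this illustration.

```python
import pickle


def add_one(x):
    # Stand-in for a user-defined Python function (hypothetical example).
    return x + 1


def run_in_process(func, values):
    # Baseline: call the function directly, with no serialization at all.
    return [func(v) for v in values]


def run_via_round_trip(func, values, batch_size=100):
    # Toy model of a JVM <-> Python worker hop (an assumption, not Spark code):
    # each batch is pickled, "sent", unpickled, processed, and the results
    # are pickled and unpickled again on the way back.
    results = []
    for start in range(0, len(values), batch_size):
        payload = pickle.dumps(values[start:start + batch_size])  # sender side
        batch = pickle.loads(payload)                              # worker side
        reply = pickle.dumps([func(v) for v in batch])             # worker side
        results.extend(pickle.loads(reply))                        # receiver side
    return results


values = list(range(1000))
direct = run_in_process(add_one, values)
round_trip = run_via_round_trip(add_one, values)
```

Both paths produce the same results; the second simply does extra serialization work per batch on top of the function calls themselves. Wrapping each call with timeit would show the relative cost on a given machine, though a real cluster adds process startup and socket transfer that this sketch does not model.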


