Hi,

Maybe I am jumping to conclusions and making stupid guesses, but have you tried Koalas, now that it is natively integrated with PySpark?
Regards,
Gourav

On Thu, 25 Aug 2022, 11:07 Subash Prabanantham, <subashpraba...@gmail.com> wrote:

> Hi All,
>
> I was wondering if we have any best practices on using pandas UDFs?
> Profiling a UDF is not an easy task, and our case requires some drilling
> down into the logic of the function.
>
> Our use case:
> We are using func(DataFrame) => DataFrame as the interface to the pandas
> UDF. When running only the function locally, it runs fast, but when it is
> executed in the Spark environment, the processing time is longer than
> expected. We have one column whose values are large (BinaryType -> 600 KB);
> could this make the Arrow computation slower?
>
> Is there any profiling approach, or a good way to debug the cost incurred
> by the pandas UDF?
>
> Thanks,
> Subash
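One way to start drilling down, before touching Spark at all, is to profile the UDF body locally on a representative batch with the standard-library cProfile. If the function itself is fast in the profile, the extra time in the Spark environment is more likely Arrow serialization / shuffle overhead (the 600 KB BinaryType column is a plausible suspect) than the logic. A minimal sketch, where `func` is a hypothetical stand-in for your own func(DataFrame) => DataFrame:

```python
import cProfile
import io
import pstats

import pandas as pd

# Hypothetical stand-in for the real pandas UDF body (func: DataFrame -> DataFrame).
def func(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(doubled=pdf["value"] * 2)

# A representative batch, sized roughly like one Arrow batch Spark would hand the UDF.
batch = pd.DataFrame({"value": range(100_000)})

# Profile only the function, outside Spark, to separate the cost of the
# logic from Arrow serialization and executor overhead.
profiler = cProfile.Profile()
profiler.enable()
result = func(batch)
profiler.disable()

# Print the ten most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

If the local profile is clean, the next knob worth checking on the Spark side is the Arrow batch size (`spark.sql.execution.arrow.maxRecordsPerBatch`); with ~600 KB per row, the default of 10,000 records per batch means very large batches being serialized across the JVM/Python boundary.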