Hi,

Maybe I am jumping to conclusions and making stupid guesses, but have you tried Koalas, now that it is natively integrated with PySpark?
Regards,
Gourav

On Thu, 25 Aug 2022, 11:07 Subash Prabanantham, <subashpraba...@gmail.com> wrote:

> Hi All,
>
> I was wondering if we have any best practices on using pandas UDFs?
> Profiling a UDF is not an easy task, and our case requires some drilling
> down into the logic of the function.
>
> Our use case:
> We are using func(DataFrame) => DataFrame as the interface to the pandas
> UDF. When running only the function locally, it runs fast, but when it is
> executed in the Spark environment, the processing time is longer than
> expected. We have one column whose values are large (BinaryType -> 600 KB);
> could this make the Arrow computation slower?
>
> Is there any profiling approach, or a good way to debug the cost incurred
> by the pandas UDF?
>
> Thanks,
> Subash
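One way to start drilling down, before touching Spark at all, is to profile the UDF body locally on a representative batch with the standard-library cProfile. If the function itself is fast in the profile, the extra time in the Spark environment is more likely Arrow serialization / shuffle overhead (the 600 KB BinaryType column is a plausible suspect) than the logic. A minimal sketch, where `func` is a hypothetical stand-in for your own func(DataFrame) => DataFrame:

```python
import cProfile
import io
import pstats

import pandas as pd

# Hypothetical stand-in for the real pandas UDF body (func: DataFrame -> DataFrame).
def func(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(doubled=pdf["value"] * 2)

# A representative batch, sized roughly like one Arrow batch Spark would hand the UDF.
batch = pd.DataFrame({"value": range(100_000)})

# Profile only the function, outside Spark, to separate the cost of the
# logic from Arrow serialization and executor overhead.
profiler = cProfile.Profile()
profiler.enable()
result = func(batch)
profiler.disable()

# Print the ten most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

If the local profile is clean, the next knob worth checking on the Spark side is the Arrow batch size (`spark.sql.execution.arrow.maxRecordsPerBatch`); with ~600 KB per row, the default of 10,000 records per batch means very large batches being serialized across the JVM/Python boundary.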