Hello All,

I need help deciding which is better: pandas UDFs or built-in functions.
I have to perform a transformation, and I compared the two approaches on a
few thousand records;
pandas_udf in fact performed better.
Given the complexity of the transformation, I also found pandas_udf makes
it more readable.
I also found a lot of comparisons between normal UDFs and pandas_udfs.

What I would like to know is whether pandas_udfs behave like normal
PySpark built-in functions.
How do pandas_udfs work internally, and will they be equally performant on
larger datasets?
I did go through a few documents but wasn't able to get a clear idea.
I am mainly interested in the performance perspective.

Thanks in advance


Regards,
Neha R.Garde.
