Hello All,

I need help deciding which is better: pandas UDFs or built-in functions. I have to perform a transformation, and when I compared the two approaches on a few thousand records, pandas_udf in fact performed better. Given the complexity of the transformation, I also found that pandas_udf makes the code more readable. I also found a lot of comparisons between normal UDFs and pandas_udfs.
What I would like to know is whether pandas_udfs behave like PySpark's built-in functions. How do pandas_udfs work internally, and will they be equally performant on bigger data sets? I did go through a few documents but wasn't able to get a clear idea. I am mainly looking at this from the performance perspective.

Thanks in advance.

Regards,
Neha R. Garde