Thanks Georg. But I'm not sure how mapPartitions is relevant here. Can you elaborate?
On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
> What about using mapPartitions instead?
>
> RD <rdsr...@gmail.com> schrieb am Do. 15. Juni 2017 um 06:52:
>
>> Hi Spark folks,
>>
>> Is there any plan to support the richer UDF API that Hive supports
>> for Spark UDFs? Hive supports the GenericUDF API which has, among other
>> methods, initialize() and configure() (called once on the cluster),
>> which a lot of our users use. We now have a lot of Hive UDFs that make
>> use of these methods. We plan to move our UDFs to Spark UDFs but are
>> limited by the lack of similar lifecycle methods.
>> Are there plans to address this? Or do people usually adopt some sort
>> of workaround?
>>
>> If we use the Hive UDFs directly in Spark, we pay a performance
>> penalty. I believe Spark converts InternalRow to Row and back to
>> InternalRow for native Spark UDFs, and for Hive UDFs it converts
>> InternalRow to a Hive Object and back to InternalRow, but somehow the
>> conversion for native UDFs is more performant.
>>
>> -Best,
>> R.
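For context, the mapPartitions suggestion is presumably aimed at the lifecycle question: expensive per-UDF setup (the kind of work a Hive GenericUDF would do in initialize()/configure()) can instead be done once per partition inside mapPartitions, then reused for every row in that partition. A minimal sketch of that pattern follows; `ExpensiveResource` and `score` are hypothetical placeholders for whatever the real UDF would set up and compute.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical heavyweight object standing in for whatever a Hive
// GenericUDF would construct in initialize()/configure() -- a loaded
// model, a parsed config, a dictionary, etc.
class ExpensiveResource extends Serializable {
  def score(s: String): Int = s.length // placeholder logic
}

object MapPartitionsUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mapPartitions-as-udf-lifecycle")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq("a", "bb", "ccc").toDS()

    val scored = ds.mapPartitions { rows =>
      // Constructed once per partition (the analogue of initialize()),
      // not once per row as a naive UDF closure might do.
      val resource = new ExpensiveResource()
      rows.map(r => (r, resource.score(r)))
    }

    scored.show()
    spark.stop()
  }
}
```

The trade-off is that mapPartitions operates on the typed Dataset API rather than as a registered SQL expression, so it sidesteps rather than replaces proper UDF lifecycle hooks, and it cannot be invoked from SQL the way a registered UDF can.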