I assume you want to have this lifecycle in order to create big/heavy/complex objects only once (per partition); mapPartitions should fit this use case pretty well.
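To make that concrete, here is a minimal sketch (not code from this thread; HeavyModel and the toy dataset are made-up placeholders) of building the expensive object once per partition and reusing it for every row:

import org.apache.spark.sql.SparkSession

// Stand-in for an expensive-to-construct resource (model, dictionary, service client, ...).
class HeavyModel extends Serializable {
  def score(s: String): Int = s.length
}

object MapPartitionsInit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("map-partitions-init").getOrCreate()
    import spark.implicits._

    val ds = Seq("a", "bb", "ccc").toDS()

    // Setup runs once per partition (roughly what initialize()/configure()
    // give you in a Hive GenericUDF), then the same instance is reused for every row.
    val scored = ds.mapPartitions { rows =>
      val model = new HeavyModel()          // built once per partition
      rows.map(r => (r, model.score(r)))    // reused for each row
    }

    scored.show()
    spark.stop()
  }
}

A plain Spark UDF only gives you a per-row function with no setup hook, which is why the setup has to move into the mapPartitions closure.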
RD <rdsr...@gmail.com> wrote on Fri, 16 Jun 2017 at 17:37:

> Thanks Georg. But I'm not sure how mapPartitions is relevant here. Can
> you elaborate?
>
> On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler <georg.kf.hei...@gmail.com>
> wrote:
>
>> What about using mapPartitions instead?
>>
>> RD <rdsr...@gmail.com> wrote on Thu, 15 Jun 2017 at 06:52:
>>
>>> Hi Spark folks,
>>>
>>> Is there any plan to support the richer UDF API that Hive supports
>>> for Spark UDFs? Hive supports the GenericUDF API which has, among other
>>> things, methods like initialize() and configure() (called once on the
>>> cluster), which a lot of our users rely on. We now have a lot of UDFs in
>>> Hive that make use of these methods. We plan to move our UDFs to Spark
>>> UDFs but are limited by not having similar lifecycle methods.
>>> Are there plans to address this? Or do people usually adopt some
>>> sort of workaround?
>>>
>>> If we use the Hive UDFs directly in Spark we pay a performance
>>> penalty. I think Spark does a conversion from InternalRow to Row and back
>>> to InternalRow for native Spark UDFs, and for Hive it does InternalRow to
>>> Hive Object and back to InternalRow, but somehow the conversion for native
>>> UDFs is more performant.
>>>
>>> -Best,
>>> R.
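For reference, this is roughly the Hive GenericUDF lifecycle the quoted question refers to, as a simplified, hypothetical sketch (DictionaryLookupUDF and its dictionary are made up, and argument handling via ObjectInspectors is trimmed down):

import org.apache.hadoop.hive.ql.exec.MapredContext
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class DictionaryLookupUDF extends GenericUDF {
  @transient private var dict: Map[String, String] = _

  // Called once, before any rows are evaluated: receives the argument
  // ObjectInspectors and returns the ObjectInspector of the result.
  // Heavy setup (loading a dictionary, opening clients, ...) can live here.
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    dict = Map("a" -> "alpha")   // placeholder for an expensive load
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  // Optional per-task hook with access to the MapReduce/Tez context.
  override def configure(context: MapredContext): Unit = {
    // e.g. read settings from context.getJobConf
  }

  // Called per row; argument extraction is simplified for brevity.
  override def evaluate(arguments: Array[DeferredObject]): AnyRef =
    dict.getOrElse(String.valueOf(arguments(0).get()), null)

  override def getDisplayString(children: Array[String]): String =
    s"dictionary_lookup(${children.mkString(", ")})"
}

initialize() and configure() run once per task rather than once per row, which is the once-per-partition setup that mapPartitions emulates on the Spark side.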