Thanks Georg. But I'm not sure how mapPartitions is relevant here.  Can you
elaborate?
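
[For readers following the thread: the mapPartitions suggestion is a common workaround for once-per-partition setup, standing in for Hive's initialize()/configure() lifecycle hooks. A minimal sketch, using plain Scala iterators to stand in for an RDD partition; `ExpensiveResource` and `processPartition` are hypothetical names, not Spark API:]

```scala
// Hypothetical stand-in for whatever initialize()/configure() would build
// (a dictionary, a client connection, a compiled pattern, etc.).
class ExpensiveResource {
  def lookup(x: Int): Int = x * 2 // placeholder per-row work
}

// The function you would pass to rdd.mapPartitions in Spark: the resource
// is constructed once per partition, then reused for every row in it,
// instead of once per row as with a plain UDF.
def processPartition(iter: Iterator[Int]): Iterator[Int] = {
  val resource = new ExpensiveResource // once-per-partition setup
  iter.map(resource.lookup)            // reused across all rows
}

// In Spark this would be invoked as: rdd.mapPartitions(processPartition)
val out = processPartition(Iterator(1, 2, 3)).toList
println(out) // List(2, 4, 6)
```

This gives initialize()-like semantics at the RDD level, though it does not plug into the SQL/DataFrame UDF registry the way a GenericUDF does.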



On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler <georg.kf.hei...@gmail.com>
wrote:

> What about using map partitions instead?
>
> RD <rdsr...@gmail.com> schrieb am Do. 15. Juni 2017 um 06:52:
>
>> Hi Spark folks,
>>
>>     Is there any plan to support the richer UDF API that Hive supports
>> for Spark UDFs? Hive supports the GenericUDF API, which has, among other
>> methods, initialize() and configure() (called once on the cluster), which
>> a lot of our users use. We now have a lot of UDFs in Hive that make
>> use of these methods. We plan to move our UDFs to Spark UDFs but are
>> limited by not having similar lifecycle methods.
>>    Are there plans to address these? Or do people usually adopt some sort
>> of workaround?
>>
>>    If we directly use the Hive UDFs in Spark, we pay a performance
>> penalty. I think Spark does a conversion from InternalRow to Row and
>> back to InternalRow for native Spark UDFs, and for Hive UDFs it converts
>> InternalRow to a Hive object and back, but somehow the conversion for
>> native UDFs is more performant.
>>
>> -Best,
>> R.
>>
>
