Are you looking for SCALAR? That lets you map one row to one row, but
does it more efficiently in batch. What are you trying to do?
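
For concreteness, a minimal SCALAR sketch (assuming Spark 2.3+ with PyArrow
available; the column name and data below are made up for illustration):

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.range(10).toDF("x")   # illustrative input

@pandas_udf("long", PandasUDFType.SCALAR)
def plus_one(x):
    # x arrives as a pandas.Series holding a whole batch of rows
    return x + 1

df.select(plus_one(df["x"])).show()
```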

On Thu, Mar 7, 2019 at 2:03 PM peng yu <yupb...@gmail.com> wrote:
>
> I'm looking for a mapPartition(pandas_udf) for a pyspark DataFrame.
>
> ```
> @pandas_udf(df.schema, PandasUDFType.MAP)
> def do_nothing(pandas_df):
>     return pandas_df
>
>
> new_df = df.mapPartition(do_nothing)
> ```
> pandas_udf only supports SCALAR or GROUPED_MAP. Why not support a plain MAP?
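
(For reference, a hedged sketch of the closest existing option, GROUPED_MAP,
which already hands the UDF a whole pandas DataFrame; grouping on
spark_partition_id() here only approximates per-partition mapping and still
shuffles, unlike a true mapPartitions would:)

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType, spark_partition_id

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).toDF("x")                      # illustrative input
keyed = df.withColumn("pid", spark_partition_id())

@pandas_udf("x long", PandasUDFType.GROUPED_MAP)
def do_nothing(pdf):
    # pdf is a pandas.DataFrame with all rows sharing one "pid" value
    return pdf[["x"]]

new_df = keyed.groupBy("pid").apply(do_nothing)
```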
>
> On Thu, Mar 7, 2019 at 2:57 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> Are you looking for @pandas_udf in Python? Or just mapPartitions? Those
>> already exist.
>>
>> On Thu, Mar 7, 2019, 1:43 PM peng yu <yupb...@gmail.com> wrote:
>>>
>>> There is a nice map-partition function in R, `dapply`, which lets the user
>>> pass each partition to a UDF as a data.frame.
>>>
>>> I'm wondering why we don't have that in Python.
>>>
>>> I'm trying to get a map-partition function that works with pandas_udf.
>>>
>>> thanks!
