Are you looking for SCALAR? That lets you map one row to one row, but
does it more efficiently in batch. What are you trying to do?
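
For concreteness, a minimal SCALAR sketch (assuming Spark 2.3+ with PyArrow
available; the column name and data below are made up for illustration):

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.range(10).toDF("x")   # illustrative input

@pandas_udf("long", PandasUDFType.SCALAR)
def plus_one(x):
    # x arrives as a pandas.Series holding a whole batch of rows
    return x + 1

df.select(plus_one(df["x"])).show()
```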

On Thu, Mar 7, 2019 at 2:03 PM peng yu <yupb...@gmail.com> wrote:
>
> I'm looking for a mapPartition(pandas_udf) for a pyspark DataFrame.
>
> ```
> @pandas_udf(df.schema, PandasUDFType.MAP)
> def do_nothing(pandas_df):
>     return pandas_df
>
>
> new_df = df.mapPartition(do_nothing)
> ```
> pandas_udf only supports SCALAR or GROUPED_MAP. Why not support a plain MAP?
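
(For reference, a hedged sketch of the closest existing option, GROUPED_MAP,
which already hands the UDF a whole pandas DataFrame; grouping on
spark_partition_id() here only approximates per-partition mapping and still
shuffles, unlike a true mapPartitions would:)

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType, spark_partition_id

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).toDF("x")                      # illustrative input
keyed = df.withColumn("pid", spark_partition_id())

@pandas_udf("x long", PandasUDFType.GROUPED_MAP)
def do_nothing(pdf):
    # pdf is a pandas.DataFrame with all rows sharing one "pid" value
    return pdf[["x"]]

new_df = keyed.groupBy("pid").apply(do_nothing)
```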
>
> On Thu, Mar 7, 2019 at 2:57 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> Are you looking for @pandas_udf in Python? Or just mapPartitions? Those
>> already exist.
>>
>> On Thu, Mar 7, 2019, 1:43 PM peng yu <yupb...@gmail.com> wrote:
>>>
>>> There is a nice map-partition function in R, `dapply`, which lets the user
>>> pass each partition to a UDF as a data.frame.
>>>
>>> I'm wondering why we don't have that in Python.
>>>
>>> I'm trying to get a map-partition function that works with pandas_udf.
>>>
>>> thanks!
