[ https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580971#comment-17580971 ]

Marcelo Rossini Castro commented on SPARK-40063:
------------------------------------------------

Normally I use the default, {{distributed-sequence}}, but I have already tried 
{{sequence}} as well and get the same error. I then tried again with 
{{compute.ordered_head}} enabled on top of that.

It is probably worth mentioning that I also have to run with 
{{compute.ops_on_diff_frames}} enabled.
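
For reference, something along these lines is what I mean by enabling those 
options (a minimal sketch; the option names are the ones from the 
pandas-on-Spark configuration docs):

{code:python}
import pyspark.pandas as ps

# Allow operations between Series/DataFrames with different anchors;
# without this the apply + assign combination raises an error for me.
ps.set_option('compute.ops_on_diff_frames', True)

# Return head() results in natural order (this only affects head()).
ps.set_option('compute.ordered_head', True)

# Default index type; 'sequence' gave me the same behaviour.
ps.set_option('compute.default_index_type', 'distributed-sequence')
{code}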

> pyspark.pandas .apply() changing rows ordering
> ----------------------------------------------
>
>                 Key: SPARK-40063
>                 URL: https://issues.apache.org/jira/browse/SPARK-40063
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>         Environment: Databricks Runtime 11.1
>            Reporter: Marcelo Rossini Castro
>            Priority: Minor
>              Labels: Pandas, PySpark
>
> When the apply function is used on a DataFrame column, the ordering of the 
> column's rows ends up shuffled.
> A command like this reproduces it:
> {code:python}
> def example_func(df_col):
>     return df_col ** 2
>
> df['col_to_apply_function'] = df.apply(
>     lambda row: example_func(row['col_to_apply_function']), axis=1)
> {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is then dropped, the same error is produced.
> Setting one of the columns as the index did not work either.
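
For completeness, the reproduction and the workaround mentioned in the issue 
description look roughly like this on my side (a minimal sketch; the DataFrame 
and column names are just placeholders):

{code:python}
import pyspark.pandas as ps

ps.set_option('compute.ops_on_diff_frames', True)

df = ps.DataFrame({'col_to_apply_function': [1, 2, 3, 4]})

# Workaround: write the result into a new column instead of overwriting.
df['squared'] = df.apply(
    lambda row: row['col_to_apply_function'] ** 2, axis=1)

# Dropping the original column afterwards reproduces the same problem.
df = df.drop(columns=['col_to_apply_function'])
{code}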


