[ https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580971#comment-17580971 ]
Marcelo Rossini Castro commented on SPARK-40063:
------------------------------------------------

Normally I use the default, {{distributed-sequence}}, but I have already tried {{sequence}} too and get the same error. I then tried again with {{compute.ordered_head}} enabled. It is worth mentioning that I have to run with {{compute.ops_on_diff_frames}} enabled.

> pyspark.pandas .apply() changing rows ordering
> ----------------------------------------------
>
>                 Key: SPARK-40063
>                 URL: https://issues.apache.org/jira/browse/SPARK-40063
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>        Environment: Databricks Runtime 11.1
>            Reporter: Marcelo Rossini Castro
>            Priority: Minor
>              Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it
> ends up mixing up the ordering of the column's rows.
> A command like this:
> {code:python}
> def example_func(df_col):
>     return df_col ** 2
>
> df['col_to_apply_function'] = df.apply(
>     lambda row: example_func(row['col_to_apply_function']), axis=1
> ) {code}
> A workaround is to assign the results to a new column instead of the same
> one, but if the old column is dropped, the same error is produced.
> Setting one column as index also didn't work.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
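For an element-wise operation like the squaring in the report, a row-wise {{apply}} with a per-row lambda is not required; operating on the column as a Series keeps index alignment. The sketch below uses plain pandas (pandas-on-Spark mirrors this API, so the same calls apply there); the column name is taken from the report, and the sample data is invented for illustration.

```python
import pandas as pd

# Plain-pandas sketch of the pattern from the report. pyspark.pandas
# mirrors this API; on Spark, the Series expression below also avoids
# serializing a Python lambda per row.
df = pd.DataFrame({"col_to_apply_function": [3, 1, 2]})

# Row-wise apply, as written in the report:
applied = df.apply(lambda row: row["col_to_apply_function"] ** 2, axis=1)

# Element-wise alternative on the Series, which preserves index alignment:
vectorized = df["col_to_apply_function"] ** 2

assert applied.tolist() == vectorized.tolist() == [9, 1, 4]
```

Whether the Series form also sidesteps the reordering seen with {{apply}} on Databricks Runtime 11.1 would need to be confirmed on an actual Spark cluster.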