[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

Marcelo Rossini Castro (Jira) Sun, 14 Aug 2022 15:00:31 -0700


     [ https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Marcelo Rossini Castro updated SPARK-40063:
-------------------------------------------
    Description: 
When apply is used to run a function over a DataFrame column and the result 
is assigned back to that column, the column's rows end up in a different order.

A command like this reproduces the problem:
{code:python}
# df is a pyspark.pandas DataFrame
def example_func(df_col):
    return df_col ** 2

df['col_to_apply_function'] = df.apply(
    lambda row: example_func(row['col_to_apply_function']), axis=1)
{code}
Assigning the results to a new column instead of overwriting the same one 
works around the problem, but dropping the old column afterwards brings the 
wrong ordering back.

Setting one of the columns as the index did not help either.
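
One way to confirm that values really land on the wrong rows, rather than the rows merely being displayed in a different order, could be to carry an explicit key next to each value and compare per key after the assignment. The following is only a minimal sketch in plain Python and is not part of the original report: the `id` key column, the sample values, and the helper `find_misaligned_keys` are hypothetical, and the rows are assumed to have been collected beforehand (for example with something like `df.to_dict(orient='records')`).

```python
def example_func(value):
    # Same transformation as in the report: square the value.
    return value ** 2


def find_misaligned_keys(before, after, key, col, func):
    """Return the keys whose value in `after` is not func(value in `before`).

    `before` and `after` are lists of dicts (one dict per row).
    """
    # Map each key to the value it had before the assignment.
    original = {row[key]: row[col] for row in before}
    # A row is misaligned if its new value is not func(its own old value).
    return [row[key] for row in after if row[col] != func(original[row[key]])]


# Hypothetical rows; "id" is a helper key column added for the check.
before = [
    {"id": 0, "col_to_apply_function": 1},
    {"id": 1, "col_to_apply_function": 2},
    {"id": 2, "col_to_apply_function": 3},
]

# All values were squared, but the last two ended up on each other's rows.
after = [
    {"id": 0, "col_to_apply_function": 1},
    {"id": 1, "col_to_apply_function": 9},
    {"id": 2, "col_to_apply_function": 4},
]

misaligned = find_misaligned_keys(
    before, after, "id", "col_to_apply_function", example_func
)
print(misaligned)  # [1, 2]
```

An empty list would mean every value still sits next to its original key, so only the display order differs.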

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.

Setting one column as index also didn't work.


> pyspark.pandas .apply() changing rows ordering
> ----------------------------------------------
>
>                 Key: SPARK-40063
>                 URL: https://issues.apache.org/jira/browse/SPARK-40063
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>         Environment: Databricks Runtime 11.1
>            Reporter: Marcelo Rossini Castro
>            Priority: Minor
>              Labels: Pandas, PySpark


