[ 
https://issues.apache.org/jira/browse/SPARK-42397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687759#comment-17687759
 ] 

Hyukjin Kwon commented on SPARK-42397:
--------------------------------------

It's probably related to row ordering, which Spark doesn't guarantee. Is the 
actual value different?
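
A minimal sketch of one way to check this, in plain Python: compare the rows returned by two actions as multisets so that ordering is ignored. It assumes each row can be converted to a hashable tuple (PySpark `Row` objects are tuples); `same_rows_ignoring_order` and the sample data are hypothetical and do not come from the issue.

```python
from collections import Counter


def same_rows_ignoring_order(rows_a, rows_b):
    """Return True when both collections hold the same rows, regardless of order."""
    # Counter keeps per-row counts, so duplicates are compared too (multiset equality).
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


# Hypothetical stand-ins for the rows returned by two actions on the same DataFrame.
first_run = [("a", 1), ("b", 2), ("b", 2)]
second_run = [("b", 2), ("a", 1), ("b", 2)]  # same rows, different order
third_run = [("a", 1), ("b", 3), ("b", 2)]   # one value actually differs

print(same_rows_ignoring_order(first_run, second_run))
print(same_rows_ignoring_order(first_run, third_run))
```

If the check fails for two `collect()` results, the difference is in the values themselves and not just in the row order.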

> Inconsistent data produced by `FlatMapCoGroupsInPandas`
> -------------------------------------------------------
>
>                 Key: SPARK-42397
>                 URL: https://issues.apache.org/jira/browse/SPARK-42397
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark, SQL
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: Ted Chester Jenks
>            Priority: Minor
>
> We are seeing inconsistent data returned when using 
> `FlatMapCoGroupsInPandas`. In the PySpark example from the comments, when we 
> call `grouped_df.collect()` we get:
>  
> {{[Row(left_colms="Index(['cluster', 'event', 'abc'], dtype='object')", 
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')")] }}
>  
> When we call `grouped_df.show(5, truncate=False)` we get:
>  
> {{[Row(left_colms="Index(['cluster', 'abc'], dtype='object')", 
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')", 
> xyz='1234')] }}
>  
> When we call `grouped_df_1.collect()` we get:
>  
> {{[Row(left_colms="Index(['cluster', 'abc'], dtype='object')", 
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')", 
> xyz='1234')] }}
>  
>  



