[ https://issues.apache.org/jira/browse/SPARK-12556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aravind B resolved SPARK-12556.
--------------------------------
    Resolution: Duplicate

> Pyspark dataframe unionAll call accepts incorrect input
> -------------------------------------------------------
>
>                 Key: SPARK-12556
>                 URL: https://issues.apache.org/jira/browse/SPARK-12556
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>            Reporter: Aravind B
>
> I actually encountered this problem with two dataframes that have 8 and 10
> columns each. Below is a made-up example that reproduces what I observed
> going wrong.
> Consider the two dataframes:
> df1:
> +-------+----------+
> |id     |     count|
> +-------+----------+
> +-------+----------+
> df2:
> +-------+---------+----------+
> |id     |new_count|     count|
> +-------+---------+----------+
> |      1|        4|         6|
> |      1|        5|         6|
> |      3|        6|         6|
> |      2|        7|         6|
> +-------+---------+----------+
> The call:
> df3 = df1.unionAll(df2)
> returns successfully with df3 containing 2 columns. However, some columns now
> have swapped values (with other columns). Based on my previous experience I
> would say that df3's count column will actually be the new_count column.
> I believe that this call should never complete successfully in the first
> place and should throw an exception instead.
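
Until the call itself rejects mismatched inputs, a caller-side guard is one way to get the exception the report asks for. The sketch below is an assumption, not part of Spark: `checked_union_all` is a hypothetical helper that relies only on the DataFrame `columns` attribute and `unionAll` method, and it compares column names and order before delegating. It does not compare column types.

```python
def checked_union_all(left, right):
    """Union two DataFrames by position, but only if their columns match.

    Hypothetical helper, not part of the PySpark API. It uses only
    `DataFrame.columns` and `DataFrame.unionAll`.
    """
    left_cols = list(left.columns)
    right_cols = list(right.columns)
    if left_cols != right_cols:
        raise ValueError(
            "unionAll requires identical columns in the same order: "
            "%r vs %r" % (left_cols, right_cols)
        )
    # unionAll matches columns by position, not by name, so equal names
    # in equal order is the minimum condition for a meaningful result.
    return left.unionAll(right)
```

With df1 and df2 from the report, `checked_union_all(df1, df2)` would raise `ValueError`, because `['id', 'count']` differs from `['id', 'new_count', 'count']`. If the extra column is not needed, `df2.select(*df1.columns)` puts df2 into df1's column order before the union.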