[ https://issues.apache.org/jira/browse/SPARK-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170146#comment-15170146 ]
Joseph K. Bradley commented on SPARK-13333:
-------------------------------------------

[~rxin] Is this the same issue as the following? This is in the current master:

{code}
val e1 = sqlContext.createDataFrame(List(
  ("a", "b"), ("b", "c"), ("c", "d")
)).toDF("src", "dst")
val e2 = e1.select(e1("src").as("dst"), e1("dst").as("src"))
val e3 = e1.unionAll(e2)
e3.show()

+---+---+
|src|dst|
+---+---+
|  a|  b|
|  b|  c|
|  c|  d|
|  a|  b|
|  b|  c|
|  c|  d|
+---+---+
{code}

> DataFrame filter + randn + unionAll has bad interaction
> --------------------------------------------------------
>
>                 Key: SPARK-13333
>                 URL: https://issues.apache.org/jira/browse/SPARK-13333
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.2, 1.6.1, 2.0.0
>            Reporter: Joseph K. Bradley
>
> Buggy workflow:
> * Create a DataFrame df0
> * Filter df0
> * Add a randn column
> * Create a copy of the DataFrame
> * unionAll the two DataFrames
> Before the unionAll, randn produces the same values in the original
> DataFrame and in the copy; after the unionAll, the two halves disagree.
> Removing the filter makes the problem go away.
> The bug can be reproduced on master:
> {code}
> import org.apache.spark.sql.functions.{col, randn}
> val df0 = sqlContext.createDataFrame(Seq(0, 1).map(Tuple1(_))).toDF("id")
> // Removing the following filter() call makes this give the expected result.
> val df1 = df0.filter(col("id") === 0).withColumn("b", randn(12345))
> println("DF1")
> df1.show()
> val df2 = df1.select("id", "b")
> println("DF2")
> df2.show() // same as df1.show(), as expected
> val df3 = df1.unionAll(df2)
> println("DF3")
> df3.show() // NOT two copies of df1, which is unexpected
> {code}
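
For reference, the output in the comment above matches unionAll's positional column resolution in Spark 1.x: columns are combined by position, not by name, so the swapped aliases in e2 have no effect on the union. Continuing that snippet, a minimal sketch of how to get by-name semantics, assuming the intent was for e2's rows to come out with src/dst actually swapped (the names e2Reordered and e3ByName are illustrative, not from the ticket):

{code}
// unionAll matches columns by position, not by name, so reorder e2's
// columns by name before the union to recover the intended rows.
val e2Reordered = e2.select("src", "dst")  // src = (b, c, d), dst = (a, b, c)
val e3ByName = e1.unionAll(e2Reordered)
e3ByName.show()
// +---+---+
// |src|dst|
// +---+---+
// |  a|  b|
// |  b|  c|
// |  c|  d|
// |  b|  a|
// |  c|  b|
// |  d|  c|
// +---+---+
{code}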
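For the randn issue in the description, one possible workaround (an assumption on my part, not from the ticket, and untested against all affected versions) is to materialize df1 before the union so the nondeterministic column is computed exactly once; cache() and count() here are the standard DataFrame APIs:

{code}
import org.apache.spark.sql.functions.{col, randn}

val df0 = sqlContext.createDataFrame(Seq(0, 1).map(Tuple1(_))).toDF("id")
val df1 = df0.filter(col("id") === 0).withColumn("b", randn(12345))

// Cache and force evaluation so the randn column is materialized once;
// both branches of the union should then read the same cached rows.
df1.cache()
df1.count()

val df2 = df1.select("id", "b")
val df3 = df1.unionAll(df2)
df3.show()  // expected: two identical copies of df1's single row
{code}

Whether caching actually sidesteps the re-evaluation in every affected version is not verified here; the underlying fix belongs in the planner's handling of nondeterministic expressions.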