[ https://issues.apache.org/jira/browse/SPARK-18604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell resolved SPARK-18604.
---------------------------------------
    Resolution: Fixed
      Assignee: Herman van Hovell
 Fix Version/s: 2.1.0

> Collapse Window optimizer rule changes column order
> ---------------------------------------------------
>
>                 Key: SPARK-18604
>                 URL: https://issues.apache.org/jira/browse/SPARK-18604
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>             Fix For: 2.1.0
>
>
> The recently added CollapseWindow optimizer rule changes the column order of
> attributes. This actually modifies the schema of the logical plan (which
> optimization should not do), and breaks `collect()` in a subtle way (we bind
> the row encoder to the output of the logical plan and not the optimized
> plan).
> For example, the following code:
> {noformat}
> // Assumes spark-shell, where spark.implicits._ (needed for toDF) is in scope.
> val customers = Seq(
>   ("Alice", "2016-05-01", 50.00),
>   ("Alice", "2016-05-03", 45.00),
>   ("Alice", "2016-05-04", 55.00),
>   ("Bob", "2016-05-01", 25.00),
>   ("Bob", "2016-05-04", 29.00),
>   ("Bob", "2016-05-06", 27.00)).
>   toDF("name", "date", "amountSpent")
>
> // Import the window functions.
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions._
>
> // Create a window spec.
> val wSpec1 = Window.partitionBy("name").orderBy("date").rowsBetween(-1, 1)
> val df2 = customers
>   .withColumn("total", sum(customers("amountSpent")).over(wSpec1))
>   .withColumn("cnt", count(customers("amountSpent")).over(wSpec1))
> df2.show()
> {noformat}
> ...yields the following weird result:
> {noformat}
> +-----+----------+-----------+--------+-------------------+
> | name|      date|amountSpent|   total|                cnt|
> +-----+----------+-----------+--------+-------------------+
> |  Bob|2016-05-01|       25.0|1.0E-323|4632796641680687104|
> |  Bob|2016-05-04|       29.0|1.5E-323|4635400285215260672|
> |  Bob|2016-05-06|       27.0|1.0E-323|4633078116657397760|
> |Alice|2016-05-01|       50.0|1.0E-323|4636385447633747968|
> |Alice|2016-05-03|       45.0|1.5E-323|4639481672377565184|
> |Alice|2016-05-04|       55.0|1.0E-323|4636737291354636288|
> +-----+----------+-----------+--------+-------------------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
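The garbage values are consistent with the `total` and `cnt` columns having been swapped. Each shown `total` is the bit pattern of the correct count read as a Double, and each shown `cnt` is the bit pattern of the correct sum read as a Long. For Bob's first row, the correct total is 25.0 + 29.0 = 54.0, whose IEEE 754 bits are 4632796641680687104, and the correct count is 2, whose bits read as a Double print as 1.0E-323. The following is a minimal plain-Scala sketch, with no Spark required, that recomputes the window aggregates and applies that swap. `SwappedColumnsDemo`, `windowed` and `misread` are hypothetical names. Treating each value as a raw 64-bit slot is an assumption that fits the observed output, not a description of Spark's internals.

```scala
// Hypothetical demo, not Spark code: recompute the window aggregates from the
// repro data, then read them back with `total` and `cnt` swapped, assuming each
// value occupies a raw 64-bit slot (Double for the sum, Long for the count).
object SwappedColumnsDemo {
  final case class Purchase(name: String, date: String, amountSpent: Double)

  val customers: Seq[Purchase] = Seq(
    Purchase("Alice", "2016-05-01", 50.00),
    Purchase("Alice", "2016-05-03", 45.00),
    Purchase("Alice", "2016-05-04", 55.00),
    Purchase("Bob", "2016-05-01", 25.00),
    Purchase("Bob", "2016-05-04", 29.00),
    Purchase("Bob", "2016-05-06", 27.00))

  // Mirrors Window.partitionBy("name").orderBy("date").rowsBetween(-1, 1):
  // each row aggregates the previous, current and next row of its partition.
  def windowed(rows: Seq[Purchase]): Seq[(Purchase, Double, Long)] =
    rows.groupBy(_.name).values.toSeq.flatMap { partition =>
      val sorted = partition.sortBy(_.date)
      sorted.indices.map { i =>
        val frame = sorted.slice(math.max(0, i - 1), i + 2)
        (sorted(i), frame.map(_.amountSpent).sum, frame.size.toLong)
      }
    }

  // The slot read as `total` holds the count's bits, and the slot read as
  // `cnt` holds the sum's bits.
  def misread(total: Double, cnt: Long): (Double, Long) =
    (java.lang.Double.longBitsToDouble(cnt),
     java.lang.Double.doubleToRawLongBits(total))

  def main(args: Array[String]): Unit =
    windowed(customers).sortBy(r => (r._1.name, r._1.date)).foreach {
      case (p, total, cnt) =>
        val (shownTotal, shownCnt) = misread(total, cnt)
        println(s"${p.name} ${p.date} correct=($total, $cnt) shown=($shownTotal, $shownCnt)")
    }
}
```

Running `main` prints, for each row, the correct window aggregates next to the misread pair. The misread pairs for all six rows match the `total` and `cnt` columns in the report.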