[ https://issues.apache.org/jira/browse/SPARK-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117220#comment-15117220 ]
Denton Cockburn edited comment on SPARK-12989 at 1/26/16 2:03 PM:
------------------------------------------------------------------

It should be noted that it works if given:
{code}
data.select($"Data.*", max("num").over(winSpec) as "max")
{code}


was (Author: kanielc):
It should be noted that it works if given:
{code}
data.select($"Data.*", max("num").over(winSpec) as "max").explain(true)
{code}

> Bad interaction between StarExpansion and ExtractWindowExpressions
> ------------------------------------------------------------------
>
>                 Key: SPARK-12989
>                 URL: https://issues.apache.org/jira/browse/SPARK-12989
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Michael Armbrust
>
> Reported initially here:
> http://stackoverflow.com/questions/34995376/apache-spark-window-function-with-nested-column
> {code}
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.expressions.Window
> sql("SET spark.sql.eagerAnalysis=false") // Let us see the error even though we are constructing an invalid tree
> val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
>   .withColumn("Data", struct("A", "B", "C"))
>   .drop("A")
>   .drop("B")
>   .drop("C")
> val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
> data.select($"*", max("num").over(winSpec) as "max").explain(true)
> {code}
> When you run this, the analyzer inserts invalid columns into a projection, as seen below:
> {code}
> == Parsed Logical Plan ==
> 'Project [*,'max('num) windowspecdefinition('Data.A,'Data.B,'num DESC,UnspecifiedFrame) AS max#64928]
> +- Project [num#64926,Data#64927]
>    +- Project [C#64925,num#64926,Data#64927]
>       +- Project [B#64924,C#64925,num#64926,Data#64927]
>          +- Project [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS Data#64927]
>             +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
>                +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], [[a,b,c,3],[c,b,a,3]]
> == Analyzed Logical Plan ==
> num: int, Data: struct<A:string,B:string,C:string>, max: int
> Project [num#64926,Data#64927,max#64928]
> +- Project [num#64926,Data#64927,A#64932,B#64933,max#64928,max#64928]
>    +- Window [num#64926,Data#64927,A#64932,B#64933], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax(num#64926) windowspecdefinition(A#64932,B#64933,num#64926 DESC,RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max#64928], [A#64932,B#64933], [num#64926 DESC]
>       +- !Project [num#64926,Data#64927,A#64932,B#64933]
>          +- Project [num#64926,Data#64927]
>             +- Project [C#64925,num#64926,Data#64927]
>                +- Project [B#64924,C#64925,num#64926,Data#64927]
>                   +- Project [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS Data#64927]
>                      +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
>                         +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], [[a,b,c,3],[c,b,a,3]]
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org