Michael Armbrust created SPARK-12989:
----------------------------------------

             Summary: Bad interaction between StarExpansion and 
ExtractWindowExpressions
                 Key: SPARK-12989
                 URL: https://issues.apache.org/jira/browse/SPARK-12989
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0
            Reporter: Michael Armbrust


Reported initially here: 
http://stackoverflow.com/questions/34995376/apache-spark-window-function-with-nested-column

{code}
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

sql("SET spark.sql.eagerAnalysis=false") // Let us see the error even though we 
are constructing an invalid tree

val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", 
"num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")

val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
{code}

When you run this, the analyzer inserts invalid columns into a projection, as 
seen below:
{code}
== Parsed Logical Plan ==
'Project [*,'max('num) windowspecdefinition('Data.A,'Data.B,'num 
DESC,UnspecifiedFrame) AS max#64928]
+- Project [num#64926,Data#64927]
   +- Project [C#64925,num#64926,Data#64927]
      +- Project [B#64924,C#64925,num#64926,Data#64927]
         +- Project 
[A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS 
Data#64927]
            +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS 
C#64925,_4#64922 AS num#64926]
               +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], 
[[a,b,c,3],[c,b,a,3]]

== Analyzed Logical Plan ==
num: int, Data: struct<A:string,B:string,C:string>, max: int
Project [num#64926,Data#64927,max#64928]
+- Project [num#64926,Data#64927,A#64932,B#64933,max#64928,max#64928]
   +- Window [num#64926,Data#64927,A#64932,B#64933], 
[HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax(num#64926)
 windowspecdefinition(A#64932,B#64933,num#64926 DESC,RANGE BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW) AS max#64928], [A#64932,B#64933], [num#64926 DESC]
      +- !Project [num#64926,Data#64927,A#64932,B#64933]
         +- Project [num#64926,Data#64927]
            +- Project [C#64925,num#64926,Data#64927]
               +- Project [B#64924,C#64925,num#64926,Data#64927]
                  +- Project 
[A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS 
Data#64927]
                     +- Project [_1#64919 AS A#64923,_2#64920 AS 
B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
                        +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], 
[[a,b,c,3],[c,b,a,3]]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to