[ https://issues.apache.org/jira/browse/SPARK-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanbo Wang updated SPARK-15418:
-------------------------------
    Description: 
I am using AWS EMR + Spark 1.6.1 + Hive 1.0.0.

I have this UDAF and have included it in the Spark classpath:
https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/GenericUDAFMaxRow.java

I registered it in Spark with:
sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'")

However, when I call it in the following CREATE VIEW query:

CREATE VIEW VIEW_1 AS
SELECT
a.A,
a.B,
maxrow ( a.C,
a.D,
a.E,
a.F,
a.G,
a.H,
a.I
) as m
FROM
table_1 a
JOIN
table_2 b
ON
b.Z = a.D
AND b.Y = a.C
JOIN dummy_table
GROUP BY
a.A,
a.B

I get the following error:

16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)

Running the same query without CREATE VIEW works fine.
> SparkSQL does not support using a UDAF in a CREATE VIEW clause
> --------------------------------------------------------------
>
>                 Key: SPARK-15418
>                 URL: https://issues.apache.org/jira/browse/SPARK-15418
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Hanbo Wang
>              Labels: spark, sparksql
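For readers unfamiliar with the UDAF, the aggregation the view is meant to compute can be modelled in plain Python. This is a minimal sketch, not Spark or Hive code. It assumes, based on the linked project's description, that maxrow returns the arguments of the row whose first argument (here a.C) is largest within each GROUP BY group. The helper name maxrow_by_group, the tie rule (first row seen wins) and the NULL rule (a NULL first argument never wins; an all-NULL group yields NULL, as Hive's built-in max ignores NULLs) are illustrative assumptions, not confirmed from the UDAF source.

```python
from typing import Any, Dict, Iterable, Optional, Sequence, Tuple

Row = Dict[str, Any]


def maxrow_by_group(
    rows: Iterable[Row],
    group_cols: Sequence[str],
    value_cols: Sequence[str],
) -> Dict[Tuple[Any, ...], Optional[Tuple[Any, ...]]]:
    """Hypothetical model of maxrow(v0, v1, ...) ... GROUP BY group_cols.

    For each group, keep the tuple of value_cols taken from the row whose
    first value column is largest.
    """
    best: Dict[Tuple[Any, ...], Optional[Tuple[Any, ...]]] = {}
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        candidate = tuple(row[c] for c in value_cols)
        if candidate[0] is None:
            # Assumption: a NULL first argument never wins; a group whose
            # first arguments are all NULL maps to None (SQL NULL).
            best.setdefault(key, None)
            continue
        current = best.get(key)
        # Strict ">" means the first row seen wins on ties (assumption).
        if current is None or candidate[0] > current[0]:
            best[key] = candidate
    return best


# Usage: models maxrow(a.C, a.D) ... GROUP BY a.A, a.B on made-up rows.
rows = [
    {"A": "x", "B": 1, "C": 5, "D": "d1"},
    {"A": "x", "B": 1, "C": 9, "D": "d2"},
    {"A": "y", "B": 2, "C": 3, "D": "d3"},
]
print(maxrow_by_group(rows, ["A", "B"], ["C", "D"]))
# {('x', 1): (9, 'd2'), ('y', 2): (3, 'd3')}
```

Only the first value column drives the comparison; the remaining columns simply travel with the winning row, which is what distinguishes this from applying max to each column independently.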
-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)