[ 
https://issues.apache.org/jira/browse/SPARK-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanbo Wang updated SPARK-15418:
-------------------------------
    Description: 
I am using AWS EMR + Spark 1.6.1 + Hive 1.0.0.

I have this UDAF, which is included on Spark's classpath:
https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/GenericUDAFMaxRow.java

I registered it in Spark via {{sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'")}}.
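For completeness, a minimal spark-shell sketch of this setup (the jar name is a placeholder, and the package name is the placeholder from above; assumes the jar is supplied via --jars):

{{
// Minimal setup sketch for spark-shell on Spark 1.6, started with e.g.
//   spark-shell --jars hive-udaf-maxrow.jar   (jar name is a placeholder)
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)  // spark-shell already provides one
sqlContext.sql(
  "CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'")
}}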

However, when I call it from the following CREATE VIEW statement:

{{
CREATE VIEW VIEW_1 AS
      SELECT
        a.A,
        a.B,
        maxrow ( a.C,
                 a.D,
                 a.E,
                 a.F,
                 a.G,
                 a.H,
                 a.I
            ) as m
        FROM
            table_1 a
        JOIN
            table_2 b
        ON
                b.Z = a.D
            AND b.Y = a.C
        JOIN dummy_table
        GROUP BY
            a.A,
            a.B
}}

it fails with the following error:

{{
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)
}}

Running the same query without the CREATE VIEW wrapper works fine. Note that the error comes from Hive's Driver and SemanticAnalyzer, so Spark 1.6 appears to hand the CREATE VIEW statement off to Hive, and it is Hive's analysis of the view definition that fails, not Spark's own analyzer.
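As a possible workaround sketch (untested; table and column names are the placeholders from the query above), the working SELECT can be exposed as a temporary table instead of a view:

{{
// Workaround sketch: run the SELECT (which succeeds) as a DataFrame and
// register it as a session-scoped temp table in place of VIEW_1.
val df = sqlContext.sql("""
  SELECT a.A, a.B,
         maxrow(a.C, a.D, a.E, a.F, a.G, a.H, a.I) AS m
  FROM table_1 a
  JOIN table_2 b ON b.Z = a.D AND b.Y = a.C
  JOIN dummy_table
  GROUP BY a.A, a.B
""")
df.registerTempTable("view_1")  // note: temp table, not a persistent view
}}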



> SparkSQL does not support using a UDAF in a CREATE VIEW clause
> --------------------------------------------------------------
>
>                 Key: SPARK-15418
>                 URL: https://issues.apache.org/jira/browse/SPARK-15418
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Hanbo Wang
>              Labels: spark, sparksql
>



