Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229155030
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -53,7 +53,20 @@ Spark SQL supports the vast majority of Hive features, 
such as:
     * View
       * If column aliases are not specified in view definition queries, both 
Spark and Hive will
         generate alias names, but in different ways. In order for Spark to be 
able to read views created
    -    by Hive, users should explicitly specify column aliases in view 
definition queries.
    +    by Hive, users should explicitly specify column aliases in view 
definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff ---
    
    It seems Hive 1.x does not allow `(` to follow `CREATE VIEW ... AS`, while Hive 2.x accepts it. The following works on Hive 1.2.1, 1.2.2 and 2.3.3:
    
    ```
    CREATE VIEW v1 AS SELECT c1 + 1, upper(c2) FROM t1;
    ```
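
    To side-step both the Hive 1.x parser limitation and the auto-generated column names, the view definition can specify column aliases explicitly, as the migration guide recommends. A minimal sketch (the alias names `c1_inc` and `c2_upper` are just illustrative):

    ```
    CREATE VIEW v1 AS SELECT c1 + 1 AS c1_inc, upper(c2) AS c2_upper FROM t1;
    ```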
    
    Another finding is that the above view is readable by Spark even though the view column names are auto-generated (`_c0`, `_c1`), because Spark adds a `Project` between `View` and the view definition query if their output attributes do not match.
    
    ```
    spark-sql> explain extended v1;
    ...
    == Analyzed Logical Plan ==
    _c0: int, _c1: string
    Project [_c0#44, _c1#45]
    +- SubqueryAlias v1
       +- View (`default`.`v1`, [_c0#44,_c1#45])
          +- Project [cast((c1 + 1)#48 as int) AS _c0#44, cast(upper(c2)#49 as 
string) AS _c1#45] // this is added by AliasViewChild rule
             +- Project [(c1#46 + 1) AS (c1 + 1)#48, upper(c2#47) AS 
upper(c2)#49]
                +- SubqueryAlias t1
                   +- HiveTableRelation `default`.`t1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#46, c2#47]
    ...
    ```
    
    But if column aliases are missing in subqueries of the view definition query, Spark will not be able to read the view.
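
    Under that reading, adding explicit aliases inside the subquery should make the view readable by Spark. A sketch, assuming Hive 2.x syntax and illustrative alias names:

    ```
    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1 AS c1_inc, upper(c2) AS c2_upper FROM t1) t2;
    ```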

