[ https://issues.apache.org/jira/browse/SPARK-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-11012:
-------------------------------
    Description: 
In SPARK-10337, we took the first step toward supporting views natively: the 
original view definition SQL text is wrapped with an extra {{SELECT}}, and the 
wrapped SQL text is then stored in the metastore. This approach suffers from at 
least two issues:

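# Switching the current database may break view queries, because unqualified 
table names in the stored text are resolved against whichever database is 
current when the view is queried
# HiveQL doesn't allow a CTE inside a subquery, so CTEs can't be used in view 
definitions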
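To make the first issue concrete, here is a toy Python model (not Spark code; {{Catalog}}, {{resolve}} and {{query_view}} are hypothetical names). It assumes the stored view text reads a single table, and that an unqualified table name is looked up in whichever database is current when the view is queried:

{code:python}
# Toy model only: none of these names come from Spark or Hive.


class Catalog:
    def __init__(self):
        self.tables = {}           # (database, table) -> list of rows
        self.current_db = "default"

    def create_table(self, db, name, rows):
        self.tables[(db, name)] = rows

    def resolve(self, name):
        # An unqualified name is looked up in the *current* database;
        # a qualified "db.table" name is looked up exactly as written.
        if "." in name:
            db, table = name.split(".", 1)
        else:
            db, table = self.current_db, name
        return self.tables[(db, table)]


def query_view(catalog, stored_table_ref):
    # Stand-in for running a stored view whose text reads one table.
    return catalog.resolve(stored_table_ref)


catalog = Catalog()
catalog.create_table("default", "src", [(1, "a")])
catalog.create_table("other", "src", [(2, "b")])

unqualified = "src"            # kept as-is by the SELECT-wrapping approach
qualified = "default.src"      # what a canonicalized definition would store

before = query_view(catalog, unqualified)
catalog.current_db = "other"   # roughly the effect of switching databases
after = query_view(catalog, unqualified)
stable = query_view(catalog, qualified)
{code}

In this model {{before}} holds the row {{(1, 'a')}} while {{after}} holds {{(2, 'b')}}: the same view silently reads another table after the switch (and would fail outright if {{other.src}} did not exist), whereas the qualified reference in {{stable}} keeps returning {{(1, 'a')}}.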

To fix these issues, we need to canonicalize view definitions. For example, 
given the SQL string
{code:sql}
SELECT a, b FROM table
{code}
we will save it to the Hive metastore as
{code:sql}
SELECT `table`.`a`, `table`.`b` FROM `currentDB`.`table`
{code}
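A minimal Python sketch of the rewrite above, assuming the analyzer has already resolved the projected columns and the table's database. {{quote}} and {{canonical_select}} are hypothetical helpers, not Spark's SQL generation code, and escaping an embedded backquote by doubling it is an assumed rule:

{code:python}
def quote(identifier):
    # Wrap an identifier in backquotes; an embedded backquote is escaped by
    # doubling it (assumed escaping rule).
    return "`" + identifier.replace("`", "``") + "`"


def canonical_select(columns, table, database):
    # Qualify each column with its table and the table with its database,
    # so the stored text no longer depends on the current database.
    projection = ", ".join(f"{quote(table)}.{quote(col)}" for col in columns)
    return f"SELECT {projection} FROM {quote(database)}.{quote(table)}"


print(canonical_select(["a", "b"], "table", "currentDB"))
# prints: SELECT `table`.`a`, `table`.`b` FROM `currentDB`.`table`
{code}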

The core infrastructure of this work is SQL query string generation 
(SPARK-12593), namely converting resolved logical query plans back to 
canonicalized SQL query strings. [PR #10541|https://github.com/apache/spark/pull/10541] 
set up the basic infrastructure for SQL generation, but more language constructs 
still need to be supported.
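One way to picture "supporting more language constructs" is a recursive generator with one rule per plan node type, where a node without a rule makes the whole plan inconvertible. This is a hypothetical Python sketch: the node classes are toy stand-ins for Catalyst logical operators, and wrapping every child in a derived table named {{gen_subquery_N}} is an assumed layout, not the SQL that Spark actually emits:

{code:python}
import itertools
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional


def quote(identifier: str) -> str:
    return "`" + identifier.replace("`", "``") + "`"


@dataclass
class Relation:          # a metastore table
    database: str
    table: str


@dataclass
class Filter:            # WHERE clause; the condition is already SQL text here
    condition: str
    child: Any


@dataclass
class Project:           # SELECT list of column names
    columns: List[str]
    child: Any


@dataclass
class LocalRelation:     # rows from a local collection: no SQL form
    rows: List[tuple]


def to_sql(plan: Any, ids: Optional[Iterator[int]] = None) -> Optional[str]:
    """Return SQL text for plan, or None if any node has no rule yet."""
    ids = itertools.count() if ids is None else ids
    if isinstance(plan, Relation):
        return f"SELECT * FROM {quote(plan.database)}.{quote(plan.table)}"
    if isinstance(plan, (Filter, Project)):
        child = to_sql(plan.child, ids)
        if child is None:
            return None                      # propagate "inconvertible"
        alias = f"gen_subquery_{next(ids)}"  # unique derived-table name
        if isinstance(plan, Filter):
            return f"SELECT * FROM ({child}) AS {alias} WHERE {plan.condition}"
        cols = ", ".join(quote(c) for c in plan.columns)
        return f"SELECT {cols} FROM ({child}) AS {alias}"
    return None                              # e.g. LocalRelation: no rule


plan = Project(["a"], Filter("`a` > 1", Relation("default", "src")))
print(to_sql(plan))
print(to_sql(Project(["a"], LocalRelation([(1,)]))))  # None
{code}

Supporting another construct (say, aggregation) would mean adding one more rule; until then, any plan containing it yields {{None}}.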

[PR #10541|https://github.com/apache/spark/pull/10541] also added round-trip 
testing infrastructure for SQL generation. Every query tested by a test suite 
extending {{HiveComparisonTest}} goes through the following steps:

# Parsing the query string into a logical plan
# Converting the resolved logical plan back to a canonicalized SQL query string
# Executing the generated SQL query string
# Comparing the query results with the golden answers
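The four steps can be sketched in Python with SQLite standing in for Hive. Everything here is hypothetical and far simpler than {{HiveComparisonTest}}: the "parser" only understands {{SELECT col, ... FROM table}}, "resolution" just attaches SQLite's default {{main}} schema to the table, and results are compared order-insensitively:

{code:python}
import re
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    columns: List[str]
    database: str
    table: str


def quote(identifier: str) -> str:
    return "`" + identifier.replace("`", "``") + "`"


def parse(query: str, current_db: str) -> Project:
    # Step 1 (toy): parse "SELECT c1, c2 FROM t" and resolve t in current_db.
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*", query, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    columns = [c.strip() for c in m.group(1).split(",")]
    return Project(columns, current_db, m.group(2))


def to_sql(plan: Project) -> str:
    # Step 2: regenerate canonicalized SQL with fully qualified identifiers.
    cols = ", ".join(f"{quote(plan.table)}.{quote(c)}" for c in plan.columns)
    return f"SELECT {cols} FROM {quote(plan.database)}.{quote(plan.table)}"


def round_trip(conn, query, golden, current_db="main"):
    plan = parse(query, current_db)             # 1. parse
    generated = to_sql(plan)                    # 2. back to SQL
    rows = conn.execute(generated).fetchall()   # 3. run the generated SQL
    # 4. compare with the golden answer; without ORDER BY the row order is
    #    not guaranteed, so compare sorted rows
    return generated, sorted(rows) == sorted(golden)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "x"), (2, "y")])
generated, passed = round_trip(conn, "SELECT a, b FROM src", [(2, "y"), (1, "x")])
print(generated, passed)
{code}

SQLite accepts backquoted identifiers for MySQL compatibility, which is what lets the Hive-style generated text run unchanged in this sketch.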

Note that not every resolved logical query plan can be converted back to a SQL 
query string, either because it contains a language construct that is not 
supported yet, or because it inherently has no SQL representation (e.g. query 
plans built on top of local Scala collections).

If a logical plan cannot be converted, {{HiveComparisonTest}} falls back to its 
original behavior, namely executing the original SQL query string and comparing 
the results with the golden answers.
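The fallback amounts to choosing which SQL text to execute. A hypothetical Python sketch, assuming the generator returns {{None}} for an inconvertible plan; Python's {{logging}} module at DEBUG level is only an analogy for the test log, and the toy generator treats the string {{"local"}} as an inconvertible plan:

{code:python}
import logging
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("sql_generation")


def sql_to_execute(original_sql: str, plan: Any,
                   to_sql: Callable[[Any], Optional[str]]) -> Tuple[str, bool]:
    """Return (sql_text, used_generated_sql)."""
    generated = to_sql(plan)
    if generated is None:
        # Inconvertible plan: run the original query string instead.
        logger.debug("Cannot convert plan to SQL, falling back: %s", original_sql)
        return original_sql, False
    logger.debug("Original SQL: %s\nGenerated SQL: %s", original_sql, generated)
    return generated, True


def toy_to_sql(plan: Any) -> Optional[str]:
    # Hypothetical generator: plans tagged "local" have no SQL form.
    return None if plan == "local" else "SELECT `src`.`a` FROM `default`.`src`"


logging.basicConfig(level=logging.DEBUG)
print(sql_to_execute("SELECT a FROM src", "table-scan", toy_to_sql))
print(sql_to_execute("SELECT a FROM src", "local", toy_to_sql))
{code}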

SQL generation details are logged to {{sql/hive/target/unit-tests.log}} (the 
log level must be DEBUG or more verbose for them to appear).

  was:
In SPARK-10337, we took the first step toward supporting views natively: the 
original view definition SQL text is wrapped with an extra {{SELECT}}, and the 
wrapped SQL text is then stored in the metastore. This approach suffers from at 
least two issues:

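# Switching the current database may break view queries, because unqualified 
table names in the stored text are resolved against whichever database is 
current when the view is queried
# HiveQL doesn't allow a CTE inside a subquery, so CTEs can't be used in view 
definitions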

To fix these issues, we need to canonicalize view definitions. For example, 
given the SQL string
{code:sql}
SELECT a, b FROM table
{code}
we will save it to the Hive metastore as
{code:sql}
SELECT `table`.`a`, `table`.`b` FROM `currentDB`.`table`
{code}

The core infrastructure of this work is SQL query string generation 
(SPARK-12593), namely converting resolved logical query plans back to 
canonicalized SQL query strings. [PR #10541|https://github.com/apache/spark/pull/10541] 
set up the basic infrastructure for SQL generation, but more language constructs 
still need to be supported.


> Canonicalize view definitions
> -----------------------------
>
>                 Key: SPARK-11012
>                 URL: https://issues.apache.org/jira/browse/SPARK-11012
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Yin Huai
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
