[ 
https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627662#comment-15627662
 ] 

Xiao Li commented on SPARK-18209:
---------------------------------

Yeah, I just posted the example to show it. I mentioned [~vssrinath] in the 
comment. : )

Regarding the temp view, unfortunately, we already allow users to reference 
temp views in perm view creation. : (

{code}
      sql("CREATE TEMPORARY VIEW temp_jt AS SELECT * FROM jt WHERE id > 0")
      sql("CREATE VIEW jtv1 AS SELECT * FROM temp_jt WHERE id > 3")
      sql("CREATE VIEW jtv2 AS SELECT * FROM jtv1 WHERE id < 6")
      sql("DESC FORMATTED jtv1").show(50, false)
      sql("DESC FORMATTED jtv2").show(50, false)
{code}

{code}
|View Expanded Text:         |SELECT `gen_attr_0` AS `id`, `gen_attr_1` AS 
`id1` FROM (SELECT `gen_attr_0`, `gen_attr_1` FROM (SELECT `gen_attr_2` AS 
`gen_attr_0`, `gen_attr_3` AS `gen_attr_1` FROM (SELECT `gen_attr_2`, 
`gen_attr_3` FROM (SELECT `gen_attr_2`, `gen_attr_3` FROM (SELECT `gen_attr_4` 
AS `gen_attr_2`, `gen_attr_5` AS `gen_attr_3` FROM (SELECT `id` AS 
`gen_attr_4`, `id1` AS `gen_attr_5` FROM `default`.`jt`) AS gen_subquery_0) AS 
gen_subquery_0 WHERE (`gen_attr_2` > CAST(0 AS BIGINT))) AS temp_jt WHERE 
(`gen_attr_2` > CAST(3 AS BIGINT))) AS temp_jt) AS jtv1 WHERE (`gen_attr_0` < 
CAST(6 AS BIGINT))) AS jtv1|       |
{code}

> More robust view canonicalization without full SQL expansion
> ------------------------------------------------------------
>
>                 Key: SPARK-18209
>                 URL: https://issues.apache.org/jira/browse/SPARK-18209
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> Spark SQL currently stores views by analyzing the provided SQL and then 
> generating fully expanded SQL out of the analyzed logical plan. This is 
> actually a very error prone way of doing it, because:
> 1. It is non-trivial to guarantee that the generated SQL is correct without 
> being extremely verbose, given the current set of operators.
> 2. We need extensive testing for all combination of operators.
> 3. Whenever we introduce a new logical plan operator, we need to be super 
> careful because it might break SQL generation. This is the main reason 
> broadcast join hint has taken forever to be merged because it is very 
> difficult to guarantee correctness.
> Given the two primary reasons to do view canonicalization is to provide the 
> context for the database as well as star expansion, I think we can this 
> through a simpler approach, by taking the user given SQL, analyze it, and 
> just wrap the original SQL with a SELECT clause at the outer and store the 
> database as a hint.
> For example, given the following view creation SQL:
> {code}
> USE DATABASE my_db;
> CREATE TABLE my_table (id int, name string);
> CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 10;
> {code}
> We store the following SQL instead:
> {code}
> SELECT /*+ current_db: `my_db` */ id, name FROM (SELECT * FROM my_table WHERE 
> id > 10);
> {code}
> During parsing time, we expand the view along using the provided database 
> context.
> (We don't need to follow exactly the same hint, as I'm merely illustrating 
> the high level approach here.)
> Note that there is a chance that the underlying base table(s)' schema change 
> and the stored schema of the view might differ from the actual SQL schema. In 
> that case, I think we should throw an exception at runtime to warn users. 
> This exception can be controlled by a flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to