[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94104622

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        resolveRelation(relation, defaultDatabase)
+      // Hive support is required to resolve a persistent view, the logical plan returned by
+      // catalog.lookupRelation() should be:
--- End diff --

oh, we are not parsing view text when we do not have hive support.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
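The default-database tracking described in the quoted comment can be modeled with a small, self-contained sketch. This is not Spark's actual implementation — `Plan`, `UnresolvedRel`, `ViewDef` and `Resolved` are hypothetical stand-ins for Catalyst's `LogicalPlan`, `UnresolvedRelation`, `View` and a resolved relation — but the recursion mirrors the idea: each view's stored default database becomes the `defaultDatabase` used to resolve the relations that view references.

```scala
// Hypothetical model of nested-view resolution with default-database tracking.
sealed trait Plan
case class UnresolvedRel(db: Option[String], name: String) extends Plan
case class ViewDef(name: String, viewDefaultDatabase: Option[String], refs: Seq[Plan]) extends Plan
case class Resolved(qualifiedName: String) extends Plan

def resolveRelation(plan: Plan, defaultDatabase: Option[String] = None): Plan = plan match {
  case UnresolvedRel(db, name) =>
    // Precedence: the identifier's own database part, then the tracked default
    // database, then the session's current database.
    val database = db.orElse(defaultDatabase).getOrElse("currentDb")
    Resolved(s"$database.$name")
  case v @ ViewDef(_, viewDb, refs) =>
    // Relations referenced by a view are looked up under *that view's* default
    // database, not the caller's — this is the key point of the PR comment.
    v.copy(refs = refs.map(resolveRelation(_, viewDb)))
  case other => other
}
```

Running it on the `view1`/`view2`/`view3` shape from the comment resolves `table2` under `db1` but `view3` under `db2`, matching the prose example.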
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94103444

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
--- End diff --

oh, nvm, even if we have `database.viewname`, dbname and viewname will be inside the table identifier.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94103404

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
--- End diff --

btw, will we allow cases like `CREATE VIEW database.viewname`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94102704

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        resolveRelation(relation, defaultDatabase)
+      // Hive support is required to resolve a persistent view, the logical plan returned by
+      // catalog.lookupRelation() should be:
--- End diff --

Sorry, I probably missed something. A persistent view is stored in the external catalog, so we can always have persistent views, right? (We have an InMemoryCatalog, which is another external catalog. It is not very useful, but it is still an external catalog.)
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94041766

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // Look up the table with the given name from catalog. The database we look up the table from
+    // is decided follow the steps:
+    // 1. If the database part is defined in the table identifier, use that database name;
+    // 2. Else If the defaultDatabase is defined, use the default database name;
+    // 3. Else use the currentDb of the SessionCatalog.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        defaultDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase)
       } catch {
         case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
      }
    }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-
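The three-step database choice in the quoted comment is a simple precedence chain over `Option` values. A minimal illustrative sketch, where `TableIdent` is a hypothetical stand-in for Catalyst's `TableIdentifier`:

```scala
// Hypothetical stand-in for org.apache.spark.sql.catalyst.TableIdentifier.
case class TableIdent(table: String, database: Option[String] = None)

// Mirrors the three lookup steps:
// 1. the database part of the identifier, if defined;
// 2. else the view's defaultDatabase, if defined;
// 3. else the SessionCatalog's current database.
def chooseDatabase(
    ident: TableIdent,
    defaultDatabase: Option[String],
    currentDb: String): String =
  ident.database
    .orElse(defaultDatabase)
    .getOrElse(currentDb)
```

`Option.orElse`/`getOrElse` expresses the precedence directly, which is presumably why the PR threads `defaultDatabase` through as an `Option[String]` rather than a plain string.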
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996591

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
--- End diff --

I will try to combine both functions.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996552

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 }
+
+  test("correctly resolve a nested view") {
+    withTempDatabase { db =>
+      withView(s"$db.view1", s"$db.view2") {
+        val view1 = CatalogTable(
+          identifier = TableIdentifier("view1", Some(db)),
+          tableType = CatalogTableType.VIEW,
+          storage = CatalogStorageFormat.empty,
+          schema = new StructType().add("id", "int").add("id1", "int"),
+          provider = Some("parquet"),
--- End diff --

I had some misunderstanding about a view that references a datasource table, will change that ASAP! Thank you, and sorry for the trouble!
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996247

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
--- End diff --

The name of `resolveView` is confusing. We are also resolving regular tables. What is the reason we are unable to use `resolveRelation` to handle all the cases? (That is, can we combine `resolveView` and `resolveRelation`?)
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995925

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // Look up the table with the given name from catalog. The database we look up the table from
+    // is decided follow the steps:
+    // 1. If the database part is defined in the table identifier, use that database name;
+    // 2. Else If the defaultDatabase is defined, use the default database name;
+    // 3. Else use the currentDb of the SessionCatalog.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        defaultDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase)
       } catch {
         case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
      }
    }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995882

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 }
+
+  test("correctly resolve a nested view") {
+    withTempDatabase { db =>
+      withView(s"$db.view1", s"$db.view2") {
+        val view1 = CatalogTable(
+          identifier = TableIdentifier("view1", Some(db)),
+          tableType = CatalogTableType.VIEW,
+          storage = CatalogStorageFormat.empty,
+          schema = new StructType().add("id", "int").add("id1", "int"),
+          provider = Some("parquet"),
--- End diff --

When a relation is `View`, the provider is empty. I do not understand why we need to provide a provider in this case. What does `parquet` mean for a view?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995592

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,36 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is defined in a Hive metastore or the view is resolved,
+ * else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is defined
+ *              in a Hive metastore or the view is resolved, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
+
+  override lazy val resolved: Boolean = child.exists(_.resolved)
+
+  override def children: Seq[LogicalPlan] = child.toSeq
+
+  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
+}
--- End diff --

Sure. name + output attributes.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93994361

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,36 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is defined in a Hive metastore or the view is resolved,
+ * else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is defined
+ *              in a Hive metastore or the view is resolved, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
+
+  override lazy val resolved: Boolean = child.exists(_.resolved)
+
+  override def children: Seq[LogicalPlan] = child.toSeq
+
+  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
+}
--- End diff --

How about we output the `CatalogTable.qualifiedName`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93993539 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -377,6 +378,36 @@ case class InsertIntoTable( override lazy val resolved: Boolean = childrenResolved && table.resolved } +/** Factory for constructing new `View` nodes. */ +object View { + def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None) +} + +/** + * A container for holding the view description(CatalogTable), and the output of the view. The + * child will be defined if the view is defined in a Hive metastore or the view is resolved, + * else it should be None. + * This operator will be removed at the end of analysis stage. + * + * @param desc A view description(CatalogTable) that provides necessary information to resolve the + * view. + * @param output The output of a view operator, this is generated during planning the view, so that + * we are able to decouple the output from the underlying structure. + * @param child The logical plan of a view operator, it should be non-empty if the view is defined + * in a Hive metastore or the view is resolved, else it should be None. + */ +case class View( +desc: CatalogTable, +output: Seq[Attribute], +child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation { + + override lazy val resolved: Boolean = child.exists(_.resolved) + + override def children: Seq[LogicalPlan] = child.toSeq + + override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance())) +} --- End diff -- We also need to add `override def simpleString`, which is used for `ONE line description of this node`. Currently, we output the whole `CatalogTable`.
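Taken together, the two review suggestions (add a `simpleString` override; print `CatalogTable.qualifiedName`) could look like the following sketch, using hypothetical stub types rather than the real `CatalogTable`:

```scala
// Stubbed sketch (hypothetical CatalogTableStub, not the real CatalogTable):
// a one-line description that prints only the qualified view name instead of
// dumping the whole catalog table into plan output.
case class CatalogTableStub(database: String, table: String) {
  def qualifiedName: String = s"$database.$table"
}

case class ViewNode(desc: CatalogTableStub) {
  def simpleString: String = s"View (${desc.qualifiedName})"
}
```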
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898801 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -860,6 +864,24 @@ abstract class CatalogTestUtils { bucketSpec = Some(BucketSpec(4, Seq("col1"), Nil))) } + def newView(name: String, database: Option[String] = None): CatalogTable = { +CatalogTable( + identifier = TableIdentifier(name, database), + tableType = CatalogTableType.VIEW, + storage = CatalogStorageFormat.empty, + schema = new StructType() +.add("col1", "int") +.add("col2", "string") +.add("a", "int") +.add("b", "string"), + provider = Some("hive"), --- End diff -- Do we need a case that the provider is not set?
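One way to cover the provider-not-set case the reviewer asks about is to parameterize the helper. This is a hypothetical sketch with a stub type, not the actual `CatalogTable`-based helper from the suite:

```scala
// Hypothetical variant of the suite's newView helper (stubbed ViewDesc type):
// a `provider` parameter defaulting to Some("hive") lets tests also build a
// view whose provider is not set.
case class ViewDesc(name: String, provider: Option[String])

def newView(name: String, provider: Option[String] = Some("hive")): ViewDesc =
  ViewDesc(name, provider)
```

Existing call sites keep their behavior, while a new test can pass `provider = None` explicitly.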
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898653 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -551,17 +551,26 @@ class SessionCatalog( * * If a database is specified in `name`, this will return the table/view from that database. * If no database is specified, this will first attempt to return a temporary table/view with - * the same name, then, if that does not exist, return the table/view from the current database. + * the same name, then, if that does not exist, and defaultDatabase is defined, return the + * table/view from the defaultDatabase, else return the table/view from the catalog.currentDb. * * Note that, the global temp view database is also valid here, this will return the global temp * view matching the given name. * - * If the relation is a view, the relation will be wrapped in a [[SubqueryAlias]] which will - * track the name of the view. + * If the relation is a view, we generate a [[View]] operator from the view description, and + * wrap the logical plan in a [[SubqueryAlias]] which will track the name of the view. + * + * @param name The name of the table/view that we lookup. + * @param alias The alias name of the table/view that we lookup. + * @param defaultDatabase The database name we should use to lookup the table/view, if the + *database part of [[TableIdentifier]] is not defined. --- End diff -- Let's also explain the precedence (db name in the table identifier -> default db -> current db).
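The precedence the reviewer asks to document matches the nested `getOrElse` chain in the diff. A stand-in sketch (hypothetical `TableId` type, not the real `TableIdentifier`):

```scala
// Stand-in for the lookup precedence requested in the review:
// 1) the database part of the table identifier, if defined
// 2) the caller-supplied defaultDatabase (e.g. a view's default database)
// 3) the session catalog's current database
case class TableId(table: String, database: Option[String] = None)

def resolveDb(name: TableId, defaultDatabase: Option[String], currentDb: String): String =
  name.database.getOrElse(defaultDatabase.getOrElse(currentDb))
```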
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -658,6 +720,17 @@ class Analyzer( Generate(newG.asInstanceOf[Generator], join, outer, qualifier, output, child) } + // A special case for View, replace the output attributes with the attributes that have the + // same names from the child. If the corresponding attribute is not found, throw an + // AnalysisException. + // TODO: Also check the dataTypes and nullabilites of the output. --- End diff -- Can we also explain why we need to replace output attributes in the code comment?
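The by-name replacement described in that comment can be sketched as follows. These are hypothetical stand-in types (not Catalyst's `Attribute`), and a plain `RuntimeException` stands in for `AnalysisException`:

```scala
// Stand-in sketch: replace each recorded view-output attribute with the
// same-named attribute from the child's resolved output; a missing name is
// treated as an analysis error.
case class Attr(name: String, dataType: String)

def alignViewOutput(viewOutput: Seq[Attr], childOutput: Seq[Attr]): Seq[Attr] = {
  val byName = childOutput.map(a => a.name -> a).toMap
  viewOutput.map { attr =>
    byName.getOrElse(attr.name,
      throw new RuntimeException(s"Attribute with name '${attr.name}' not found in child output"))
  }
}
```

Re-resolving by name is what lets the view keep a stable output schema while the underlying plan (and its expression IDs) may change between analyses.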
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,94 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// +// Note this is compatible with the views defined by older versions of Spark(before 2.2), which +// have empty defaultDatabase and all the relations in viewText have database part defined. +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { + case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) => +u + case u: UnresolvedRelation => +resolveView(lookupTableFromCatalog(u, defaultDatabase)) +} + +// Look up the table with the given name from catalog. If `defaultDatabase` is set, we look up +// the table in the database `defaultDatabase`, else we follow the default way. 
+private def lookupTableFromCatalog( +u: UnresolvedRelation, +defaultDatabase: Option[String] = None): LogicalPlan = { try { -catalog.lookupRelation(u.tableIdentifier, u.alias) +catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase) } catch { case _: NoSuchTableException => u.failAnalysis(s"Table or view not found: ${u.tableName}") } } -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { - case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => -i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) - case u: UnresolvedRelation => -val table = u.tableIdentifier -if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) && -(!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) { - // If the database part is specified, and we support running SQL directly on files, and - // it's not a temporary view, and the table does not exist, then let's just return the - // original UnresolvedRelation. It is possible we are matching a query like "select * - // from parquet.`/path/to/query`". The plan will get resolved later. - // Note that we are testing (!db_exists || !table_exists) because the catalog throws - // an exception from tableExists if the database does not exist. - u -} else { - lookupTableFromCatalog(u) +// If the database part is specified, and we support running SQL directly on files, and +// it's not a temporary view, and the table does not exist, then let's just return the +// original UnresolvedRelation. It is possible we are matching a query like "select * +// from parquet.`/path/to/query`". The plan will get resolved later. +// Note that we are testing (!db_exists || !table_exists) because the catalog throws +// an exception from tableExists if the database does not exist. 
+private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = { + table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) && +(!catalog.databaseExists(table.database.get) || !catalog.tableExists(table)) +} + +// Change the default database name if the plan is a view, and transformDown with the new +// database name to resolve all UnresolvedRelations and Views. +// If the view is defined in a DataSource other than Hive, and the view's child is empty, +// set the view's child to a SimpleCatalogRelation, else throw an AnalysisException. --- End diff -- ``` // If the view is defined in a DataSource other than Hive, and the view's child is empty, // set the view's child to a SimpleCatalogRelation, else
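The `defaultDatabase` threading this hunk implements can be modeled with a toy plan tree. These are illustrative stand-in types, not the Catalyst operators:

```scala
// Toy model (not Catalyst types) of threading the default database through
// nested view resolution: entering a View switches the default database to
// the one captured when that view was created.
sealed trait ToyPlan
case class UnresolvedRel(table: String, db: Option[String]) extends ToyPlan
case class ToyView(defaultDb: String, child: ToyPlan) extends ToyPlan
case class ResolvedRel(qualifiedName: String) extends ToyPlan

def resolve(plan: ToyPlan, defaultDb: Option[String]): ToyPlan = plan match {
  case UnresolvedRel(t, db) =>
    // Explicit db wins; otherwise the nearest enclosing view's default applies.
    ResolvedRel(s"${db.orElse(defaultDb).getOrElse("currentDb")}.$t")
  case ToyView(d, child) =>
    // The view's own default database applies to everything it references.
    ToyView(d, resolve(child, Some(d)))
  case r: ResolvedRel => r
}
```

This mirrors the nested-view example in the rule's doc comment: relations referenced directly by `view1` resolve against `db1`, while relations inside `view2` resolve against `db2`.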
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898130 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +// Look up the table with the given name from catalog. If `defaultDatabase` is set, we look up +// the table in the database `defaultDatabase`, else we follow the default way. --- End diff -- Let's be specific on what "else we follow the default way" means here.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898018 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- Let's explain defaultDatabase more, like when we need it, when it is set, and how it is different from the current database in the session catalog.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93662142 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,62 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog +// and change the current database name if it is a view. +def resolveRelation( +plan: LogicalPlan, +currentDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- @hvanhovell When views or tables are not resolved, our Analyzer treats them as unresolved relations. Conceptually, the views' `currentDatabase` should be part of `UnresolvedRelation`. Thus, maybe we can just move it into `UnresolvedRelation`?
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93646337 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +// Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup +// the table in the database `currentDatabase`, else we follow the default way. +private def lookupTableFromCatalog( --- End diff -- The function is used in both `apply` and `resolveRelation`.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93635804 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +def resolveRelation( +plan: LogicalPlan, +currentDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- @gatorsmile what would you suggest?
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93551189 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -549,17 +549,26 @@ class SessionCatalog( * * If a database is specified in `name`, this will return the table/view from that database. * If no database is specified, this will first attempt to return a temporary table/view with - * the same name, then, if that does not exist, return the table/view from the current database. + * the same name, then, if that does not exist, and currentDatabase is defined, return the + * table/view from the currentDatabase, else return the table/view from the catalog.currentDb. * * Note that, the global temp view database is also valid here, this will return the global temp * view matching the given name. * - * If the relation is a view, the relation will be wrapped in a [[SubqueryAlias]] which will - * track the name of the view. + * If the relation is a view, we add a [[View]] operator over the relation, and wrap the logical + * plan in a [[SubqueryAlias]] which will track the name of the view. + * + * @param name The name of the table/view that we lookup. + * @param alias The alias name of the table/view that we lookup. + * @param currentDatabase The database name we should use to lookup the table/view, if the + *database part of [[TableIdentifier]] is not defined. */ - def lookupRelation(name: TableIdentifier, alias: Option[String] = None): LogicalPlan = { + def lookupRelation( + name: TableIdentifier, + alias: Option[String] = None, + currentDatabase: Option[String] = None): LogicalPlan = { synchronized { - val db = formatDatabaseName(name.database.getOrElse(currentDb)) + val db = formatDatabaseName(name.database.getOrElse(currentDatabase.getOrElse(currentDb))) --- End diff -- I feel that currentDatabase and currentDb will confuse readers. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93551078

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {
+      case p @ SubqueryAlias(_, view: View, _) =>
+        val currentDatabase = view.currentDatabase
+        val newChild = view transform {
+          case v: View if !v.resolved =>
+            resolveView(v)
+          case u: UnresolvedRelation =>
+            resolveRelation(u, currentDatabase)

--- End diff --

oh, resolveRelation's second argument is the current db.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549589

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -396,6 +396,20 @@ case class InsertIntoTable(
 }
 
 /**
+ * A container for holding the current database of a view and a query plan.
+ * This operator will be removed at the begining of the optimize stage so we can see what is part
+ * of a view in a analyzed plan.
+ *
+ * @param child The logical plan of this view.
+ * @param currentDatabase The database name we use to resolve the logical plan.
+ */
+case class View(child: LogicalPlan, currentDatabase: Option[String]) extends LogicalPlan {

--- End diff --

Also, I feel `currentDatabase` may not be a good name. I am wondering if `defaultDatabase` is better.
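As a rough illustration of the wrapping the PR describes (a `View` node records the database used to resolve the view's body, and a `SubqueryAlias` records the view's name), here is a self-contained sketch with simplified stand-in types, not Catalyst's real classes:

```scala
// Toy plan types; the real View/SubqueryAlias extend Catalyst's LogicalPlan.
sealed trait Plan
case class Parsed(sql: String) extends Plan
case class View(child: Plan, currentDatabase: Option[String]) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

// Hypothetical helper showing the nesting order: alias outermost, then the
// View marker, then the parsed view text.
def wrapView(name: String, viewText: String, db: Option[String]): Plan =
  SubqueryAlias(name, View(Parsed(viewText), db))

// wrapView("v1", "SELECT * FROM t", Some("db1"))
//   == SubqueryAlias("v1", View(Parsed("SELECT * FROM t"), Some("db1")))
```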
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549309

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(

--- End diff --

If so, should we just inline it?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549165

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {
+      case p @ SubqueryAlias(_, view: View, _) =>
+        val currentDatabase = view.currentDatabase
+        val newChild = view transform {
+          case v: View if !v.resolved =>
+            resolveView(v)
+          case u: UnresolvedRelation =>
+            resolveRelation(u, currentDatabase)

--- End diff --

Is this right? `u` can also have db name, right?
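The question about `u` carrying its own database name is settled by the lookup precedence in `SessionCatalog.lookupRelation`: a relation with an explicit database keeps it, and only a bare name inherits the enclosing view's default database. A toy model of the walk, with simplified stand-in types (not Catalyst's), makes this concrete:

```scala
// Toy model of resolving nested views with per-view default databases.
sealed trait Plan
case class Unresolved(table: String, db: Option[String]) extends Plan
case class Resolved(table: String, db: String) extends Plan
case class View(child: Plan, defaultDb: String) extends Plan

def resolve(plan: Plan, defaultDb: String): Plan = plan match {
  // Entering a view switches the default database to the one recorded
  // when that view was created.
  case View(child, viewDb)  => View(resolve(child, viewDb), viewDb)
  // An explicit database on the relation wins; a bare name inherits.
  case Unresolved(t, dbOpt) => Resolved(t, dbOpt.getOrElse(defaultDb))
  case other                => other
}

// A view created in db2 referencing a bare `t`, nested inside a db1 view:
// resolve(View(View(Unresolved("t", None), "db2"), "db1"), "default")
//   == View(View(Resolved("t", "db2"), "db2"), "db1")
// With an explicit database, Unresolved("s", Some("db9")) resolves to
// Resolved("s", "db9") regardless of the surrounding views.
```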
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93548847

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(

--- End diff --

Is this function only used once?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92777277

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {

--- End diff --

I do not like this function name. Basically, this function is used to resolve both views and tables.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92776636

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup

--- End diff --

Nit: lookup is a noun.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92776481

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {

--- End diff --

The interface of this function is not good. We only ever pass an `UnresolvedRelation`, so the `LogicalPlan` parameter type is not used. `currentDatabase` is only used for views, and the name of this variable is not good. Is it possible to move it into `UnresolvedRelation`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91689436

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("create a nested view") {

--- End diff --

It's not easy to test the `resolveRelations` rule directly in `AnalysisSuite`, so we test resolving a nested view here instead.
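The scenario such a test needs to cover can be illustrated with plain SQL. This is an illustrative sequence run against a Spark session, not the actual test body from `SQLViewSuite`:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo session; the real suite uses TestHiveSingleton instead.
val spark = SparkSession.builder()
  .appName("nested-view-demo")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
import spark.sql

sql("CREATE DATABASE IF NOT EXISTS db1")
sql("CREATE TABLE IF NOT EXISTS db1.t (id INT)")
sql("USE db1")
sql("CREATE VIEW v1 AS SELECT * FROM t")       // bare `t` must resolve in db1
sql("USE default")
sql("CREATE VIEW v2 AS SELECT * FROM db1.v1")  // nested view across databases
// Resolving v2 requires resolving v1's body with db1, not the session's
// current database, as the default database.
sql("SELECT * FROM v2").show()
```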
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91689112

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,10 +121,13 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
       alias.map(a => SubqueryAlias(a, qualifiedTable, None)).getOrElse(qualifiedTable)
     } else if (table.tableType == CatalogTableType.VIEW) {
       val viewText = table.viewText.getOrElse(sys.error("Invalid view without text."))
-      SubqueryAlias(
-        alias.getOrElse(table.identifier.table),
-        sparkSession.sessionState.sqlParser.parsePlan(viewText),
-        Option(table.identifier))
+      // The relation is a view, so we wrap the relation by:
+      // 1. Add a [[View]] operator over the relation to keep track of the database name;
+      // 2. Wrap the logical plan in a [[SubqueryAlias]] which tracks the name of the view.
+      val child = View(

--- End diff --

This piece of code is slightly different from that in `SessionCatalog.lookupRelation`, so we are not merging them into one common function.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91688872

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -219,6 +219,9 @@ class SQLBuilder private (
     case OneRowRelation =>
       ""
+    case p: View =>

--- End diff --

We will remove the entire rule after we have implemented the late binding method.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91688428

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2224,6 +2253,17 @@ object EliminateSubqueryAliases extends Rule[LogicalPlan] {
 }
 
 /**
+ * Removes [[View]] operators from the plan. The operator is only used to provide the database
+ * name we should use in resolving a view, and it's respected till the end of analysis stage
+ * just because we want to see which part of a analyzed logical plan is generated from a view.
+ */
+object EliminateView extends Rule[LogicalPlan] {

--- End diff --

This rule is only used by `Optimizer`; it's placed here because it is in fact an analysis rule.
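What such a rule does can be sketched on toy types: strip every `View` wrapper, keeping its child, once the analyzed plan no longer needs the marker. This is a simplified stand-in for the Catalyst rule, not its actual implementation:

```scala
// Toy plan types; the real rule pattern-matches Catalyst's LogicalPlan tree.
sealed trait Plan
case class Relation(name: String) extends Plan
case class View(child: Plan, currentDatabase: Option[String]) extends Plan

// Recursively remove View markers, analogous to what EliminateView is
// described to do at the beginning of the optimize stage.
def eliminateView(plan: Plan): Plan = plan match {
  case View(child, _) => eliminateView(child)
  case other          => other
}

// eliminateView(View(View(Relation("t"), Some("db2")), Some("db1")))
//   == Relation("t")
```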
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/16233

[SPARK-18801][SQL] Add `View` operator to help resolve a view

## What changes were proposed in this pull request?

We should add a new operator called `View` to keep track of the database name used on resolving a view. The analysis rule `ResolveRelations` should also be updated. After that change, we should be able to resolve a nested view. This PR mainly changes:

1. Add a new operator `View` that keeps track of the database name used to resolve a view;
2. Change the analysis rule `ResolveRelations` to support resolving a nested view;
3. Add a `currentDatabase` variable to `CatalogTable` to keep track of the database name used to resolve a view; if the `CatalogTable` is not a view, the variable should be `None`.

## How was this patch tested?

1. Add new tests in `SessionCatalogSuite` to test the function `lookupRelation`;
2. Add a new test case in `SQLViewSuite` to test resolving a nested view.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark resolve-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16233.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16233

commit b67e7ce9f665a1ba3e57093ceb5ceeec2d9bb0d2
Author: jiangxingbo
Date: 2016-12-09T09:50:00Z

    add View operator.