[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94104622

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        resolveRelation(relation, defaultDatabase)
+      // Hive support is required to resolve a persistent view, the logical plan returned by
+      // catalog.lookupRelation() should be:
--- End diff --

oh, we are not parsing view text when we do not have hive support.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
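The default-database tracking described in the quoted comment can be modeled with a small, self-contained sketch. This is not Spark's actual implementation — `Plan`, `UnresolvedRel`, `ViewDef` and `Resolved` are hypothetical stand-ins for Catalyst's `LogicalPlan`, `UnresolvedRelation`, `View` and a resolved relation — but the recursion mirrors the idea: each view's stored default database becomes the `defaultDatabase` used to resolve the relations that view references.

```scala
// Hypothetical model of nested-view resolution with default-database tracking.
sealed trait Plan
case class UnresolvedRel(db: Option[String], name: String) extends Plan
case class ViewDef(name: String, viewDefaultDatabase: Option[String], refs: Seq[Plan]) extends Plan
case class Resolved(qualifiedName: String) extends Plan

def resolveRelation(plan: Plan, defaultDatabase: Option[String] = None): Plan = plan match {
  case UnresolvedRel(db, name) =>
    // Precedence: the identifier's own database part, then the tracked default
    // database, then the session's current database.
    val database = db.orElse(defaultDatabase).getOrElse("currentDb")
    Resolved(s"$database.$name")
  case v @ ViewDef(_, viewDb, refs) =>
    // Relations referenced by a view are looked up under *that view's* default
    // database, not the caller's — this is the key point of the PR comment.
    v.copy(refs = refs.map(resolveRelation(_, viewDb)))
  case other => other
}
```

Running it on the `view1`/`view2`/`view3` shape from the comment resolves `table2` under `db1` but `view3` under `db2`, matching the prose example.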
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94103444

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
--- End diff --

oh, nvm, even if we have `database.viewname`, dbname and viewname will be inside the table identifier.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94103404

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
--- End diff --

btw, will we allow cases like `CREATE VIEW database.viewname`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94102704

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        resolveRelation(relation, defaultDatabase)
+      // Hive support is required to resolve a persistent view, the logical plan returned by
+      // catalog.lookupRelation() should be:
--- End diff --

Sorry, I probably missed something. A persistent view is stored in the external catalog, so we can always have persistent views, right? (We have an InMemoryCatalog, which is another external catalog. It is not very useful, but it is still an external catalog.)
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r94041766

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // Look up the table with the given name from catalog. The database we look up the table from
+    // is decided follow the steps:
+    // 1. If the database part is defined in the table identifier, use that database name;
+    // 2. Else If the defaultDatabase is defined, use the default database name;
+    // 3. Else use the currentDb of the SessionCatalog.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        defaultDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase)
       } catch {
         case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
      }
    }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-
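The three-step database choice in the quoted comment is a simple precedence chain over `Option` values. A minimal illustrative sketch, where `TableIdent` is a hypothetical stand-in for Catalyst's `TableIdentifier`:

```scala
// Hypothetical stand-in for org.apache.spark.sql.catalyst.TableIdentifier.
case class TableIdent(table: String, database: Option[String] = None)

// Mirrors the three lookup steps:
// 1. the database part of the identifier, if defined;
// 2. else the view's defaultDatabase, if defined;
// 3. else the SessionCatalog's current database.
def chooseDatabase(
    ident: TableIdent,
    defaultDatabase: Option[String],
    currentDb: String): String =
  ident.database
    .orElse(defaultDatabase)
    .getOrElse(currentDb)
```

`Option.orElse`/`getOrElse` expresses the precedence directly, which is presumably why the PR threads `defaultDatabase` through as an `Option[String]` rather than a plain string.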
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996591

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
--- End diff --

I will try to combine both functions.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996552

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 }
+
+  test("correctly resolve a nested view") {
+    withTempDatabase { db =>
+      withView(s"$db.view1", s"$db.view2") {
+        val view1 = CatalogTable(
+          identifier = TableIdentifier("view1", Some(db)),
+          tableType = CatalogTableType.VIEW,
+          storage = CatalogStorageFormat.empty,
+          schema = new StructType().add("id", "int").add("id1", "int"),
+          provider = Some("parquet"),
--- End diff --

I had some misunderstanding about a view that references a datasource table, will change that ASAP! Thank you, and sorry for the trouble!
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93996247

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
--- End diff --

The name of `resolveView` is confusing. We are also resolving regular tables. What is the reason we are unable to use `resolveRelation` to handle all the cases? (That is, can we combine `resolveView` and `resolveRelation`?)
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995925

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,121 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have different default
+    // database with that the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, defaultDatabase))
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // Look up the table with the given name from catalog. The database we look up the table from
+    // is decided follow the steps:
+    // 1. If the database part is defined in the table identifier, use that database name;
+    // 2. Else If the defaultDatabase is defined, use the default database name;
+    // 3. Else use the currentDb of the SessionCatalog.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        defaultDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase)
       } catch {
         case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
      }
    }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995882

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 }
+
+  test("correctly resolve a nested view") {
+    withTempDatabase { db =>
+      withView(s"$db.view1", s"$db.view2") {
+        val view1 = CatalogTable(
+          identifier = TableIdentifier("view1", Some(db)),
+          tableType = CatalogTableType.VIEW,
+          storage = CatalogStorageFormat.empty,
+          schema = new StructType().add("id", "int").add("id1", "int"),
+          provider = Some("parquet"),
--- End diff --

When a relation is `View`, the provider is empty. I do not understand why we need to provide a provider in this case. What does `parquet` mean for a view?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93995592

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,36 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is defined in a Hive metastore or the view is resolved,
+ * else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is defined
+ *              in a Hive metastore or the view is resolved, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
+
+  override lazy val resolved: Boolean = child.exists(_.resolved)
+
+  override def children: Seq[LogicalPlan] = child.toSeq
+
+  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
+}
--- End diff --

Sure. name + output attributes.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16233#discussion_r93994361

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,36 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is defined in a Hive metastore or the view is resolved,
+ * else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is defined
+ *              in a Hive metastore or the view is resolved, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
+
+  override lazy val resolved: Boolean = child.exists(_.resolved)
+
+  override def children: Seq[LogicalPlan] = child.toSeq
+
+  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
+}
--- End diff --

How about we output the `CatalogTable.qualifiedName`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93993539 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -377,6 +378,36 @@ case class InsertIntoTable( override lazy val resolved: Boolean = childrenResolved && table.resolved } +/** Factory for constructing new `View` nodes. */ +object View { + def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None) +} + +/** + * A container for holding the view description(CatalogTable), and the output of the view. The + * child will be defined if the view is defined in a Hive metastore or the view is resolved, + * else it should be None. + * This operator will be removed at the end of analysis stage. + * + * @param desc A view description(CatalogTable) that provides necessary information to resolve the + * view. + * @param output The output of a view operator, this is generated during planning the view, so that + * we are able to decouple the output from the underlying structure. + * @param child The logical plan of a view operator, it should be non-empty if the view is defined + * in a Hive metastore or the view is resolved, else it should be None. + */ +case class View( +desc: CatalogTable, +output: Seq[Attribute], +child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation { + + override lazy val resolved: Boolean = child.exists(_.resolved) + + override def children: Seq[LogicalPlan] = child.toSeq + + override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance())) +} --- End diff -- We also need to add `override def simpleString`, which is used for `ONE line description of this node`. Currently, we output the whole `CatalogTable`.
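Taken together, the two review suggestions (add a `simpleString` override; print `CatalogTable.qualifiedName`) could look like the following sketch, using hypothetical stub types rather than the real `CatalogTable`:

```scala
// Stubbed sketch (hypothetical CatalogTableStub, not the real CatalogTable):
// a one-line description that prints only the qualified view name instead of
// dumping the whole catalog table into plan output.
case class CatalogTableStub(database: String, table: String) {
  def qualifiedName: String = s"$database.$table"
}

case class ViewNode(desc: CatalogTableStub) {
  def simpleString: String = s"View (${desc.qualifiedName})"
}
```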
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898801 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -860,6 +864,24 @@ abstract class CatalogTestUtils { bucketSpec = Some(BucketSpec(4, Seq("col1"), Nil))) } + def newView(name: String, database: Option[String] = None): CatalogTable = { +CatalogTable( + identifier = TableIdentifier(name, database), + tableType = CatalogTableType.VIEW, + storage = CatalogStorageFormat.empty, + schema = new StructType() +.add("col1", "int") +.add("col2", "string") +.add("a", "int") +.add("b", "string"), + provider = Some("hive"), --- End diff -- Do we need a case that the provider is not set?
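One way to cover the provider-not-set case the reviewer asks about is to parameterize the helper. This is a hypothetical sketch with a stub type, not the actual `CatalogTable`-based helper from the suite:

```scala
// Hypothetical variant of the suite's newView helper (stubbed ViewDesc type):
// a `provider` parameter defaulting to Some("hive") lets tests also build a
// view whose provider is not set.
case class ViewDesc(name: String, provider: Option[String])

def newView(name: String, provider: Option[String] = Some("hive")): ViewDesc =
  ViewDesc(name, provider)
```

Existing call sites keep their behavior, while a new test can pass `provider = None` explicitly.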
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898653 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -551,17 +551,26 @@ class SessionCatalog( * * If a database is specified in `name`, this will return the table/view from that database. * If no database is specified, this will first attempt to return a temporary table/view with - * the same name, then, if that does not exist, return the table/view from the current database. + * the same name, then, if that does not exist, and defaultDatabase is defined, return the + * table/view from the defaultDatabase, else return the table/view from the catalog.currentDb. * * Note that, the global temp view database is also valid here, this will return the global temp * view matching the given name. * - * If the relation is a view, the relation will be wrapped in a [[SubqueryAlias]] which will - * track the name of the view. + * If the relation is a view, we generate a [[View]] operator from the view description, and + * wrap the logical plan in a [[SubqueryAlias]] which will track the name of the view. + * + * @param name The name of the table/view that we lookup. + * @param alias The alias name of the table/view that we lookup. + * @param defaultDatabase The database name we should use to lookup the table/view, if the + *database part of [[TableIdentifier]] is not defined. --- End diff -- Let's also explain the precedence (db name in the table identifier -> default db -> current db).
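The precedence the reviewer asks to document matches the nested `getOrElse` chain in the diff. A stand-in sketch (hypothetical `TableId` type, not the real `TableIdentifier`):

```scala
// Stand-in for the lookup precedence requested in the review:
// 1) the database part of the table identifier, if defined
// 2) the caller-supplied defaultDatabase (e.g. a view's default database)
// 3) the session catalog's current database
case class TableId(table: String, database: Option[String] = None)

def resolveDb(name: TableId, defaultDatabase: Option[String], currentDb: String): String =
  name.database.getOrElse(defaultDatabase.getOrElse(currentDb))
```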
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -658,6 +720,17 @@ class Analyzer( Generate(newG.asInstanceOf[Generator], join, outer, qualifier, output, child) } + // A special case for View, replace the output attributes with the attributes that have the + // same names from the child. If the corresponding attribute is not found, throw an + // AnalysisException. + // TODO: Also check the dataTypes and nullabilites of the output. --- End diff -- Can we also explain why we need to replace output attributes in the code comment?
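The by-name replacement described in that comment can be sketched as follows. These are hypothetical stand-in types (not Catalyst's `Attribute`), and a plain `RuntimeException` stands in for `AnalysisException`:

```scala
// Stand-in sketch: replace each recorded view-output attribute with the
// same-named attribute from the child's resolved output; a missing name is
// treated as an analysis error.
case class Attr(name: String, dataType: String)

def alignViewOutput(viewOutput: Seq[Attr], childOutput: Seq[Attr]): Seq[Attr] = {
  val byName = childOutput.map(a => a.name -> a).toMap
  viewOutput.map { attr =>
    byName.getOrElse(attr.name,
      throw new RuntimeException(s"Attribute with name '${attr.name}' not found in child output"))
  }
}
```

Re-resolving by name is what lets the view keep a stable output schema while the underlying plan (and its expression IDs) may change between analyses.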
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,94 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// +// Note this is compatible with the views defined by older versions of Spark(before 2.2), which +// have empty defaultDatabase and all the relations in viewText have database part defined. +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { + case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) => +u + case u: UnresolvedRelation => +resolveView(lookupTableFromCatalog(u, defaultDatabase)) +} + +// Look up the table with the given name from catalog. If `defaultDatabase` is set, we look up +// the table in the database `defaultDatabase`, else we follow the default way. 
+private def lookupTableFromCatalog( +u: UnresolvedRelation, +defaultDatabase: Option[String] = None): LogicalPlan = { try { -catalog.lookupRelation(u.tableIdentifier, u.alias) +catalog.lookupRelation(u.tableIdentifier, u.alias, defaultDatabase) } catch { case _: NoSuchTableException => u.failAnalysis(s"Table or view not found: ${u.tableName}") } } -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { - case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => -i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) - case u: UnresolvedRelation => -val table = u.tableIdentifier -if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) && -(!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) { - // If the database part is specified, and we support running SQL directly on files, and - // it's not a temporary view, and the table does not exist, then let's just return the - // original UnresolvedRelation. It is possible we are matching a query like "select * - // from parquet.`/path/to/query`". The plan will get resolved later. - // Note that we are testing (!db_exists || !table_exists) because the catalog throws - // an exception from tableExists if the database does not exist. - u -} else { - lookupTableFromCatalog(u) +// If the database part is specified, and we support running SQL directly on files, and +// it's not a temporary view, and the table does not exist, then let's just return the +// original UnresolvedRelation. It is possible we are matching a query like "select * +// from parquet.`/path/to/query`". The plan will get resolved later. +// Note that we are testing (!db_exists || !table_exists) because the catalog throws +// an exception from tableExists if the database does not exist. 
+private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = { + table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) && +(!catalog.databaseExists(table.database.get) || !catalog.tableExists(table)) +} + +// Change the default database name if the plan is a view, and transformDown with the new +// database name to resolve all UnresolvedRelations and Views. +// If the view is defined in a DataSource other than Hive, and the view's child is empty, +// set the view's child to a SimpleCatalogRelation, else throw an AnalysisException. --- End diff -- ``` // If the view is defined in a DataSource other than Hive, and the view's child is empty, // set the view's child to a SimpleCatalogRelation, else
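The `defaultDatabase` threading this hunk implements can be modeled with a toy plan tree. These are illustrative stand-in types, not the Catalyst operators:

```scala
// Toy model (not Catalyst types) of threading the default database through
// nested view resolution: entering a View switches the default database to
// the one captured when that view was created.
sealed trait ToyPlan
case class UnresolvedRel(table: String, db: Option[String]) extends ToyPlan
case class ToyView(defaultDb: String, child: ToyPlan) extends ToyPlan
case class ResolvedRel(qualifiedName: String) extends ToyPlan

def resolve(plan: ToyPlan, defaultDb: Option[String]): ToyPlan = plan match {
  case UnresolvedRel(t, db) =>
    // Explicit db wins; otherwise the nearest enclosing view's default applies.
    ResolvedRel(s"${db.orElse(defaultDb).getOrElse("currentDb")}.$t")
  case ToyView(d, child) =>
    // The view's own default database applies to everything it references.
    ToyView(d, resolve(child, Some(d)))
  case r: ResolvedRel => r
}
```

This mirrors the nested-view example in the rule's doc comment: relations referenced directly by `view1` resolve against `db1`, while relations inside `view2` resolve against `db2`.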
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898130 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +// Look up the table with the given name from catalog. If `defaultDatabase` is set, we look up +// the table in the database `defaultDatabase`, else we follow the default way. --- End diff -- Let's be specific on what "else we follow the default way" means here.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93898018 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- Let's explain defaultDatabase more, like when we need it, when it is set, and how it is different from the current database in the session catalog.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93662142 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,62 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog +// and change the current database name if it is a view. +def resolveRelation( +plan: LogicalPlan, +currentDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- @hvanhovell When views or tables are not resolved, our Analyzer treats them as unresolved relations. Conceptually, the views' `currentDatabase` should be part of `UnresolvedRelation`. Thus, maybe we can just move it into `UnresolvedRelation`?
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93646337 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +// Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup +// the table in the database `currentDatabase`, else we follow the default way. +private def lookupTableFromCatalog( --- End diff -- The function is used in both `apply` and `resolveRelation`.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93635804 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- (context identical to the hunk quoted above) +def resolveRelation( +plan: LogicalPlan, +currentDatabase: Option[String] = None): LogicalPlan = plan match { --- End diff -- @gatorsmile what would you suggest?
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93551189 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -549,17 +549,26 @@ class SessionCatalog( * * If a database is specified in `name`, this will return the table/view from that database. * If no database is specified, this will first attempt to return a temporary table/view with - * the same name, then, if that does not exist, return the table/view from the current database. + * the same name, then, if that does not exist, and currentDatabase is defined, return the + * table/view from the currentDatabase, else return the table/view from the catalog.currentDb. * * Note that, the global temp view database is also valid here, this will return the global temp * view matching the given name. * - * If the relation is a view, the relation will be wrapped in a [[SubqueryAlias]] which will - * track the name of the view. + * If the relation is a view, we add a [[View]] operator over the relation, and wrap the logical + * plan in a [[SubqueryAlias]] which will track the name of the view. + * + * @param name The name of the table/view that we lookup. + * @param alias The alias name of the table/view that we lookup. + * @param currentDatabase The database name we should use to lookup the table/view, if the + *database part of [[TableIdentifier]] is not defined. */ - def lookupRelation(name: TableIdentifier, alias: Option[String] = None): LogicalPlan = { + def lookupRelation( + name: TableIdentifier, + alias: Option[String] = None, + currentDatabase: Option[String] = None): LogicalPlan = { synchronized { - val db = formatDatabaseName(name.database.getOrElse(currentDb)) + val db = formatDatabaseName(name.database.getOrElse(currentDatabase.getOrElse(currentDb))) --- End diff -- I feel that currentDatabase and currentDb will confuse readers. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93551078

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {
+      case p @ SubqueryAlias(_, view: View, _) =>
+        val currentDatabase = view.currentDatabase
+        val newChild = view transform {
+          case v: View if !v.resolved =>
+            resolveView(v)
+          case u: UnresolvedRelation =>
+            resolveRelation(u, currentDatabase)

--- End diff --

oh, resolveRelation's second argument is the current db.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549589

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -396,6 +396,20 @@ case class InsertIntoTable(
 }
 
 /**
+ * A container for holding the current database of a view and a query plan.
+ * This operator will be removed at the begining of the optimize stage so we can see what is part
+ * of a view in a analyzed plan.
+ *
+ * @param child The logical plan of this view.
+ * @param currentDatabase The database name we use to resolve the logical plan.
+ */
+case class View(child: LogicalPlan, currentDatabase: Option[String]) extends LogicalPlan {

--- End diff --

Also, I feel `currentDatabase` may not be a good name. I am wondering if `defaultDatabase` is better.
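As a rough illustration of the wrapping the PR describes (a `View` node records the database used to resolve the view's body, and a `SubqueryAlias` records the view's name), here is a self-contained sketch with simplified stand-in types, not Catalyst's real classes:

```scala
// Toy plan types; the real View/SubqueryAlias extend Catalyst's LogicalPlan.
sealed trait Plan
case class Parsed(sql: String) extends Plan
case class View(child: Plan, currentDatabase: Option[String]) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

// Hypothetical helper showing the nesting order: alias outermost, then the
// View marker, then the parsed view text.
def wrapView(name: String, viewText: String, db: Option[String]): Plan =
  SubqueryAlias(name, View(Parsed(viewText), db))

// wrapView("v1", "SELECT * FROM t", Some("db1"))
//   == SubqueryAlias("v1", View(Parsed("SELECT * FROM t"), Some("db1")))
```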
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549309

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(

--- End diff --

If so, should we just inline it?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93549165

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {
+      case p @ SubqueryAlias(_, view: View, _) =>
+        val currentDatabase = view.currentDatabase
+        val newChild = view transform {
+          case v: View if !v.resolved =>
+            resolveView(v)
+          case u: UnresolvedRelation =>
+            resolveRelation(u, currentDatabase)

--- End diff --

Is this right? `u` can also have db name, right?
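The question about `u` carrying its own database name is settled by the lookup precedence in `SessionCatalog.lookupRelation`: a relation with an explicit database keeps it, and only a bare name inherits the enclosing view's default database. A toy model of the walk, with simplified stand-in types (not Catalyst's), makes this concrete:

```scala
// Toy model of resolving nested views with per-view default databases.
sealed trait Plan
case class Unresolved(table: String, db: Option[String]) extends Plan
case class Resolved(table: String, db: String) extends Plan
case class View(child: Plan, defaultDb: String) extends Plan

def resolve(plan: Plan, defaultDb: String): Plan = plan match {
  // Entering a view switches the default database to the one recorded
  // when that view was created.
  case View(child, viewDb)  => View(resolve(child, viewDb), viewDb)
  // An explicit database on the relation wins; a bare name inherits.
  case Unresolved(t, dbOpt) => Resolved(t, dbOpt.getOrElse(defaultDb))
  case other                => other
}

// A view created in db2 referencing a bare `t`, nested inside a db1 view:
// resolve(View(View(Unresolved("t", None), "db2"), "db1"), "default")
//   == View(View(Resolved("t", "db2"), "db2"), "db1")
// With an explicit database, Unresolved("s", Some("db9")) resolves to
// Resolved("s", "db9") regardless of the surrounding views.
```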
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r93548847

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(

--- End diff --

Is this function only used once?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92777277

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup
+    // the table in the database `currentDatabase`, else we follow the default way.
+    private def lookupTableFromCatalog(
+        u: UnresolvedRelation,
+        currentDatabase: Option[String] = None): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupRelation(u.tableIdentifier, u.alias, currentDatabase)
       } catch {
        case _: NoSuchTableException =>
          u.failAnalysis(s"Table or view not found: ${u.tableName}")
       }
     }

-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-      case u: UnresolvedRelation =>
-        val table = u.tableIdentifier
-        if (table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
-            (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))) {
-          // If the database part is specified, and we support running SQL directly on files, and
-          // it's not a temporary view, and the table does not exist, then let's just return the
-          // original UnresolvedRelation. It is possible we are matching a query like "select *
-          // from parquet.`/path/to/query`". The plan will get resolved later.
-          // Note that we are testing (!db_exists || !table_exists) because the catalog throws
-          // an exception from tableExists if the database does not exist.
-          u
-        } else {
-          lookupTableFromCatalog(u)
+    // If the database part is specified, and we support running SQL directly on files, and
+    // it's not a temporary view, and the table does not exist, then let's just return the
+    // original UnresolvedRelation. It is possible we are matching a query like "select *
+    // from parquet.`/path/to/query`". The plan will get resolved later.
+    // Note that we are testing (!db_exists || !table_exists) because the catalog throws
+    // an exception from tableExists if the database does not exist.
+    private def isRunningDirectlyOnFiles(table: TableIdentifier): Boolean = {
+      table.database.isDefined && conf.runSQLonFile && !catalog.isTemporaryTable(table) &&
+        (!catalog.databaseExists(table.database.get) || !catalog.tableExists(table))
+    }
+
+    // Change the current database name if the plan is a view, and transformDown with the new
+    // database name to resolve all UnresolvedRelation.
+    def resolveView(plan: LogicalPlan): LogicalPlan = plan match {

--- End diff --

I do not like this function name. Basically, this function is used to resolve both views and tables.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92776636

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        resolveView(lookupTableFromCatalog(u, currentDatabase))
+    }
+
+    // Lookup the table with the given name from catalog. If `currentDatabase` is set, we lookup

--- End diff --

Nit: lookup is a noun.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r92776481

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,62 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we lookup the table from catalog
+    // and change the current database name if it is a view.
+    def resolveRelation(
+        plan: LogicalPlan,
+        currentDatabase: Option[String] = None): LogicalPlan = plan match {

--- End diff --

The interface of this function is not good. We only ever pass an `UnresolvedRelation`, so the `LogicalPlan` parameter type is not used. `currentDatabase` is only used for views, and the name of this variable is not good. Is it possible to move it into `UnresolvedRelation`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91689436

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -543,4 +545,34 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("create a nested view") {

--- End diff --

It's not easy to test the `resolveRelations` rule directly in `AnalysisSuite`, so we test resolving a nested view here instead.
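The scenario such a test needs to cover can be illustrated with plain SQL. This is an illustrative sequence run against a Spark session, not the actual test body from `SQLViewSuite`:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo session; the real suite uses TestHiveSingleton instead.
val spark = SparkSession.builder()
  .appName("nested-view-demo")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
import spark.sql

sql("CREATE DATABASE IF NOT EXISTS db1")
sql("CREATE TABLE IF NOT EXISTS db1.t (id INT)")
sql("USE db1")
sql("CREATE VIEW v1 AS SELECT * FROM t")       // bare `t` must resolve in db1
sql("USE default")
sql("CREATE VIEW v2 AS SELECT * FROM db1.v1")  // nested view across databases
// Resolving v2 requires resolving v1's body with db1, not the session's
// current database, as the default database.
sql("SELECT * FROM v2").show()
```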
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91689112

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,10 +121,13 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
       alias.map(a => SubqueryAlias(a, qualifiedTable, None)).getOrElse(qualifiedTable)
     } else if (table.tableType == CatalogTableType.VIEW) {
       val viewText = table.viewText.getOrElse(sys.error("Invalid view without text."))
-      SubqueryAlias(
-        alias.getOrElse(table.identifier.table),
-        sparkSession.sessionState.sqlParser.parsePlan(viewText),
-        Option(table.identifier))
+      // The relation is a view, so we wrap the relation by:
+      // 1. Add a [[View]] operator over the relation to keep track of the database name;
+      // 2. Wrap the logical plan in a [[SubqueryAlias]] which tracks the name of the view.
+      val child = View(

--- End diff --

This piece of code is slightly different from that in `SessionCatalog.lookupRelation`, so we are not merging them into one common function.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91688872

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -219,6 +219,9 @@ class SQLBuilder private (
     case OneRowRelation =>
       ""
+    case p: View =>

--- End diff --

We will remove the entire rule after we have implemented the late binding method.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r91688428

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2224,6 +2253,17 @@ object EliminateSubqueryAliases extends Rule[LogicalPlan] {
 }
 
 /**
+ * Removes [[View]] operators from the plan. The operator is only used to provide the database
+ * name we should use in resolving a view, and it's respected till the end of analysis stage
+ * just because we want to see which part of a analyzed logical plan is generated from a view.
+ */
+object EliminateView extends Rule[LogicalPlan] {

--- End diff --

This rule is only used by `Optimizer`; it's placed here because it is in fact an analysis rule.
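What such a rule does can be sketched on toy types: strip every `View` wrapper, keeping its child, once the analyzed plan no longer needs the marker. This is a simplified stand-in for the Catalyst rule, not its actual implementation:

```scala
// Toy plan types; the real rule pattern-matches Catalyst's LogicalPlan tree.
sealed trait Plan
case class Relation(name: String) extends Plan
case class View(child: Plan, currentDatabase: Option[String]) extends Plan

// Recursively remove View markers, analogous to what EliminateView is
// described to do at the beginning of the optimize stage.
def eliminateView(plan: Plan): Plan = plan match {
  case View(child, _) => eliminateView(child)
  case other          => other
}

// eliminateView(View(View(Relation("t"), Some("db2")), Some("db1")))
//   == Relation("t")
```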
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/16233

[SPARK-18801][SQL] Add `View` operator to help resolve a view

## What changes were proposed in this pull request?

We should add a new operator called `View` to keep track of the database name used on resolving a view. The analysis rule `ResolveRelations` should also be updated. After that change, we should be able to resolve a nested view. This PR mainly changes:

1. Add a new operator `View` that keeps track of the database name used to resolve a view;
2. Change the analysis rule `ResolveRelations` to support resolving a nested view;
3. Add a `currentDatabase` variable to `CatalogTable` to keep track of the database name used to resolve a view; if the `CatalogTable` is not a view, the variable should be `None`.

## How was this patch tested?

1. Add new tests in `SessionCatalogSuite` to test the function `lookupRelation`;
2. Add a new test case in `SQLViewSuite` to test resolving a nested view.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark resolve-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16233.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16233

commit b67e7ce9f665a1ba3e57093ceb5ceeec2d9bb0d2
Author: jiangxingbo
Date: 2016-12-09T09:50:00Z

    add View operator.