[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/14962
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78687128

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     checkAnswer(df2, df)
   }

+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78687123

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }

+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78687075

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78686868

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     checkAnswer(df2, df)
   }

+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

This is a regression test.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78686835

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }

+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

This is a regression test.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78686776

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

This is a regression test.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r78683471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -439,7 +439,7 @@ class Analyzer(
   object ResolveRelations extends Rule[LogicalPlan] {
     private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupTempViewOrRelation(u.tableIdentifier, u.alias)
--- End diff --

This is also for views, right? Should we just keep the old name?
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77859070

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (!catalog.tableExists(tableName)) {
-      if (!ifExists) {
-        val objectName = if (isView) "View" else "Table"
-        throw new AnalysisException(s"$objectName to drop '$tableName' does not exist")
-      }
-    } else {
-      // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view
-      // issue an exception.
-      catalog.getTableMetadataOption(tableName).map(_.tableType match {
-        case CatalogTableType.VIEW if !isView =>
-          throw new AnalysisException(
-            "Cannot drop a view with DROP TABLE. Please use DROP VIEW instead")
-        case o if o != CatalogTableType.VIEW && isView =>
-          throw new AnalysisException(
-            s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead")
-        case _ =>
-      })
-      try {
-        sparkSession.sharedState.cacheManager.uncacheQuery(
-          sparkSession.table(tableName.quotedString))
-      } catch {
-        case NonFatal(e) => log.warn(e.toString, e)
+
+    // If the table name contains database part, we should drop a metastore table directly,
+    // otherwise, try to drop a temp view first, if that not exist, drop metastore table.
+    val dropMetastoreTable =
+      tableName.database.isDefined || !catalog.dropTempView(tableName.table)
--- End diff --

I see. Thanks!
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77858872

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       e = intercept[AnalysisException] {
         sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
       }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --

```Scala
if (!catalog.tableExists(table)) {
  throw new AnalysisException(s"Target table in LOAD DATA does not exist: $table")
}
val targetTable = catalog.getTableMetadataOption(table).getOrElse {
  throw new AnalysisException(s"Target table in LOAD DATA cannot be temporary: $table")
}
```

Currently, the message in the `getOrElse` becomes unreachable. Maybe we can simplify it to:

```Scala
if (!catalog.tableExists(table)) {
  throw new AnalysisException(s"Target table in LOAD DATA does not exist: $table")
}
val targetTable = catalog.getTableMetadata(table)
```

Or:

```
val targetTable = catalog.getTableMetadata(table)
```
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77810897

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

see https://github.com/apache/spark/pull/14962#discussion_r77808532
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77809642

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

see https://github.com/apache/spark/pull/14962#discussion_r77808532

I'd like to avoid breaking existing behaviours here.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77809524

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
      e = intercept[AnalysisException] {
        sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
      }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --

how? Actually the `tableExists` is kind of a sanity check here; `getTableMetadataOption(..).getOrElse(...)` will fail if the given table is not in the metastore.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77808727

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import javax.annotation.concurrent.GuardedBy
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.StringUtils
+
+
+/**
+ * A thread-safe manager for a list of temp views, providing atomic operations to manage temp views.
--- End diff --

yea good idea
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77808532

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (!catalog.tableExists(tableName)) {
-      if (!ifExists) {
-        val objectName = if (isView) "View" else "Table"
-        throw new AnalysisException(s"$objectName to drop '$tableName' does not exist")
-      }
-    } else {
-      // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view
-      // issue an exception.
-      catalog.getTableMetadataOption(tableName).map(_.tableType match {
-        case CatalogTableType.VIEW if !isView =>
-          throw new AnalysisException(
-            "Cannot drop a view with DROP TABLE. Please use DROP VIEW instead")
-        case o if o != CatalogTableType.VIEW && isView =>
-          throw new AnalysisException(
-            s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead")
-        case _ =>
-      })
-      try {
-        sparkSession.sharedState.cacheManager.uncacheQuery(
-          sparkSession.table(tableName.quotedString))
-      } catch {
-        case NonFatal(e) => log.warn(e.toString, e)
+
+    // If the table name contains database part, we should drop a metastore table directly,
+    // otherwise, try to drop a temp view first, if that not exist, drop metastore table.
+    val dropMetastoreTable =
+      tableName.database.isDefined || !catalog.dropTempView(tableName.table)
--- End diff --

Actually I noticed this and fixed it before, but it breaks a lot of tests. I'd like to keep this behaviour as it was; we can discuss how to fix it in follow-ups.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77765007

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -134,11 +134,26 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }

   /**
-   * Returns a list of columns for the given table in the current database.
+   * Returns a list of columns for the temp view matching the given name, or for the given table in
+   * the current database.
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
--- End diff --

We have a [test case](https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala#L239-L242) for `listColumns` on temporary views. However, it does not check results. Maybe we can correct it in this PR.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77761938

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import javax.annotation.concurrent.GuardedBy
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.StringUtils
+
+
+/**
+ * A thread-safe manager for a list of temp views, providing atomic operations to manage temp views.
--- End diff --

In the description of `TempViewManager`, could we mention that the name of a temp view is always case sensitive? The caller is responsible for handling case-related issues.
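A possible way to express that contract on the caller side, as a sketch: it assumes the existing `formatTableName` helper in `SessionCatalog` and a hypothetical `tempViews.create` method on the manager, so the manager itself only ever sees already-normalized names and can compare them verbatim.

```Scala
// Hypothetical SessionCatalog wrapper; illustrative only, not the PR's actual code.
def createTempView(name: String, viewDefinition: LogicalPlan, overrideIfExists: Boolean): Unit = {
  // formatTableName lower-cases the name unless case-sensitive analysis is enabled,
  // so TempViewManager can stay case sensitive and never consult the SQL conf.
  val viewName = formatTableName(name)
  tempViews.create(viewName, viewDefinition, overrideIfExists)
}
```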
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756736

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

Here, we also need to check if it is VIEW before trying to drop a temp view.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756537

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
      e = intercept[AnalysisException] {
        sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
      }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --

https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L218-L223

Before this PR, `tableExists` checks the temp table, but `getTableMetadataOption` does not check it. Thus, instead of changing the test case, we need to change the implementation of `LoadDataCommand`.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756261

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -246,33 +246,23 @@ class SessionCatalog(
   }

   /**
-   * Retrieve the metadata of an existing metastore table.
-   * If no database is specified, assume the table is in the current database.
-   * If the specified table is not found in the database then a [[NoSuchTableException]] is thrown.
+   * Retrieve the metadata of an existing metastore table/view.
+   * If no database is specified, assume the table/view is in the current database.
+   * If the specified table/view is not found in the database then a [[NoSuchTableException]] is
+   * thrown.
    */
   def getTableMetadata(name: TableIdentifier): CatalogTable = {
     val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(name.table)
-    val tid = TableIdentifier(table)
-    if (isTemporaryTable(name)) {
-      CatalogTable(
-        identifier = tid,
-        tableType = CatalogTableType.VIEW,
-        storage = CatalogStorageFormat.empty,
-        schema = tempTables(table).output.toStructType,
-        properties = Map(),
-        viewText = None)
-    } else {
-      requireDbExists(db)
-      requireTableExists(TableIdentifier(table, Some(db)))
-      externalCatalog.getTable(db, table)
-    }
+    requireDbExists(db)
+    requireTableExists(TableIdentifier(table, Some(db)))
+    externalCatalog.getTable(db, table)
   }

   /**
-   * Retrieve the metadata of an existing metastore table.
+   * Retrieve the metadata of an existing metastore table/view.
    * If no database is specified, assume the table is in the current database.
-   * If the specified table is not found in the database then return None if it doesn't exist.
+   * If the specified table/view is not found in the database then return None if it doesn't exist.
    */
   def getTableMetadataOption(name: TableIdentifier): Option[CatalogTable] = {
--- End diff --

`getTableMetadataOption` does not check the temp view, but `getTableMetadata` does check it... We might have more bugs...
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77753115

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (!catalog.tableExists(tableName)) {
-      if (!ifExists) {
-        val objectName = if (isView) "View" else "Table"
-        throw new AnalysisException(s"$objectName to drop '$tableName' does not exist")
-      }
-    } else {
-      // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view
-      // issue an exception.
-      catalog.getTableMetadataOption(tableName).map(_.tableType match {
-        case CatalogTableType.VIEW if !isView =>
-          throw new AnalysisException(
-            "Cannot drop a view with DROP TABLE. Please use DROP VIEW instead")
-        case o if o != CatalogTableType.VIEW && isView =>
-          throw new AnalysisException(
-            s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead")
-        case _ =>
-      })
-      try {
-        sparkSession.sharedState.cacheManager.uncacheQuery(
-          sparkSession.table(tableName.quotedString))
-      } catch {
-        case NonFatal(e) => log.warn(e.toString, e)
+
+    // If the table name contains database part, we should drop a metastore table directly,
+    // otherwise, try to drop a temp view first, if that not exist, drop metastore table.
+    val dropMetastoreTable =
+      tableName.database.isDefined || !catalog.dropTempView(tableName.table)
--- End diff --

`Drop Table` is unable to drop a temp view, right?

```SQL
spark.range(10).createTempView("tempView")
sql("DESC tempView").show()
sql("DROP TABLE tempView")
sql("DESC tempView").show()
```
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77745578

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -72,9 +72,7 @@ class SessionCatalog(
     this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true))
   }

-  /** List of temporary tables, mapping from table name to their logical plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Since the goal of this PR is to add some view-related APIs, I think refactoring with TempViewManager is not the major goal?
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77596625

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -72,9 +72,7 @@ class SessionCatalog(
     this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true))
   }

-  /** List of temporary tables, mapping from table name to their logical plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Why not just name it `tempViewManager`?
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77593859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -72,9 +72,7 @@ class SessionCatalog(
     this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true))
   }

-  /** List of temporary tables, mapping from table name to their logical plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

I think it's easier to implement and reason about the thread-safe semantics for temp views if we put temp view management into one place.
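For illustration, a rough sketch of what such a single place could look like. Only the class name, imports, and doc comment of `TempViewManager` appear in the diffs in this thread, so the method names and bodies below are assumptions, not the PR's actual code:

```Scala
package org.apache.spark.sql.catalyst.catalog

import javax.annotation.concurrent.GuardedBy

import scala.collection.mutable

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical outline: every operation synchronizes on the manager itself, so each
// call is atomic and SessionCatalog no longer locks itself for temp view bookkeeping.
class TempViewManager {

  @GuardedBy("this")
  private val viewDefinitions = new mutable.HashMap[String, LogicalPlan]

  def create(name: String, viewDefinition: LogicalPlan, overrideIfExists: Boolean): Unit =
    synchronized {
      if (!overrideIfExists && viewDefinitions.contains(name)) {
        throw new AnalysisException(s"Temporary view '$name' already exists")
      }
      viewDefinitions.put(name, viewDefinition)
    }

  def get(name: String): Option[LogicalPlan] = synchronized {
    viewDefinitions.get(name)
  }

  // Returns true if a temp view with the given name existed and was removed.
  def remove(name: String): Boolean = synchronized {
    viewDefinitions.remove(name).isDefined
  }
}
```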
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77583701

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -325,180 +355,130 @@ class SessionCatalog(
     new Path(new Path(dbLocation), formatTableName(tableIdent.table)).toString
   }

-  // -------------------------------------------------------------
-  // | Methods that interact with temporary and metastore tables |
-  // -------------------------------------------------------------
+  // ----------------------------------------------
+  // | Methods that interact with temporary views |
+  // ----------------------------------------------

   /**
-   * Create a temporary table.
+   * Create a temporary view.
    */
   def createTempView(
       name: String,
-      tableDefinition: LogicalPlan,
-      overrideIfExists: Boolean): Unit = synchronized {
--- End diff --

yea, now we let `TempViewManager` implement the thread-safe semantics
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77582577

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -325,180 +355,130 @@ class SessionCatalog(
     new Path(new Path(dbLocation), formatTableName(tableIdent.table)).toString
   }

-  // -------------------------------------------------------------
-  // | Methods that interact with temporary and metastore tables |
-  // -------------------------------------------------------------
+  // ----------------------------------------------
+  // | Methods that interact with temporary views |
+  // ----------------------------------------------

   /**
-   * Create a temporary table.
+   * Create a temporary view.
    */
   def createTempView(
       name: String,
-      tableDefinition: LogicalPlan,
-      overrideIfExists: Boolean): Unit = synchronized {
--- End diff --

If we change it back to `HashMap`, we need to add `synchronized` back. Is my understanding right?
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77560988

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -72,9 +72,7 @@ class SessionCatalog(
     this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true))
   }

-  /** List of temporary tables, mapping from table name to their logical plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

What is the reason? Thanks!
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77560363

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -72,9 +72,7 @@ class SessionCatalog(
     this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true))
  }

-  /** List of temporary tables, mapping from table name to their logical plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Can we avoid adding TempViewManager?
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14962

[SPARK-17402][SQL] separate the management of temp views and metastore tables/views in SessionCatalog

## What changes were proposed in this pull request?

In `SessionCatalog`, we have several operations (`getTableMetadata`, `tableExists`, `renameTable`, `dropTable`) that handle both temp views and metastore tables/views. They can save some lines of code for commands that need to deal with both temp views and metastore tables/views, but they also introduce bugs for other commands, because the operation names say nothing about temp views and are very likely to be misused:

* `DataFrameWriter.saveAsTable`/`CREATE TABLE USING` will fail if a same-name temp view exists
* `Catalog.dropTempView` may mistakenly drop a metastore table
* `ALTER TABLE RECOVER PARTITIONS`/`LOAD DATA`/`TRUNCATE TABLE`/`SHOW CREATE TABLE` should report "table not found" instead of "temp view is not supported" if a same-name temp view exists, because these commands don't need to deal with temp views

In some commands we support temp views mistakenly without mentioning it in the documentation: `ShowColumnsCommand`, `Catalog.listColumns`.

Mixing the handling of temp views and metastore tables/views also makes it harder to implement thread-safe operations. e.g. `AlterViewAsCommand` checks `isTemporaryTable` first and then calls `createTempView`, which is not atomic. Most temp view related operations in `SessionCatalog` hold a lock on the `SessionCatalog` object, which is unnecessary.

This PR separates the management of temp views and metastore tables/views in `SessionCatalog`: any command that needs to deal with temp views should explicitly call the temp view related operations in `SessionCatalog`, to fix the existing bugs and prevent future bugs like this.

## How was this patch tested?

existing tests and 3 new tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark temp-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14962.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14962

commit ebed732367a9949fe3f3d3df53a6798a63670064
Author: Wenchen Fan
Date: 2016-09-01T08:29:50Z

    separate the management of temp views and metastore tables/views in SessionCatalog
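The calling pattern this PR pushes commands toward can be summarized with a minimal sketch, condensed from the `DropTableCommand` diff discussed earlier in this thread (the surrounding type checks and error handling are elided):

```Scala
// Sketch only: a qualified name (db.table) always targets the metastore; an unqualified
// name is first tried as a temp view. dropTempView is assumed to return true when a
// temp view with that name existed and was removed.
val dropMetastoreTable =
  tableName.database.isDefined || !catalog.dropTempView(tableName.table)
if (dropMetastoreTable) {
  // ...existing metastore path: verify table vs. view type, uncache, then drop...
}
```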