[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-10-18 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/14962


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687128
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     checkAnswer(df2, df)
   }
 
+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

Let's add a comment explaining what this test is for, so that we don't 
accidentally delete it in the future.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687123
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }
 
+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

Let's add a comment explaining what this test is for, so that we don't 
accidentally delete it in the future.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687075
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
         data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

Let's add a comment explaining what this test is for, so that we don't 
accidentally delete it in the future.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686868
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     checkAnswer(df2, df)
   }
 
+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686835
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }
 
+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686776
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
         data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78683471
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -439,7 +439,7 @@ class Analyzer(
   object ResolveRelations extends Rule[LogicalPlan] {
     private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
       try {
-        catalog.lookupRelation(u.tableIdentifier, u.alias)
+        catalog.lookupTempViewOrRelation(u.tableIdentifier, u.alias)
--- End diff --

This is also used for views, right? Should we just keep the old name?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77859070
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (!catalog.tableExists(tableName)) {
-      if (!ifExists) {
-        val objectName = if (isView) "View" else "Table"
-        throw new AnalysisException(s"$objectName to drop '$tableName' does not exist")
-      }
-    } else {
-      // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view
-      // issue an exception.
-      catalog.getTableMetadataOption(tableName).map(_.tableType match {
-        case CatalogTableType.VIEW if !isView =>
-          throw new AnalysisException(
-            "Cannot drop a view with DROP TABLE. Please use DROP VIEW instead")
-        case o if o != CatalogTableType.VIEW && isView =>
-          throw new AnalysisException(
-            s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead")
-        case _ =>
-      })
-      try {
-        sparkSession.sharedState.cacheManager.uncacheQuery(
-          sparkSession.table(tableName.quotedString))
-      } catch {
-        case NonFatal(e) => log.warn(e.toString, e)
+
+    // If the table name contains database part, we should drop a metastore table directly,
+    // otherwise, try to drop a temp view first, if that not exist, drop metastore table.
+    val dropMetastoreTable =
+      tableName.database.isDefined || !catalog.dropTempView(tableName.table)
--- End diff --

I see. Thanks!





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77858872
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       e = intercept[AnalysisException] {
         sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
       }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --

```Scala
if (!catalog.tableExists(table)) {
  throw new AnalysisException(s"Target table in LOAD DATA does not exist: $table")
}
val targetTable = catalog.getTableMetadataOption(table).getOrElse {
  throw new AnalysisException(s"Target table in LOAD DATA cannot be temporary: $table")
}
```

Currently, the message in the `getOrElse` is unreachable. Maybe we can 
simplify it to

```Scala
if (!catalog.tableExists(table)) {
  throw new AnalysisException(s"Target table in LOAD DATA does not exist: $table")
}
val targetTable = catalog.getTableMetadata(table)
```

or just

```Scala
val targetTable = catalog.getTableMetadata(table)
```
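
To make the reachability argument concrete, here is a minimal, self-contained sketch. It is a toy model, not Spark's `SessionCatalog`: the class `ToyLookup`, its `existsSeesTempViews` flag, and the string results are assumptions made purely for illustration. The flag switches between the behaviour described elsewhere in this thread for the code before the PR (`tableExists` also sees temp views) and the behaviour after it (`tableExists` only sees the metastore).

```scala
// Toy model only: names and structure are illustrative assumptions, not Spark APIs.
final class ToyLookup(
    tempViews: Set[String],
    metastore: Set[String],
    existsSeesTempViews: Boolean) {

  // With the flag set, temp views count as "existing"; otherwise only metastore tables do.
  def tableExists(name: String): Boolean =
    metastore.contains(name) || (existsSeesTempViews && tempViews.contains(name))

  // Only ever consults the metastore, in both modes.
  def getTableMetadataOption(name: String): Option[String] =
    if (metastore.contains(name)) Some(s"metadata($name)") else None

  // Same shape as the quoted check, but returning the outcome instead of throwing.
  def check(name: String): Either[String, String] =
    if (!tableExists(name)) Left("does not exist")
    else getTableMetadataOption(name).toRight("cannot be temporary")
}
```

In this model, once `tableExists` and `getTableMetadataOption` look at the same set of names, nothing can pass the first check and still miss in the second, so the "cannot be temporary" branch never fires.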





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77810897
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

see https://github.com/apache/spark/pull/14962#discussion_r77808532





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77809642
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

see https://github.com/apache/spark/pull/14962#discussion_r77808532

I'd like to avoid breaking existing behaviours here.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77809524
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       e = intercept[AnalysisException] {
         sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
       }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --

How? The `tableExists` call is really just a sanity check here; 
`getTableMetadataOption(..).getOrElse(...)` will still fail if the given table 
is not in the metastore.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77808727
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import javax.annotation.concurrent.GuardedBy
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.StringUtils
+
+
+/**
+ * A thread-safe manager for a list of temp views, providing atomic operations to manage temp views.
--- End diff --

Yeah, good idea.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77808532
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (!catalog.tableExists(tableName)) {
-      if (!ifExists) {
-        val objectName = if (isView) "View" else "Table"
-        throw new AnalysisException(s"$objectName to drop '$tableName' does not exist")
-      }
-    } else {
-      // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view
-      // issue an exception.
-      catalog.getTableMetadataOption(tableName).map(_.tableType match {
-        case CatalogTableType.VIEW if !isView =>
-          throw new AnalysisException(
-            "Cannot drop a view with DROP TABLE. Please use DROP VIEW instead")
-        case o if o != CatalogTableType.VIEW && isView =>
-          throw new AnalysisException(
-            s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead")
-        case _ =>
-      })
-      try {
-        sparkSession.sharedState.cacheManager.uncacheQuery(
-          sparkSession.table(tableName.quotedString))
-      } catch {
-        case NonFatal(e) => log.warn(e.toString, e)
+
+    // If the table name contains database part, we should drop a metastore table directly,
+    // otherwise, try to drop a temp view first, if that not exist, drop metastore table.
+    val dropMetastoreTable =
+      tableName.database.isDefined || !catalog.dropTempView(tableName.table)
--- End diff --

I actually noticed this and fixed it earlier, but the fix broke a lot of 
tests. I'd like to keep the existing behaviour for now; we can discuss how to 
fix it in follow-ups.
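
For readers following along, the resolution rule in the added lines of the diff can be sketched in isolation. This is a toy model under stated assumptions: `Ident` and `ToyCatalog` are hypothetical stand-ins (not Spark's `TableIdentifier` or `SessionCatalog`), and `drop` returns a label only so the outcome is observable.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table identifier: a name plus an optional database.
final case class Ident(table: String, database: Option[String] = None)

// Toy catalog: temp views and metastore tables are tracked separately.
final class ToyCatalog {
  private val tempViews = mutable.Set.empty[String]
  private val metastore = mutable.Set.empty[(String, String)] // (database, table)
  val currentDb = "default"

  def createTempView(name: String): Unit = { tempViews += name; () }
  def createTable(db: String, name: String): Unit = { metastore += ((db, name)); () }

  // Returns true only if a temp view with this name existed and was removed.
  def dropTempView(name: String): Boolean = tempViews.remove(name)

  def hasTempView(name: String): Boolean = tempViews.contains(name)
  def hasTable(db: String, name: String): Boolean = metastore.contains((db, name))

  // Same shape as the rule in the diff: a database-qualified name goes straight to
  // the metastore; an unqualified name tries the temp view first (short-circuit ||).
  def drop(id: Ident): String = {
    val dropMetastoreTable = id.database.isDefined || !dropTempView(id.table)
    if (dropMetastoreTable) {
      metastore.remove((id.database.getOrElse(currentDb), id.table))
      "metastore"
    } else {
      "tempView"
    }
  }
}
```

Because `||` short-circuits, a qualified name never touches the temp views, while an unqualified name removes a same-name temp view and leaves the metastore table in place.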





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77765007
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -134,11 +134,26 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }
 
   /**
-   * Returns a list of columns for the given table in the current database.
+   * Returns a list of columns for the temp view matching the given name, or for the given table in
+   * the current database.
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
--- End diff --

We have a [test case](https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala#L239-L242) 
for `listColumns` on temporary views. However, it does not check the results. 
Maybe we can fix that in this PR.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77761938
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import javax.annotation.concurrent.GuardedBy
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.StringUtils
+
+
+/**
+ * A thread-safe manager for a list of temp views, providing atomic operations to manage temp views.
--- End diff --

In the description of `TempViewManager`, could we mention that temp view 
names are always case sensitive? The caller is responsible for handling 
case-related issues.
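
As an illustration of that division of responsibility, here is a minimal sketch. All names are hypothetical (`ToyTempViewManager`, `ToySession`, the `caseSensitive` flag), and plans are plain strings rather than `LogicalPlan`s; it only shows a manager that stores names exactly as given, with the caller normalising case before calling it.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical manager: stores names exactly as given, so lookups are case sensitive.
final class ToyTempViewManager {
  private val views = mutable.HashMap.empty[String, String]

  def create(name: String, plan: String): Unit = synchronized { views.update(name, plan) }
  def get(name: String): Option[String] = synchronized { views.get(name) }
}

// Hypothetical caller: normalises names before touching the manager when the
// session is configured as case insensitive (the flag is an assumption here).
final class ToySession(caseSensitive: Boolean) {
  val manager = new ToyTempViewManager

  private def format(name: String): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  def createTempView(name: String, plan: String): Unit = manager.create(format(name), plan)
  def lookup(name: String): Option[String] = manager.get(format(name))
}
```

With this split, a case-insensitive session finds `Same_Name` under any casing, while the manager itself only knows the normalised key.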





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756736
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
     DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-    // If this is a temp view, just rename the view.
-    // Otherwise, if this is a real table, we also need to uncache and invalidate the table.
-    val isTemporary = catalog.isTemporaryTable(oldName)
-    if (isTemporary) {
-      catalog.renameTable(oldName, newName)
-    } else {
+
+    // If the old table name contains database part, we should rename a metastore table directly,
+    // otherwise, try to rename a temp view first, if that not exists, rename a metastore table.
+    val renameMetastoreTable =
+      oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName)
--- End diff --

Here, we also need to check whether it is a VIEW before trying to drop a temp view.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756537
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       e = intercept[AnalysisException] {
         sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
       }.getMessage
-      assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`"))
+      assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`"))
--- End diff --


https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L218-L223

Before this PR, `tableExists` checked temp tables, but 
`getTableMetadataOption` did not. Thus, instead of changing the test case, we 
need to change the implementation of `LoadDataCommand`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756261
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -246,33 +246,23 @@ class SessionCatalog(
   }
 
   /**
-   * Retrieve the metadata of an existing metastore table.
-   * If no database is specified, assume the table is in the current database.
-   * If the specified table is not found in the database then a [[NoSuchTableException]] is thrown.
+   * Retrieve the metadata of an existing metastore table/view.
+   * If no database is specified, assume the table/view is in the current database.
+   * If the specified table/view is not found in the database then a [[NoSuchTableException]] is
+   * thrown.
    */
   def getTableMetadata(name: TableIdentifier): CatalogTable = {
     val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(name.table)
-    val tid = TableIdentifier(table)
-    if (isTemporaryTable(name)) {
-      CatalogTable(
-        identifier = tid,
-        tableType = CatalogTableType.VIEW,
-        storage = CatalogStorageFormat.empty,
-        schema = tempTables(table).output.toStructType,
-        properties = Map(),
-        viewText = None)
-    } else {
-      requireDbExists(db)
-      requireTableExists(TableIdentifier(table, Some(db)))
-      externalCatalog.getTable(db, table)
-    }
+    requireDbExists(db)
+    requireTableExists(TableIdentifier(table, Some(db)))
+    externalCatalog.getTable(db, table)
   }
 
   /**
-   * Retrieve the metadata of an existing metastore table.
+   * Retrieve the metadata of an existing metastore table/view.
    * If no database is specified, assume the table is in the current database.
-   * If the specified table is not found in the database then return None if it doesn't exist.
+   * If the specified table/view is not found in the database then return None if it doesn't exist.
    */
   def getTableMetadataOption(name: TableIdentifier): Option[CatalogTable] = {
--- End diff --

`getTableMetadataOption` does not check the temp view, but 
`getTableMetadata` does check it... We might have more bugs...





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77753115
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val catalog = sparkSession.sessionState.catalog
-if (!catalog.tableExists(tableName)) {
-  if (!ifExists) {
-val objectName = if (isView) "View" else "Table"
-throw new AnalysisException(s"$objectName to drop '$tableName' 
does not exist")
-  }
-} else {
-  // If the command DROP VIEW is to drop a table or DROP TABLE is to 
drop a view
-  // issue an exception.
-  catalog.getTableMetadataOption(tableName).map(_.tableType match {
-case CatalogTableType.VIEW if !isView =>
-  throw new AnalysisException(
-"Cannot drop a view with DROP TABLE. Please use DROP VIEW 
instead")
-case o if o != CatalogTableType.VIEW && isView =>
-  throw new AnalysisException(
-s"Cannot drop a table with DROP VIEW. Please use DROP TABLE 
instead")
-case _ =>
-  })
-  try {
-sparkSession.sharedState.cacheManager.uncacheQuery(
-  sparkSession.table(tableName.quotedString))
-  } catch {
-case NonFatal(e) => log.warn(e.toString, e)
+
+// If the table name contains database part, we should drop a 
metastore table directly,
+// otherwise, try to drop a temp view first, if that not exist, drop 
metastore table.
+val dropMetastoreTable =
+  tableName.database.isDefined || 
!catalog.dropTempView(tableName.table)
--- End diff --

`DROP TABLE` is unable to drop a temp view, right? 
```scala
spark.range(10).createTempView("tempView")
sql("DESC tempView").show()
sql("DROP TABLE tempView")
sql("DESC tempView").show()
```





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77745578
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Since the goal of this PR is to add some view-related APIs, I think 
refactoring to use `TempViewManager` is not the major goal?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77596625
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Why not just name it `tempViewManager`?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77593859
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

I think it's easier to implement and reason about the thread-safety semantics 
for temp views if we put temp view management in one place.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77583701
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -325,180 +355,130 @@ class SessionCatalog(
 new Path(new Path(dbLocation), 
formatTableName(tableIdent.table)).toString
   }
 
-  // -
-  // | Methods that interact with temporary and metastore tables |
-  // -
+  // --
+  // | Methods that interact with temporary views |
+  // --
 
   /**
-   * Create a temporary table.
+   * Create a temporary view.
*/
   def createTempView(
   name: String,
-  tableDefinition: LogicalPlan,
-  overrideIfExists: Boolean): Unit = synchronized {
--- End diff --

Yeah, now we let `TempViewManager` implement the thread-safety semantics.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77582577
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -325,180 +355,130 @@ class SessionCatalog(
 new Path(new Path(dbLocation), 
formatTableName(tableIdent.table)).toString
   }
 
-  // -
-  // | Methods that interact with temporary and metastore tables |
-  // -
+  // --
+  // | Methods that interact with temporary views |
+  // --
 
   /**
-   * Create a temporary table.
+   * Create a temporary view.
*/
   def createTempView(
   name: String,
-  tableDefinition: LogicalPlan,
-  overrideIfExists: Boolean): Unit = synchronized {
--- End diff --

If we change it back to `HashMap`, we need to add `synchronized` back. Is 
my understanding right?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77560988
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

What is the reason? Thanks!





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-05 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77560363
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Can we avoid adding TempViewManager?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/14962

[SPARK-17402][SQL] separate the management of temp views and metastore 
tables/views in SessionCatalog

## What changes were proposed in this pull request?

In `SessionCatalog`, we have several operations (`getTableMetadata`, 
`tableExists`, `renameTable`, `dropTable`) that handle both temp views and 
metastore tables/views. They save a few lines of code for commands that need 
to deal with both, but they also introduce bugs in other commands, because the 
operation names say nothing about temp views and are very likely to be 
misused:

* `DataFrameWriter.saveAsTable`/`CREATE TABLE USING` will fail if a 
same-name temp view exists
* `Catalog.dropTempView` may mistakenly drop a metastore table
* `ALTER TABLE RECOVER PARTITIONS`/`LOAD DATA`/`TRUNCATE TABLE`/`SHOW 
CREATE TABLE` should report "table not found" instead of "temp view is not 
supported" if a same-name temp view exists, because these commands don't need 
to deal with temp views.
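
To make the name-resolution rule concrete, here is a minimal in-memory sketch. It is not Spark code: `ToyCatalog` and `TableName` are hypothetical names, and the model only mirrors the rule stated in the `DropTableCommand` diff comment (a database-qualified name goes straight to the metastore; an unqualified name tries the temp view first and falls back to the metastore).

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table identifier; not Spark's TableIdentifier.
final case class TableName(table: String, database: Option[String] = None)

// Hypothetical toy catalog that keeps temp views and metastore tables apart.
class ToyCatalog(currentDb: String = "default") {
  private val tempViews = mutable.Set.empty[String]
  private val metastore = mutable.Set.empty[(String, String)] // (database, table)

  def createTempView(name: String): Unit = tempViews += name

  def createTable(name: TableName): Unit =
    metastore += ((name.database.getOrElse(currentDb), name.table))

  // Only ever touches temp views; returns true if a view was removed.
  def dropTempView(name: String): Boolean = tempViews.remove(name)

  // Qualified names skip temp views; unqualified names try the temp view first.
  def dropTable(name: TableName): Unit = {
    val dropMetastoreTable = name.database.isDefined || !dropTempView(name.table)
    if (dropMetastoreTable) {
      metastore.remove((name.database.getOrElse(currentDb), name.table))
    }
  }

  def tempViewExists(name: String): Boolean = tempViews.contains(name)

  def tableExists(name: TableName): Boolean =
    metastore.contains((name.database.getOrElse(currentDb), name.table))
}
```

In this model, dropping the unqualified name `same_name` removes only the temp view, and `dropTempView` can never remove a metastore table because it has no access path to one.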

Some commands mistakenly support temp views without mentioning it in the 
documentation: `ShowColumnsCommand`, `Catalog.listColumns`.

Mixing the handling of temp views and metastore tables/views also makes it 
harder to implement thread-safe operations. For example, `AlterViewAsCommand` 
first checks `isTemporaryTable` and then calls `createTempView`, which is not 
atomic. Most temp view related operations in `SessionCatalog` hold a lock on 
the `SessionCatalog` object, which is unnecessary.
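
As a rough illustration of the atomicity point, the sketch below puts the existence check and the write under one lock owned by a small manager object. It is an assumption-laden sketch, not the class added in this PR: `SimpleTempViewManager`, `Plan`, and all method names are hypothetical, and `Plan` merely stands in for a logical plan.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a logical plan.
final case class Plan(description: String)

// Hypothetical manager that owns all temp view state and guards it with its own lock,
// so callers never need to lock the enclosing catalog.
class SimpleTempViewManager {
  private val views = new mutable.HashMap[String, Plan]

  // Check and insert happen under one lock, so two racing creators cannot both succeed.
  def create(name: String, plan: Plan, overrideIfExists: Boolean): Boolean = synchronized {
    if (!overrideIfExists && views.contains(name)) {
      false
    } else {
      views.put(name, plan)
      true
    }
  }

  // Replaces the definition only if the view already exists; returns whether it did.
  def update(name: String, plan: Plan): Boolean = synchronized {
    if (views.contains(name)) {
      views.put(name, plan)
      true
    } else {
      false
    }
  }

  def get(name: String): Option[Plan] = synchronized { views.get(name) }

  // Returns true if a view was actually removed.
  def remove(name: String): Boolean = synchronized { views.remove(name).isDefined }
}
```

With something like `update`, a command that alters a view can test for existence and replace the definition in a single call, instead of a separate "is it temporary?" check followed by a create.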


This PR separates the management of temp views and metastore tables/views 
in `SessionCatalog`. Any command that needs to deal with temp views should 
explicitly call the temp view related operations in `SessionCatalog`. This 
fixes the existing bugs and prevents future bugs like these.

## How was this patch tested?

Existing tests and 3 new tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark temp-view

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14962


commit ebed732367a9949fe3f3d3df53a6798a63670064
Author: Wenchen Fan 
Date:   2016-09-01T08:29:50Z

separate the management of temp views and metastore tables/views in 
SessionCatalog



