[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22263


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226534798
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Do you think the previously existing `lazy`-related test cases protect this 
new SQL syntax contribution from future regressions?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226514429
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Yes, I think this test case is mainly to assert that `storageLevel` works 
with `lazy`, not `lazy` itself. WDYT?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226380114
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -64,6 +67,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     maybeBlock.nonEmpty
   }
 
+  def isExpectStorageLevel(rddId: Int, level: DataReadMethod.DataReadMethod): Boolean = {
--- End diff --

`DataReadMethod.DataReadMethod` -> `DataReadMethod` and 
remove line 25, `import org.apache.spark.executor.DataReadMethod`.
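
A minimal sketch of the suggested change, assuming `DataReadMethod` is the `Enumeration` in `org.apache.spark.executor`: importing the inner type alias lets the parameter drop the object prefix, while the body stays as in the diff.

```scala
// Sketch only: import the Enumeration's inner type alias instead of the object.
import org.apache.spark.executor.DataReadMethod.DataReadMethod

def isExpectStorageLevel(rddId: Int, level: DataReadMethod): Boolean = {
  ??? // body unchanged from the diff quoted above
}
```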


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226379468
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Although the commit message says it is a revert, this is not a revert: we lost 
the previous test coverage. Is this intentional?
```
"Lazily cached in-memory table shouldn't be materialized eagerly"
```
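
For context, a sketch of that coverage as it appeared in the earlier revision (quoted in full later in this thread):

```scala
val rddId = rddIdOf("testData")
assert(
  !isMaterialized(rddId),
  "Lazily cached in-memory table shouldn't be materialized eagerly")

sql("SELECT COUNT(*) FROM testData").collect()
assert(
  isMaterialized(rddId),
  "Lazily cached in-memory table should have been materialized")
```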


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226215310
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

Can we do the following? Then those 4 functions will each have a one-line body 
(see the sketch below).
1. Let's remove `tableName: String,`. We can use the same table name.
2. With (1), we can also remove the common prefix, `CACHE TABLE testData OPTIONS(`.
3. With (1) and (2), we can change `sqlStr: String` -> `options: String`.
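
A sketch of the resulting shape (names illustrative; it assumes the suite's `isExpectStorageLevel` and `rddIdOf` helpers):

```scala
def assertStorageLevel(options: String, level: DataReadMethod.DataReadMethod): Unit = {
  sql(s"CACHE TABLE testData OPTIONS($options)")
  assertCached(spark.table("testData"))
  assert(isExpectStorageLevel(rddIdOf("testData"), level))
}

// Each test then shrinks to a one-line body:
test("SQL interface support storageLevel(DISK_ONLY)") {
  assertStorageLevel("'storageLevel' 'DISK_ONLY'", Disk)
}
```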


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-18 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226187423
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

Thanks @dongjoon-hyun. How about:
```scala
  def assertStorageLevel(
      sqlStr: String,
      tableName: String,
      level: DataReadMethod.DataReadMethod): Unit = {
    sql(sqlStr)
    assertCached(spark.table(tableName))
    val rddId = rddIdOf(tableName)
    assert(isExpectStorageLevel(rddId, level))
  }

  test("SQL interface support storageLevel(DISK_ONLY)") {
    assertStorageLevel(
      "CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')",
      "testData",
      Disk)
  }

  test("SQL interface cache SELECT ... support storageLevel(DISK_ONLY)") {
    withTempView("testCacheSelect") {
      assertStorageLevel(
        "CACHE TABLE testCacheSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT * FROM testData",
        "testCacheSelect",
        Disk)
    }
  }

  test("SQL interface support storageLevel(DISK_ONLY) with invalid options") {
    assertStorageLevel(
      "CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 'b' '2')",
      "testData",
      Disk)
  }

  test("SQL interface support storageLevel(MEMORY_ONLY)") {
    assertStorageLevel(
      "CACHE TABLE testData OPTIONS('storageLevel' 'MEMORY_ONLY')",
      "testData",
      Memory)
  }
```


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226184242
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

@wangyum . Originally, I thought you could make a utility function for those 
4 test cases. The utility function can accept a `sql` statement and a 
`StorageLevel`; then the 4 test cases will have simpler bodies.
> They are mostly the same except SQL statements and StorageLevels.


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-17 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226182333
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

@dongjoon-hyun Such generated test cases don't seem very clear. Can we use the 
original 4 test cases instead?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225762035
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

How about this:
```scala
Seq("LAZY", "").foreach { isLazy =>
  Seq(true, false).foreach { withInvalidOptions =>
    Seq(true, false).foreach { withCacheTempView =>
      Map("DISK_ONLY" -> Disk, "MEMORY_ONLY" -> Memory).foreach {
        case (storageLevel, dataReadMethod) =>
          val testName = s"SQL interface support option: storageLevel: $storageLevel, " +
            s"isLazy: ${isLazy.equals("LAZY")}, " +
            s"withInvalidOptions: $withInvalidOptions, withCacheTempView: $withCacheTempView"
          val cacheOption = if (withInvalidOptions) {
            s"OPTIONS('storageLevel' '$storageLevel', 'a' '1', 'b' '2')"
          } else {
            s"OPTIONS('storageLevel' '$storageLevel')"
          }
          test(testName) {
            if (withCacheTempView) {
              withTempView("testSelect") {
                sql(s"CACHE $isLazy TABLE testSelect $cacheOption SELECT * FROM testData")
                assertCached(spark.table("testSelect"))
                val rddId = rddIdOf("testSelect")
                if (isLazy.equals("LAZY")) {
                  sql("SELECT COUNT(*) FROM testSelect").collect()
                }
                assert(isExpectStorageLevel(rddId, dataReadMethod))
              }
            } else {
              sql(s"CACHE $isLazy TABLE testData $cacheOption")
              assertCached(spark.table("testData"))
              val rddId = rddIdOf("testData")
              if (isLazy.equals("LAZY")) {
                sql("SELECT COUNT(*) FROM testData").collect()
              }
              assert(isExpectStorageLevel(rddId, dataReadMethod))
            }
          }
      }
    }
  }
}
```


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225619883
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -328,34 +325,6 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
-  test("SQL interface support storageLevel(Invalid StorageLevel)") {
--- End diff --

Sorry, but why do you remove these unique ones ?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225608012
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

@wangyum . Could you refactor the following four positive test cases to reduce 
the duplication? They are mostly the same except for the SQL statements and 
`StorageLevel`s.
```scala
test("SQL interface support storageLevel(DISK_ONLY)")
test("SQL interface select from table support storageLevel(DISK_ONLY)")
test("SQL interface support storageLevel(DISK_ONLY) with invalid options")
test("SQL interface support storageLevel(MEMORY_ONLY)")
```


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225605943
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT 
* FROM testData")
+assertCached(spark.table("testSelect"))
+val rddId = rddIdOf("testSelect")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testSelect")
+  }
+
+  test("SQL interface support storageLevel(DISK_ONLY) with invalid 
options") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 
'b' '2')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testData")
--- End diff --

ditto.


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225605534
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT 
* FROM testData")
+assertCached(spark.table("testSelect"))
+val rddId = rddIdOf("testSelect")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testSelect")
--- End diff --

ditto.


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225605478
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+spark.catalog.uncacheTable("testData")
--- End diff --

Let's remove `uncacheTable` since we have `spark.catalog.clearCache()` in 
`afterEach()`.
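
For reference, a sketch of the hook being referred to, assuming the suite clears the cache in `afterEach` as described:

```scala
override def afterEach(): Unit = {
  try {
    // Drops all cached tables, making per-test uncacheTable calls redundant.
    spark.catalog.clearCache()
  } finally {
    super.afterEach()
  }
}
```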


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225005803
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,69 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+assert(!isExpectStorageLevel(rddId, Memory))
--- End diff --

Do we need line 305 when we have line 304?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r224977190
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,68 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+assert(!isExpectStorageLevel(rddId, Memory))
+spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') select 
* from testData")
+assertCached(spark.table("testSelect"))
+val rddId = rddIdOf("testSelect")
+assert(isExpectStorageLevel(rddId, Disk))
+assert(!isExpectStorageLevel(rddId, Memory))
+spark.catalog.uncacheTable("testSelect")
+  }
+
+  test("SQL interface support storageLevel(DISK_ONLY) with invalid 
options") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 
'b' '2')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(isExpectStorageLevel(rddId, Disk))
+assert(!isExpectStorageLevel(rddId, Memory))
+spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface support storageLevel(MEMORY_ONLY)") {
+sql("CACHE TABLE testData OPTIONS('storageLevel' 'MEMORY_ONLY')")
+assertCached(spark.table("testData"))
+val rddId = rddIdOf("testData")
+assert(!isExpectStorageLevel(rddId, Disk))
+assert(isExpectStorageLevel(rddId, Memory))
+  }
+
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+val message = intercept[IllegalArgumentException] {
+  sql("CACHE TABLE testData OPTIONS('storageLevel' 
'invalid_storage_level')")
+}.getMessage
+assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+assertCached(spark.table("testData"))
+
+val rddId = rddIdOf("testData")
+assert(
+  !isMaterialized(rddId),
+  "Lazily cached in-memory table shouldn't be materialized eagerly")
+
+sql("SELECT COUNT(*) FROM testData").collect()
+assert(
+  isMaterialized(rddId),
+  "Lazily cached in-memory table should have been materialized")
+
--- End diff --

We didn't check the storage level in this test case.
Add `isExpectStorageLevel` here?
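
i.e., roughly this at the end of the test (a sketch; `'disk_only'` should resolve case-insensitively to `DISK_ONLY`, which maps to `Disk` in this suite):

```scala
assert(isExpectStorageLevel(rddId, Disk))
```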


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-09-21 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r219490742
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -162,7 +162,8 @@ statement
         tableIdentifier partitionSpec? describeColName?             #describeTable
     | REFRESH TABLE tableIdentifier                                 #refreshTable
     | REFRESH (STRING | .*?)                                        #refreshResource
-    | CACHE LAZY? TABLE tableIdentifier (AS? query)?                #cacheTable
+    | CACHE LAZY? storageLevel=identifier? TABLE
+        tableIdentifier (AS? query)?                                #cacheTable
--- End diff --

Thanks @maropu, I changed it to the `OPTIONS` syntax.


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-08-29 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r213904343
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -162,7 +162,8 @@ statement
         tableIdentifier partitionSpec? describeColName?             #describeTable
     | REFRESH TABLE tableIdentifier                                 #refreshTable
     | REFRESH (STRING | .*?)                                        #refreshResource
-    | CACHE LAZY? TABLE tableIdentifier (AS? query)?                #cacheTable
+    | CACHE LAZY? storageLevel=identifier? TABLE
+        tableIdentifier (AS? query)?                                #cacheTable
--- End diff --

As another syntax option, how about this?
```
CACHE TABLE testTable OPTIONS ('storageLevel'='disk_only')
UNCACHE TABLE testTable OPTION ('cascade'='true', 'blocking'='true')
```


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-08-29 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r213708297
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala ---
@@ -17,16 +17,19 @@
 
 package org.apache.spark.sql.execution.command
 
+import java.util.Locale
+
 import org.apache.spark.sql.{Dataset, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
--- End diff --

nit: I have seen comments against changes like this one, in order to avoid 
issues when backporting the ticket.


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-08-29 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r213706428
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -272,7 +272,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
       throw new ParseException(s"It is not allowed to add database prefix `$database` to " +
         s"the table name in CACHE TABLE AS SELECT", ctx)
     }
-    CacheTableCommand(tableIdent, query, ctx.LAZY != null)
+    CacheTableCommand(tableIdent, query, ctx.LAZY != null, Option(ctx.storageLevel).map(source))
--- End diff --

why `source` instead of `getText`?


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-08-29 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r213568423
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -64,6 +66,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     maybeBlock.nonEmpty
   }
 
+  def isExpectStorageLevel(rddId: Int, level: DataReadMethod.DataReadMethod): Boolean = {
+    val maybeBlock = sparkContext.env.blockManager.get(RDDBlockId(rddId, 0))
--- End diff --

Shall we also check that the block is present? If nothing is cached, this 
would (wrongly, IMHO) return true. I see that since you are checking 
`isMaterialized` this doesn't happen in your test cases, but this function may 
be reused in the future, so I think it'd be better to have it behave 
consistently with its name. Do you agree?
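
A sketch of the stricter variant being suggested, assuming `BlockResult.readMethod` carries the read method and the lock taken by `blockManager.get` is released the way `isMaterialized` does:

```scala
def isExpectStorageLevel(rddId: Int, level: DataReadMethod.DataReadMethod): Boolean = {
  val maybeBlock = sparkContext.env.blockManager.get(RDDBlockId(rddId, 0))
  // Option.forall is vacuously true for an absent block, so additionally
  // require the block to be present.
  val hasExpectedLevel = maybeBlock.forall(_.readMethod == level)
  maybeBlock.foreach(_ => sparkContext.env.blockManager.releaseLock(RDDBlockId(rddId, 0)))
  maybeBlock.nonEmpty && hasExpectedLevel
}
```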


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-08-28 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22263

[SPARK-25269][SQL] SQL interface support specify StorageLevel when cache 
table

## What changes were proposed in this pull request?

Support specifying the `StorageLevel` when caching a table through the SQL 
interface. The semantics are like this:
```sql
CACHE DISK_ONLY TABLE tableName;
```
All supported `StorageLevel`s are listed here:

https://github.com/apache/spark/blob/eefdf9f9dd8afde49ad7d4e230e2735eb817ab0a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L172-L183
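
For comparison, the existing programmatic route to an explicit storage level goes through the Dataset API; the SQL syntax above is what this PR adds (a sketch):

```scala
import org.apache.spark.storage.StorageLevel

// Today, choosing a level requires dropping out of SQL into the Dataset API.
spark.table("tableName").persist(StorageLevel.DISK_ONLY)
```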

## How was this patch tested?

Unit tests and manual tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25269

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22263


commit 9e058d1b402dec85982a880bf086268a1dcec99e
Author: Yuming Wang 
Date:   2018-08-29T06:31:44Z

SQL interface support specify StorageLevel when cache table




---
