[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22263

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226534798

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
```

End diff: Do you think the previously existing `lazy`-related test cases protect this new SQL syntax contribution from future regressions?
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226514429

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
```

End diff: Yes, I think this test case is mainly to assert that `storageLevel` works with `lazy`, not `lazy` itself. WDYT?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226380114

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -64,6 +67,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     maybeBlock.nonEmpty
   }
 
+  def isExpectStorageLevel(rddId: Int, level: DataReadMethod.DataReadMethod): Boolean = {
```

End diff: `DataReadMethod.DataReadMethod` -> `DataReadMethod` and remove line 25, `import org.apache.spark.executor.DataReadMethod`.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226379468

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
```

End diff: Although the commit message is a revert, this is not a revert. We lost the previous test coverage. Is this intentional?

```
"Lazily cached in-memory table shouldn't be materialized eagerly"
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226215310

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: Can we do the following? Then, those 4 functions will have one-line bodies.
1. Let's remove `tableName: String,`. We can use the same table name.
2. With (1), we can remove the common prefix, `CACHE TABLE testData OPTIONS(`, too.
3. With (1) and (2), we can change `sqlStr: String` -> `options: String`.
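A minimal sketch of where the three suggestions lead (the helper name is illustrative, not from the PR): with a fixed table name and the shared `CACHE TABLE testData OPTIONS(` prefix factored out, only the options string remains as a parameter.

```scala
// Illustrative sketch of the suggested simplification: only the OPTIONS
// payload varies between the 4 positive test cases, so the statement can be
// built from it alone. The name `cacheStatement` is hypothetical.
def cacheStatement(options: String): String =
  s"CACHE TABLE testData OPTIONS($options)"

// A test body would then reduce to one line, e.g.:
//   assertStorageLevel(cacheStatement("'storageLevel' 'DISK_ONLY'"), Disk)
```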
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226187423

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: Thanks @dongjoon-hyun How about:
```scala
def assertStorageLevel(
    sqlStr: String,
    tableName: String,
    level: DataReadMethod.DataReadMethod): Unit = {
  sql(sqlStr)
  assertCached(spark.table(tableName))
  val rddId = rddIdOf(tableName)
  assert(isExpectStorageLevel(rddId, level))
}

test("SQL interface support storageLevel(DISK_ONLY)") {
  assertStorageLevel(
    "CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')",
    "testData",
    Disk)
}

test("SQL interface cache SELECT ... support storageLevel(DISK_ONLY)") {
  withTempView("testCacheSelect") {
    assertStorageLevel(
      "CACHE TABLE testCacheSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT * FROM testData",
      "testCacheSelect",
      Disk)
  }
}

test("SQL interface support storageLevel(DISK_ONLY) with invalid options") {
  assertStorageLevel(
    "CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 'b' '2')",
    "testData",
    Disk)
}

test("SQL interface support storageLevel(MEMORY_ONLY)") {
  assertStorageLevel(
    "CACHE TABLE testData OPTIONS('storageLevel' 'MEMORY_ONLY')",
    "testData",
    Memory)
}
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226184242

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: @wangyum . Originally, I thought you could make a utility function for those 4 test cases. The utility function can accept a `sql` statement and a `storagelevel`. Then, the 4 test cases will have simpler bodies.
> They are mostly the same except SQL statements and StorageLevels.
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226182333

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: @dongjoon-hyun Such test cases don't seem to be clear. Can we use the original 4 test cases?
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225762035

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: How about this:
```scala
Seq("LAZY", "").foreach { isLazy =>
  Seq(true, false).foreach { withInvalidOptions =>
    Seq(true, false).foreach { withCacheTempView =>
      Map("DISK_ONLY" -> Disk, "MEMORY_ONLY" -> Memory).foreach {
        case (storageLevel, dataReadMethod) =>
          val testName = s"SQL interface support option: storageLevel: $storageLevel, " +
            s"isLazy: ${isLazy.equals("LAZY")}, " +
            s"withInvalidOptions: $withInvalidOptions, withCacheTempView: $withCacheTempView"
          val cacheOption = if (withInvalidOptions) {
            s"OPTIONS('storageLevel' '$storageLevel', 'a' '1', 'b' '2')"
          } else {
            s"OPTIONS('storageLevel' '$storageLevel')"
          }
          test(testName) {
            if (withCacheTempView) {
              withTempView("testSelect") {
                sql(s"CACHE $isLazy TABLE testSelect $cacheOption SELECT * FROM testData")
                assertCached(spark.table("testSelect"))
                val rddId = rddIdOf("testSelect")
                if (isLazy.equals("LAZY")) {
                  sql("SELECT COUNT(*) FROM testSelect").collect()
                }
                assert(isExpectStorageLevel(rddId, dataReadMethod))
              }
            } else {
              sql(s"CACHE $isLazy TABLE testData $cacheOption")
              assertCached(spark.table("testData"))
              val rddId = rddIdOf("testData")
              if (isLazy.equals("LAZY")) {
                sql("SELECT COUNT(*) FROM testData").collect()
              }
              assert(isExpectStorageLevel(rddId, dataReadMethod))
            }
          }
      }
    }
  }
}
```
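Independent of Spark, the case-crossing in the proposal above can be checked in plain Scala; this sketch reproduces only the generated test names and their count, with all Spark calls omitted.

```scala
// Plain-Scala sketch of the test matrix generated by the nested foreach
// proposal above: 2 laziness modes x 2 option variants x 2 cache targets
// x 2 storage levels. Only the case names are built here.
val caseNames = for {
  isLazy             <- Seq("LAZY", "")
  withInvalidOptions <- Seq(true, false)
  withCacheTempView  <- Seq(true, false)
  (storageLevel, _)  <- Seq("DISK_ONLY" -> "Disk", "MEMORY_ONLY" -> "Memory")
} yield {
  s"SQL interface support option: storageLevel: $storageLevel, " +
    s"isLazy: ${isLazy == "LAZY"}, " +
    s"withInvalidOptions: $withInvalidOptions, withCacheTempView: $withCacheTempView"
}
// 2 * 2 * 2 * 2 = 16 distinct generated test cases
```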
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225619883

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -328,34 +325,6 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
 
-  test("SQL interface support storageLevel(Invalid StorageLevel)") {
```

End diff: Sorry, but why do you remove these unique ones?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225608012

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
```

End diff: @wangyum . Could you refactor the following four positive test cases by reducing the duplication? They are mostly the same except SQL statements and `StorageLevel`s.
```scala
test("SQL interface support storageLevel(DISK_ONLY)")
test("SQL interface select from table support storageLevel(DISK_ONLY)")
test("SQL interface support storageLevel(DISK_ONLY) with invalid options")
test("SQL interface support storageLevel(MEMORY_ONLY)")
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225605943

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT * FROM testData")
+    assertCached(spark.table("testSelect"))
+    val rddId = rddIdOf("testSelect")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testSelect")
+  }
+
+  test("SQL interface support storageLevel(DISK_ONLY) with invalid options") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 'b' '2')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testData")
```

End diff: ditto.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225605534

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') SELECT * FROM testData")
+    assertCached(spark.table("testSelect"))
+    val rddId = rddIdOf("testSelect")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testSelect")
```

End diff: ditto.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225605478

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    spark.catalog.uncacheTable("testData")
```

End diff: Let's remove `uncacheTable` since we have `spark.catalog.clearCache()` in `afterEach()`.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225005803

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,69 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    assert(!isExpectStorageLevel(rddId, Memory))
```

End diff: Do we need line 305 when we have line 304?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r224977190

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -288,6 +297,68 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    assert(!isExpectStorageLevel(rddId, Memory))
+    spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface select from table support storageLevel(DISK_ONLY)") {
+    sql("CACHE TABLE testSelect OPTIONS('storageLevel' 'DISK_ONLY') select * from testData")
+    assertCached(spark.table("testSelect"))
+    val rddId = rddIdOf("testSelect")
+    assert(isExpectStorageLevel(rddId, Disk))
+    assert(!isExpectStorageLevel(rddId, Memory))
+    spark.catalog.uncacheTable("testSelect")
+  }
+
+  test("SQL interface support storageLevel(DISK_ONLY) with invalid options") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'DISK_ONLY', 'a' '1', 'b' '2')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(isExpectStorageLevel(rddId, Disk))
+    assert(!isExpectStorageLevel(rddId, Memory))
+    spark.catalog.uncacheTable("testData")
+  }
+
+  test("SQL interface support storageLevel(MEMORY_ONLY)") {
+    sql("CACHE TABLE testData OPTIONS('storageLevel' 'MEMORY_ONLY')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    assert(!isExpectStorageLevel(rddId, Disk))
+    assert(isExpectStorageLevel(rddId, Memory))
+  }
+
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+
+    val rddId = rddIdOf("testData")
+    assert(
+      !isMaterialized(rddId),
+      "Lazily cached in-memory table shouldn't be materialized eagerly")
+
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(
+      isMaterialized(rddId),
+      "Lazily cached in-memory table should have been materialized")
+
```

End diff: We didn't check the storage level in this test case. Add `isExpectStorageLevel` here?
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r219490742

Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

```
@@ -162,7 +162,8 @@ statement
     tableIdentifier partitionSpec? describeColName?           #describeTable
     | REFRESH TABLE tableIdentifier                           #refreshTable
     | REFRESH (STRING | .*?)                                  #refreshResource
-    | CACHE LAZY? TABLE tableIdentifier (AS? query)?          #cacheTable
+    | CACHE LAZY? storageLevel=identifier? TABLE
+        tableIdentifier (AS? query)?                          #cacheTable
```

End diff: Thanks @maropu, I changed it to an option.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r213904343

Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

```
@@ -162,7 +162,8 @@ statement
     tableIdentifier partitionSpec? describeColName?           #describeTable
     | REFRESH TABLE tableIdentifier                           #refreshTable
     | REFRESH (STRING | .*?)                                  #refreshResource
-    | CACHE LAZY? TABLE tableIdentifier (AS? query)?          #cacheTable
+    | CACHE LAZY? storageLevel=identifier? TABLE
+        tableIdentifier (AS? query)?                          #cacheTable
```

End diff: As another syntax option, how about this?
```
CACHE TABLE testTable OPTIONS ('storageLevel'='disk_only')
UNCACHE TABLE testTable OPTION ('cascade'='true', 'blocking'='true')
```
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r213708297

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala

```
@@ -17,16 +17,19 @@
 
 package org.apache.spark.sql.execution.command
 
+import java.util.Locale
+
 import org.apache.spark.sql.{Dataset, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
```

End diff: nit: I have seen comments against changes like this one, in order to avoid issues when backporting the ticket.
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r213706428

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

```
@@ -272,7 +272,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
       throw new ParseException(s"It is not allowed to add database prefix `$database` to " +
         s"the table name in CACHE TABLE AS SELECT", ctx)
     }
-    CacheTableCommand(tableIdent, query, ctx.LAZY != null)
+    CacheTableCommand(tableIdent, query, ctx.LAZY != null, Option(ctx.storageLevel).map(source))
```

End diff: why `source` instead of `getText`?
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r213568423

Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

```
@@ -64,6 +66,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     maybeBlock.nonEmpty
   }
 
+  def isExpectStorageLevel(rddId: Int, level: DataReadMethod.DataReadMethod): Boolean = {
+    val maybeBlock = sparkContext.env.blockManager.get(RDDBlockId(rddId, 0))
```

End diff: shall we also check that the block is present? If nothing is cached, this would (wrongly, IMHO) return true. I see that since you are checking `isMaterialized` this doesn't happen in your test cases, but this function may be reused in the future, so I think it'd be better to have it behave consistently with its name. Do you agree?
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22263

[SPARK-25269][SQL] SQL interface support specify StorageLevel when cache table

## What changes were proposed in this pull request?

The SQL interface supports specifying a `StorageLevel` when caching a table. The semantics are like this:
```sql
CACHE DISK_ONLY TABLE tableName;
```
All supported `StorageLevel`s are listed here:
https://github.com/apache/spark/blob/eefdf9f9dd8afde49ad7d4e230e2735eb817ab0a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L172-L183

## How was this patch tested?

Unit tests and manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25269

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22263.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22263

commit 9e058d1b402dec85982a880bf086268a1dcec99e
Author: Yuming Wang
Date: 2018-08-29T06:31:44Z

    SQL interface support specify StorageLevel when cache table
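The validation behavior exercised by the tests in this thread (an `IllegalArgumentException` with `Invalid StorageLevel: INVALID_STORAGE_LEVEL` for a bad name, and case-insensitive acceptance of `disk_only`) can be sketched in plain Scala. This is a hypothetical stand-in built from the level names linked above, not Spark's actual `StorageLevel` code:

```scala
import java.util.Locale

// Hypothetical sketch of storage-level name validation, assuming the level
// names from StorageLevel.scala linked above and the error message asserted
// in the test suite. Not Spark's implementation.
object StorageLevelNames {
  private val supported = Set(
    "NONE", "DISK_ONLY", "DISK_ONLY_2", "MEMORY_ONLY", "MEMORY_ONLY_2",
    "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2", "MEMORY_AND_DISK",
    "MEMORY_AND_DISK_2", "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
    "OFF_HEAP")

  def normalize(name: String): String = {
    // 'disk_only' and 'DISK_ONLY' are both accepted, matching the tests above
    val upper = name.toUpperCase(Locale.ROOT)
    if (!supported.contains(upper)) {
      throw new IllegalArgumentException(s"Invalid StorageLevel: $upper")
    }
    upper
  }
}
```

Note that the error message reports the upper-cased name, which is why the test asserts on `INVALID_STORAGE_LEVEL` even though the SQL passed `'invalid_storage_level'`.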