[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220529510 Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220529183 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13156
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220528429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58940/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220528428 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220528241
**[Test build #58940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58940/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220517365 **[Test build #58940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58940/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220516949 retest this please
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220515502 LGTM, pending jenkins
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220513843 retest this please
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220507235 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220507239 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58934/ Test FAILed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220507152
**[Test build #58934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58934/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220500710 **[Test build #58934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58934/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220500330 retest this please
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220461395
**[Test build #58905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58905/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220461429 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58905/ Test FAILed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220461425 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220459388 **[Test build #58905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58905/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220458174 I see. `spark.sessionState.invalidateTable` already exists. They have the same implementation. Thus, I will just remove `spark.sessionState.refreshTable`? Let me do it
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220424613 LGTM other than the renaming. We shouldn't have `spark.catalog.refreshTable` and `spark.sessionState.refreshTable` do different things. I would rename the latter to `invalidateTable` since that's what it's really doing.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63937901
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
    @@ -163,6 +163,9 @@ private[sql] class SessionState(sparkSession: SparkSession) {
       def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan)

       def refreshTable(tableName: String): Unit = {
    +    // Different from SparkSession.catalog.refreshTable, this API only refreshes the metadata.
    +    // It does not reload the cached data. That means, if this table is cached as
    +    // an InMemoryRelation, we do not refresh the cached data.
    --- End diff --
this is super confusing, the fact that `spark.catalog.refreshTable` and `spark.sessionState.refreshTable` do different things. Should we just rename this to `invalidateTable` along with `HiveMetastoreCatalog.refreshTable`?
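Editorial illustration of the distinction raised in this comment: the sketch below is a toy model, not Spark code. `ToyCatalog` and its members are hypothetical names invented here. It only mirrors the behaviour described in the diff comment, where invalidating drops cached metadata while refreshing additionally reloads data that was cached (the role an `InMemoryRelation` plays in the diff comment).

```scala
import scala.collection.mutable

// Toy model only: ToyCatalog is hypothetical and does not reflect Spark's real classes.
// `source` stands in for the table data as stored outside the session.
final class ToyCatalog(source: mutable.Map[String, Seq[Int]]) {
  // Cached metadata per table (here reduced to a row count).
  private val metadataCache = mutable.Map.empty[String, Int]
  // Cached table contents, standing in for a cached in-memory relation.
  private val dataCache = mutable.Map.empty[String, Seq[Int]]

  // Snapshot the current contents of the table into the data cache.
  def cacheTable(name: String): Unit = dataCache.update(name, source(name))

  // Returns the cached row count, computing and caching it on first use.
  def rowCount(name: String): Int =
    metadataCache.getOrElseUpdate(name, source(name).size)

  // Reads from the data cache when the table is cached, otherwise from the source.
  def read(name: String): Seq[Int] = dataCache.getOrElse(name, source(name))

  // Drops cached metadata only; any cached data is left untouched.
  def invalidateTable(name: String): Unit = {
    metadataCache.remove(name)
    ()
  }

  // Drops cached metadata and also reloads the cached data, if the table is cached.
  def refreshTable(name: String): Unit = {
    invalidateTable(name)
    if (dataCache.contains(name)) dataCache.update(name, source(name))
  }
}
```

In this model, after the source changes, `invalidateTable` makes `rowCount` see the new size while `read` still returns the stale cached rows; only `refreshTable` brings both up to date. Giving the two operations different names keeps that difference visible at the call site, which is the point of the renaming suggestion.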
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820600
    --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -58,4 +58,16 @@ class HiveContext private[hive](
         sparkSession.sharedState.asInstanceOf[HiveSharedState]
       }

    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {
    --- End diff --
+1
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820108
    --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -58,4 +58,16 @@ class HiveContext private[hive](
         sparkSession.sharedState.asInstanceOf[HiveSharedState]
       }

    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {
    --- End diff --
This class is for the compatibility purpose. Let's leave it as is.
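Editorial illustration of the "compatibility class" idea in this comment: the diff shows only the signature of `HiveContext.refreshTable`, not its body, so the sketch below does not reproduce the patch. It assumes, purely for illustration, that a compatibility class keeps the old method name and forwards to a newer API. `NewCatalog`, `RecordingCatalog` and `LegacyContext` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the newer API surface; not a Spark type.
trait NewCatalog {
  def refreshTable(tableName: String): Unit
}

// Test double that records which tables were refreshed.
final class RecordingCatalog extends NewCatalog {
  val refreshed: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def refreshTable(tableName: String): Unit = {
    refreshed += tableName
    ()
  }
}

// Hypothetical legacy entry point kept only so that existing callers still compile.
// It adds no behaviour of its own and simply forwards to the newer API.
final class LegacyContext(catalog: NewCatalog) {
  def refreshTable(tableName: String): Unit = catalog.refreshTable(tableName)
}
```

Under that assumption, keeping the wrapper's public surface identical to the old one (and not adding new methods such as `invalidateTable` to it) is what preserves compatibility for existing callers.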
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63817417
    --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -58,4 +58,16 @@ class HiveContext private[hive](
         sparkSession.sharedState.asInstanceOf[HiveSharedState]
       }

    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {
    --- End diff --
if `invalidateTable` has different meaning than `refreshTable`, should we also add it to `HiveContext`? cc @yhuai
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212872 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58831/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212724
**[Test build #58831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58831/consoleFull)** for PR 13156 at commit [`2b773b8`](https://github.com/apache/spark/commit/2b773b823672199a685e765f5345ceb6584eb3d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEach`
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220201874 **[Test build #58831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58831/consoleFull)** for PR 13156 at commit [`2b773b8`](https://github.com/apache/spark/commit/2b773b823672199a685e765f5345ceb6584eb3d8).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63808728
    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    +    val _hc = hc
    +    import _hc.implicits._
    +
    +    withTempPath { tempDir =>
    --- End diff --
Sure, let me remove it. : )
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220200787 LGTM except the test stuff, thanks for working on it! I agree that we should remove `refreshTable` in `SessionState`, but need someone to confirm, or we can do it in follow-up PR.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63808539
    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    +    val _hc = hc
    +    import _hc.implicits._
    +
    +    withTempPath { tempDir =>
    --- End diff --
Do we still need this test?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220155088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58808/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220155086 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220154809
**[Test build #58808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58808/consoleFull)** for PR 13156 at commit [`8a52ac6`](https://github.com/apache/spark/commit/8a52ac608d1836e095cf83185be37a25696cf0c7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220149993 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58805/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220149990 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220149660 **[Test build #58805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58805/consoleFull)** for PR 13156 at commit [`7142ef5`](https://github.com/apache/spark/commit/7142ef54fbc806c12859f2af152794af5d50ec72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220121864 **[Test build #58808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58808/consoleFull)** for PR 13156 at commit [`8a52ac6`](https://github.com/apache/spark/commit/8a52ac608d1836e095cf83185be37a25696cf0c7).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63760124

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
    @@ -163,6 +163,9 @@ private[sql] class SessionState(sparkSession: SparkSession) {
       def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan)

       def refreshTable(tableName: String): Unit = {
    +    // Different from SparkSession.catalog.refreshTable, this API only refreshes the metadata.
    +    // It does not reload the cached data. That means, if this table is cached as
    +    // an InMemoryRelation, we do not refresh the cached data.
    --- End diff --

    Let me know if we need to remove the API `refreshTable` in `SessionState`. So far, it is not being used by any test case. Thanks!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63758568

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -622,7 +622,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
               .mode(SaveMode.Append)
               .saveAsTable("arrayInParquet")

    -        sessionState.refreshTable("arrayInParquet")
    +        sparkSession.catalog.refreshTable("arrayInParquet")
    --- End diff --

    Got it. Thanks!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63758626

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    Done. Please review the latest code changes. Thanks!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63758414

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
    @@ -163,6 +163,9 @@ private[sql] class SessionState(sparkSession: SparkSession) {
       def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan)

       def refreshTable(tableName: String): Unit = {
    +    // Different from SparkSession.catalog.refreshTable, this API only refreshes the metadata.
    +    // It does not reload the cached data. That means, if this table is cached as
    +    // an InMemoryRelation, we do not refresh the cached data.
    --- End diff --

    `SharedState` can refresh the cached table data. In `SessionState`, we can only refresh the metadata. Thus, this `refreshTable` API only refreshes the metadata.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63757822

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---
    @@ -126,24 +126,9 @@ case class RefreshTable(tableIdent: TableIdentifier) extends RunnableCommand {

       override def run(sparkSession: SparkSession): Seq[Row] = {
    -    // Refresh the given table's metadata first.
    -    sparkSession.sessionState.catalog.refreshTable(tableIdent)
    -
    -    // If this table is cached as a InMemoryColumnarRelation, drop the original
    -    // cached version and make the new version cached lazily.
    -    val logicalPlan = sparkSession.sessionState.catalog.lookupRelation(tableIdent)
    -    // Use lookupCachedData directly since RefreshTable also takes databaseName.
    -    val isCached = sparkSession.cacheManager.lookupCachedData(logicalPlan).nonEmpty
    -    if (isCached) {
    -      // Create a data frame to represent the table.
    -      // TODO: Use uncacheTable once it supports database name.
    -      val df = Dataset.ofRows(sparkSession, logicalPlan)
    -      // Uncache the logicalPlan.
    -      sparkSession.cacheManager.tryUncacheQuery(df, blocking = true)
    -      // Cache it again.
    -      sparkSession.cacheManager.cacheQuery(df, Some(tableIdent.table))
    -    }
    --- End diff --

    The logic above has been moved to `sparkSession.catalog.refreshTable`.
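As a rough illustration of the steps in the removed block (refresh the metadata first, then drop any cached copy and re-register it lazily), here is a minimal, self-contained Scala sketch. It is a toy model and not Spark code: `ToySession`, `refreshMetadata`, `cacheTable`, and the in-memory `storage` map are hypothetical names invented for this example, and "metadata" is reduced to a cached row count. It also shows why a metadata-only refresh, as described for `SessionState.refreshTable` in this thread, leaves cached data stale.

```scala
import scala.collection.mutable

/** Toy model (not Spark): external storage plus a metadata cache and a data cache. */
class ToySession(storage: mutable.Map[String, Seq[Int]]) {
  // Cached metadata: here simply the row count last observed for a table.
  private val metadata = mutable.Map.empty[String, Int]
  // Cached data: Some(rows) once materialized, None while registered but still lazy.
  private val dataCache = mutable.Map.empty[String, Option[Seq[Int]]]

  def rowCount(table: String): Int =
    metadata.getOrElseUpdate(table, storage(table).size)

  // Register the table for caching; nothing is read until the first `read`.
  def cacheTable(table: String): Unit = dataCache.update(table, None)

  def read(table: String): Seq[Int] = dataCache.get(table) match {
    case Some(Some(rows)) => rows                // served from the cache
    case Some(None) =>                           // lazy cache: materialize on first read
      val rows = storage(table)
      dataCache.update(table, Some(rows))
      rows
    case None => storage(table)                  // table is not cached
  }

  /** Metadata-only refresh: cached data is left untouched. */
  def refreshMetadata(table: String): Unit = {
    metadata.remove(table)
    ()
  }

  /** Full refresh: metadata first, then uncache and lazily re-cache the data. */
  def refreshTable(table: String): Unit = {
    refreshMetadata(table)
    if (dataCache.contains(table)) {
      dataCache.remove(table)  // drop the original cached version
      cacheTable(table)        // cache it again, lazily
    }
  }
}
```

In this sketch, calling `refreshMetadata` after the underlying storage changes updates `rowCount` but `read` still returns the old cached rows; only `refreshTable` makes the next `read` pick up the new data.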
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220117679 **[Test build #58805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58805/consoleFull)** for PR 13156 at commit [`7142ef5`](https://github.com/apache/spark/commit/7142ef54fbc806c12859f2af152794af5d50ec72).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63732945

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    Sure, will do it soon. The new behavior will then differ from that of Spark 1.6. However, I think we should keep the two interfaces (SQL and API) consistent.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63729506

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -622,7 +622,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
               .mode(SaveMode.Append)
               .saveAsTable("arrayInParquet")

    -        sessionState.refreshTable("arrayInParquet")
    +        sparkSession.catalog.refreshTable("arrayInParquet")
    --- End diff --

    Actually, `invalidateTable` and `refreshTable` do have different meanings. The current implementation of `HiveMetastoreCatalog.refreshTable` is `HiveMetastoreCatalog.invalidateTable` (and then we retrieve the new metadata lazily). But that does not mean that `refreshTable` and `invalidateTable` have the same semantics. Whether we should remove either `invalidateTable` or `refreshTable` should be discussed in a different thread.
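The "invalidate now, reload lazily" behavior described in this comment can be sketched in a few lines of plain Scala. This is only an illustrative toy under stated assumptions, not the `HiveMetastoreCatalog` implementation: `ToyMetastoreCache`, its `load` function, and the `loads` counter are hypothetical names made up for the example.

```scala
import scala.collection.mutable

/** Toy metadata cache (not Spark): `load` stands in for a metastore lookup. */
class ToyMetastoreCache(load: String => String) {
  private val cache = mutable.Map.empty[String, String]
  // Counts how many times the underlying loader was actually called.
  var loads = 0

  // Return cached metadata, loading it only when no entry exists.
  def lookup(table: String): String =
    cache.getOrElseUpdate(table, { loads += 1; load(table) })

  // Drop the cached entry; nothing is reloaded here.
  def invalidateTable(table: String): Unit = {
    cache.remove(table)
    ()
  }

  // Mirrors the comment above: refresh is currently implemented as invalidate,
  // and the new metadata is fetched lazily by the next lookup.
  def refreshTable(table: String): Unit = invalidateTable(table)
}
```

Keeping the two method names separate, even while one delegates to the other, leaves room for `refreshTable` to reload eagerly later without changing what callers of `invalidateTable` observe.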
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63727680

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    Looks like the `RefreshTable` command is actually doing more work. I think we need to make `RefreshTable` and `sparkSession.catalog.refreshTable` have the same behavior. Can you make that change?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63719915

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    I see. The `refreshTable` API is in `HiveContext`. I think we can just make a dummy call to verify that the API still exists, without checking its functionality. Does that sound good to you? @cloud-fan @yhuai
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63675218

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    They share the same implementation and I think we don't need to test all of them. cc @yhuai
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219931477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58740/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219931474 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219931325 **[Test build #58740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58740/consoleFull)** for PR 13156 at commit [`4ac3b76`](https://github.com/apache/spark/commit/4ac3b768e0b4720dcef86b910e89e31335390217). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63647706

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    The test cases modified by this PR are used to verify the `refreshTable` APIs. We also have test cases that verify the corresponding SQL interface, which calls the `RefreshTable` command. For example, https://github.com/apache/spark/blob/d8a83a564ff3fd0281007adbf8aa3757da8a2c2b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala#L164-L207

    Now, to test the pure `HiveContext`, the only way is to add a test case in `sql/hivecontext-compatibility`.

    Not sure if this answers your question. Let me know if you have any concerns.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63646713

    --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala ---
    @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac
         assert(databases3.toSeq == Seq("default"))
       }

    +  test("check change after refresh") {
    --- End diff --

    Did we have a test for `refreshTable`/`RefreshTable` before?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63644169

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -622,7 +622,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
               .mode(SaveMode.Append)
               .saveAsTable("arrayInParquet")

    -        sessionState.refreshTable("arrayInParquet")
    +        sparkSession.catalog.refreshTable("arrayInParquet")
    --- End diff --

    Actually, I also want to remove `invalidateTable`, which is a duplicate name for `refreshTable`.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63643173

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -622,7 +622,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
               .mode(SaveMode.Append)
               .saveAsTable("arrayInParquet")

    -        sessionState.refreshTable("arrayInParquet")
    +        sparkSession.catalog.refreshTable("arrayInParquet")
    --- End diff --

    As we don't call `refreshTable` through `SessionState`, do we still need to keep `SessionState.refreshTable`?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219918792 **[Test build #58740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58740/consoleFull)** for PR 13156 at commit [`4ac3b76`](https://github.com/apache/spark/commit/4ac3b768e0b4720dcef86b910e89e31335390217).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219886319 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219886323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58725/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219886108 **[Test build #58725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58725/consoleFull)** for PR 13156 at commit [`9e6c4b7`](https://github.com/apache/spark/commit/9e6c4b7d8ef035a63f6c3c219950be326f2e8357). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63615059

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala ---
    @@ -202,7 +202,8 @@ class MultiDatabaseSuite extends QueryTest with SQLTestUtils with TestHiveSingle
           activateDatabase(db) {
             sql(
    -          s"""CREATE EXTERNAL TABLE t (id BIGINT)
    +          s"""
    +            |CREATE EXTERNAL TABLE t (id BIGINT)
    --- End diff --

    Will revert them. Thanks!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63615033

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -294,6 +294,18 @@ class SQLContext private[sql](
         sparkSession.catalog.clearCache()
       }

    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {
    +    sparkSession.catalog.refreshTable(tableName)
    +  }
    --- End diff --

    Sure, will remove it. Thanks!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63613203

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala ---
    @@ -202,7 +202,8 @@ class MultiDatabaseSuite extends QueryTest with SQLTestUtils with TestHiveSingle
           activateDatabase(db) {
             sql(
    -          s"""CREATE EXTERNAL TABLE t (id BIGINT)
    +          s"""
    +            |CREATE EXTERNAL TABLE t (id BIGINT)
    --- End diff --

    Seems these formatting changes are not necessary?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13156#discussion_r63613130

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -294,6 +294,18 @@ class SQLContext private[sql](
         sparkSession.catalog.clearCache()
       }

    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {
    +    sparkSession.catalog.refreshTable(tableName)
    +  }
    --- End diff --

    This method did not exist in SQLContext. It's in HiveContext.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-219869025 **[Test build #58725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58725/consoleFull)** for PR 13156 at commit [`9e6c4b7`](https://github.com/apache/spark/commit/9e6c4b7d8ef035a63f6c3c219950be326f2e8357).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/13156

    [SPARK-15367] [SQL] Add refreshTable back

    What changes were proposed in this pull request?

    `refreshTable` was a method in `HiveContext`. It was deleted accidentally while we were migrating the APIs. This PR adds it back to `HiveContext`.

    In addition, in `SparkSession`, we put it under the catalog namespace (`SparkSession.catalog.refreshTable`).

    How was this patch tested?

    Changed the existing test cases to use the function `refreshTable`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark refreshTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13156.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13156

commit b1cd1c670adbd0db3dcb82831b4aacae514c37f1
Author: gatorsmile
Date: 2016-05-17T20:38:22Z

    initial fix

commit bdd7c61b9d1d7fb1b839485c10f82300304860c1
Author: gatorsmile
Date: 2016-05-17T21:14:58Z

    fix.

commit e3564d5dff530ce84a28d7ed90a4ff4bac7de46b
Author: gatorsmile
Date: 2016-05-17T21:29:28Z

    fix again.

commit c3f3f0b481c5a3fe3b2485ab0d73194dd7898911
Author: gatorsmile
Date: 2016-05-17T22:03:33Z

    revert it back

commit 9e6c4b7d8ef035a63f6c3c219950be326f2e8357
Author: gatorsmile
Date: 2016-05-17T22:04:01Z

    Merge remote-tracking branch 'upstream/master' into refreshTable