[ https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246942#comment-17246942 ]
Apache Spark commented on SPARK-33729: -------------------------------------- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/30699 > When refreshing cache, Spark should not use cached plan when recaching data > --------------------------------------------------------------------------- > > Key: SPARK-33729 > URL: https://issues.apache.org/jira/browse/SPARK-33729 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.1.0 > Reporter: Chao Sun > Priority: Major > > Currently when cache is refreshed, e.g., via "REFRESH TABLE" command, Spark > will call {{refreshTable}} method within {{CatalogImpl}}. > {code} > override def refreshTable(tableName: String): Unit = { > val tableIdent = > sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) > val tableMetadata = > sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent) > val table = sparkSession.table(tableIdent) > if (tableMetadata.tableType == CatalogTableType.VIEW) { > // Temp or persistent views: refresh (or invalidate) any metadata/data > cached > // in the plan recursively. > table.queryExecution.analyzed.refresh() > } else { > // Non-temp tables: refresh the metadata cache. > sessionCatalog.refreshTable(tableIdent) > } > // If this table is cached as an InMemoryRelation, drop the original > // cached version and make the new version cached lazily. > val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table) > // uncache the logical plan. > // note this is a no-op for the table itself if it's not cached, but will > invalidate all > // caches referencing this table. > sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true) > if (cache.nonEmpty) { > // save the cache name and cache level for recreation > val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName > val cacheLevel = > cache.get.cachedRepresentation.cacheBuilder.storageLevel > // recache with the same name and cache level. > sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, > cacheLevel) > } > } > {code} > Note that the {{table}} is created before the table relation cache is > cleared, and used later in {{cacheQuery}}. This is incorrect since it still > refers cached table relation which could be stale. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org