[ https://issues.apache.org/jira/browse/HUDI-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang updated HUDI-5697:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Spark SQL re-lists Hudi table after every SQL operations
> --------------------------------------------------------
>
>                 Key: HUDI-5697
>                 URL: https://issues.apache.org/jira/browse/HUDI-5697
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark, spark-sql
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> Currently, after most DML operations in Spark SQL, Hudi invokes
> `Catalog.refreshTable`.
> Prior to Spark 3.2, this essentially did the following:
> # Invalidated the relation cache (forcing the relation to be re-resolved
> on next access, which creates a new FileIndex, lists files, etc.)
> # Triggered cascading invalidation (re-caching) of the cached data (in
> CacheManager)
> As of Spark 3.2, it additionally calls `LogicalRelation.refresh` for ALL
> tables (previously this was done only for temporary views), so the whole
> table is re-listed by triggering `FileIndex.refresh`, which can be a
> costly operation.
>
> We should revert to the preceding behavior from Spark 3.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
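
The cost difference described in the issue can be illustrated with a toy Java model. This is a sketch under stated assumptions, not Spark or Hudi code: `ToyCatalog`, `lookup`, `invalidate`, `refreshEagerly`, and `listingsFor` are hypothetical names, and the model only counts how many full table listings happen when a table is read once, written several times (each write followed by a refresh), and then read again.

```java
import java.util.HashSet;
import java.util.Set;

class RefreshTableSketch {

    /** Hypothetical stand-in for a catalog that caches resolved relations by table name. */
    static final class ToyCatalog {
        private final Set<String> resolved = new HashSet<>();
        int listings = 0; // number of (expensive) full table listings performed

        /** Resolve a table; a cache miss builds a new file index, which lists the table. */
        void lookup(String table) {
            if (resolved.add(table)) {
                listings++;
            }
        }

        /** Pre-3.2-style refresh as described in the issue: only drop the cached relation. */
        void invalidate(String table) {
            resolved.remove(table);
        }

        /** 3.2-style refresh as described in the issue: re-list eagerly, then drop the cache entry. */
        void refreshEagerly(String table) {
            listings++;
            resolved.remove(table);
        }
    }

    /** Read once, perform `writes` DML operations (each followed by a refresh), then read again. */
    static int listingsFor(boolean eager, int writes) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.lookup("t");
        for (int i = 0; i < writes; i++) {
            if (eager) {
                catalog.refreshEagerly("t");
            } else {
                catalog.invalidate("t");
            }
        }
        catalog.lookup("t");
        return catalog.listings;
    }

    public static void main(String[] args) {
        System.out.println("lazy invalidation: " + listingsFor(false, 3) + " listings");
        System.out.println("eager refresh: " + listingsFor(true, 3) + " listings");
    }
}
```

In this model, lazy invalidation lists the table only when it is next read, while the eager variant pays for one extra listing per DML operation, even if no read follows.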