Alexey Kudinkin created HUDI-5697:
-------------------------------------

             Summary: Spark SQL re-lists Hudi table after every SQL operation
                 Key: HUDI-5697
                 URL: https://issues.apache.org/jira/browse/HUDI-5697
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark, spark-sql
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin
             Fix For: 0.13.1


Currently, after most DML operations in Spark SQL, Hudi invokes
`Catalog.refreshTable`.

Prior to Spark 3.2, this essentially did the following:
 # Invalidated the relation cache, forcing the relation to be re-resolved the
next time it is accessed (creating a new FileIndex, listing files, etc.)
 # Triggered cascading invalidation (re-caching) of the cached data (in
CacheManager)

As of Spark 3.2, it additionally calls `LogicalRelation.refresh` for ALL
tables (previously this was only done for temporary views). This triggers
`FileIndex.refresh`, which re-lists the whole table, a potentially costly
operation.
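To make the cost difference concrete, here is a toy Python model. Every class, method, and function name in it is hypothetical; none of it is Spark's or Hudi's API. It assumes, as a simplification, that resolving an uncached relation lists the table once, that each DML operation resolves its target table before writing, and that the two `refreshTable` variants behave as described above: before 3.2 the cached relation is only invalidated, while as of 3.2 the relation's file index is also refreshed (re-listed) before invalidation.

```python
class ListingCounter:
    """Counts how many times the toy table has been listed."""

    def __init__(self) -> None:
        self.count = 0


class ToyFileIndex:
    """Hypothetical stand-in for a file index: building or refreshing it lists the table."""

    def __init__(self, counter: ListingCounter) -> None:
        self._counter = counter
        self.files = self._list()

    def _list(self) -> list:
        self._counter.count += 1  # record one full listing of the table
        return ["part-0001.parquet"]  # placeholder listing result

    def refresh(self) -> None:
        self.files = self._list()  # eager re-listing


class ToyCatalog:
    """Hypothetical relation cache keyed by table name (not Spark's API)."""

    def __init__(self, counter: ListingCounter) -> None:
        self._counter = counter
        self._cache = {}  # table name -> ToyFileIndex

    def resolve(self, table: str) -> ToyFileIndex:
        # Resolving an uncached table builds a new file index, i.e. lists it once.
        if table not in self._cache:
            self._cache[table] = ToyFileIndex(self._counter)
        return self._cache[table]

    def refresh_table_pre_32(self, table: str) -> None:
        # Behavior before Spark 3.2, as described in the issue: only invalidate;
        # listing is deferred until the table is resolved again.
        self._cache.pop(table, None)

    def refresh_table_32(self, table: str) -> None:
        # Behavior as of Spark 3.2, as described in the issue: also refresh the
        # relation (eagerly re-listing the table), then invalidate.
        index = self._cache.get(table)
        if index is not None:
            index.refresh()
        self._cache.pop(table, None)


def listings_after(n_dml: int, spark_32: bool) -> int:
    counter = ListingCounter()
    catalog = ToyCatalog(counter)
    catalog.resolve("t")  # an initial query lists the table once
    for _ in range(n_dml):
        catalog.resolve("t")  # assumption: each DML resolves its target table
        if spark_32:
            catalog.refresh_table_32("t")
        else:
            catalog.refresh_table_pre_32("t")
    catalog.resolve("t")  # the next read re-resolves the table
    return counter.count


print(listings_after(3, spark_32=False))  # 4
print(listings_after(3, spark_32=True))   # 7
```

Under these assumptions, the 3.2 variant costs one extra full listing per DML operation. On a large table, each of those listings can be expensive.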


We should revert to the preceding behavior from Spark 3.1.


