[ https://issues.apache.org/jira/browse/SPARK-54025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032869#comment-18032869 ]

Gengliang Wang commented on SPARK-54025:
----------------------------------------

cc [~vli-databricks] 

> Support recaching when a table is written via a different table 
> implementation (V1 or V2)
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-54025
>                 URL: https://issues.apache.org/jira/browse/SPARK-54025
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When a table is cached using one table implementation (e.g., V2) and written 
> through the other (e.g., V1), Spark may not automatically trigger recaching. 
> As a result, the cached data can become stale even though the underlying 
> table content has changed.
>  
> This issue arises because the current recaching mechanism does not 
> consistently handle cross-implementation writes. Given that the community is 
> actively working on Data Source V2 (DSV2), many data sources are expected to 
> have both V1 and V2 implementations for a period of time, making this issue 
> more likely to occur in practice.
>  
> *Proposed Fix:*
> Enhance the cache invalidation logic to detect writes that occur through a 
> different table implementation (V1 ↔ V2) and trigger recaching accordingly.
>  
> *Expected Outcome:*
>  * Cached data remains up to date when a table is written through either V1 
> or V2 paths.
>  * Both logical-plan-based and file-path-based recaching continue to work as 
> expected for both V1 and V2 connectors.
>  
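The staleness described in the issue can be illustrated with a toy model. This is a minimal sketch under a stated assumption, not Spark's CacheManager: it assumes the missed recache comes from an invalidation check that compares an implementation-specific table reference (V1 vs. V2) instead of the table's identity. The names `TableRef`, `ToyCache`, and `match_by_name` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


# Hypothetical toy model; NOT Spark's CacheManager or any Spark API.
@dataclass(frozen=True)
class TableRef:
    impl: str  # "v1" or "v2": which table implementation resolved the table
    name: str  # fully qualified table name


class ToyCache:
    def __init__(self, match_by_name: bool):
        # match_by_name=False: assumed behavior, the check compares the whole
        # implementation-specific reference, so a V1 write misses a V2 entry.
        # match_by_name=True: the check compares only the table identity.
        self.match_by_name = match_by_name
        self.entries: dict = {}

    def cache(self, ref: TableRef, rows: list) -> None:
        self.entries[ref] = list(rows)

    def _refers_to(self, cached: TableRef, written: TableRef) -> bool:
        if self.match_by_name:
            return cached.name == written.name
        return cached == written

    def on_write(self, written: TableRef, new_rows: list) -> None:
        # Recache every entry that the check considers to be the written table.
        for ref in list(self.entries):
            if self._refers_to(ref, written):
                self.entries[ref] = list(new_rows)


v2_ref = TableRef("v2", "spark_catalog.db.t")
v1_ref = TableRef("v1", "spark_catalog.db.t")

naive = ToyCache(match_by_name=False)
naive.cache(v2_ref, [1])
naive.on_write(v1_ref, [1, 2])
print(naive.entries[v2_ref])  # prints [1]: stale after a cross-implementation write

fixed = ToyCache(match_by_name=True)
fixed.cache(v2_ref, [1])
fixed.on_write(v1_ref, [1, 2])
print(fixed.entries[v2_ref])  # prints [1, 2]: recached
```

Matching on table identity is only one way to "detect writes that occur through a different table implementation"; the actual fix may resolve identity differently (for example via catalog identifiers or file paths, per the expected outcome above).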



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
