Gengliang Wang created SPARK-54025:
--------------------------------------

             Summary: Support recaching when a table is written via a different 
table implementation (V1 or V2)
                 Key: SPARK-54025
                 URL: https://issues.apache.org/jira/browse/SPARK-54025
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.1.0
            Reporter: Gengliang Wang


When a table is cached using one table implementation (e.g., V2) and written 
through the other (e.g., V1), Spark may not automatically trigger recaching. As 
a result, the cached data can become stale even though the underlying table 
content has changed.

 

This issue arises because the current recaching mechanism does not consistently 
handle cross-implementation writes. Given that the community is actively 
working on Data Source V2 (DSV2), many data sources are expected to have both 
V1 and V2 implementations for a period of time, making this issue more likely 
to occur in practice.

 

*Proposed Fix:*

Enhance the cache invalidation logic to detect writes that occur through a 
different table implementation (V1 ↔ V2) and trigger recaching accordingly.
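
The intended behavior can be sketched with a toy model. This is only an illustration under stated assumptions, not Spark's actual CacheManager: the names TableRef, ToyCacheManager, match_across_impls, and run are hypothetical, and the real fix may identify "the same table" quite differently. The sketch assumes cache entries can be matched by table identity (here, the table name) rather than by the implementation that created them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableRef:
    """Hypothetical reference to a table as seen through one implementation."""
    name: str  # catalog identifier, e.g. "db.events"
    impl: str  # "V1" or "V2"


@dataclass
class ToyCacheManager:
    """Toy stand-in for a cache manager; NOT Spark's CacheManager."""
    storage: dict                     # table name -> current rows
    match_across_impls: bool = False  # False: old behavior, True: proposed behavior
    cached: dict = field(default_factory=dict)  # TableRef -> snapshot of rows

    def cache(self, ref):
        # Snapshot the table's current rows under the reference used to cache it.
        self.cached[ref] = list(self.storage[ref.name])

    def write(self, ref, rows):
        # Update the underlying table, then recache every matching entry.
        self.storage[ref.name] = list(rows)
        for entry in list(self.cached):
            same_table = entry.name == ref.name
            same_impl = entry.impl == ref.impl
            if same_table and (same_impl or self.match_across_impls):
                self.cache(entry)

    def read_cached(self, ref):
        return self.cached[ref]


def run(match_across_impls):
    """Cache through V2, write through V1, and return what the cache serves."""
    mgr = ToyCacheManager(storage={"db.events": [1, 2]},
                          match_across_impls=match_across_impls)
    v2 = TableRef("db.events", "V2")
    mgr.cache(v2)
    mgr.write(TableRef("db.events", "V1"), [1, 2, 3])
    return mgr.read_cached(v2)
```

With match_across_impls=False the V2 cache entry keeps serving the rows from before the V1 write, which is the staleness described in this issue. With match_across_impls=True the same write refreshes the entry.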

 

*Expected Outcome:*
 * Cached data remains up to date when a table is written through either V1 or 
V2 paths.

 * Both logical-plan-based and file-path-based recaching continue to work as 
expected for V1 and V2 connectors.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
