Gengliang Wang created SPARK-54025:
--------------------------------------
Summary: Support recaching when a table is written via a different
table implementation (V1 or V2)
Key: SPARK-54025
URL: https://issues.apache.org/jira/browse/SPARK-54025
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.1.0
Reporter: Gengliang Wang
When a table is cached using one table implementation (e.g., V2) and written
through the other (e.g., V1), Spark may not automatically trigger recaching. As
a result, the cached data can become stale even though the underlying table
content has changed.
This issue arises because the current recaching mechanism does not consistently
handle cross-implementation writes. Given that the community is actively
working on Data Source V2 (DSV2), many data sources are expected to have both
V1 and V2 implementations for a period of time, making this issue more likely
to occur in practice.
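The failure mode can be illustrated with a toy model. This is a minimal sketch, assuming a cache that remembers which implementation (V1 or V2) was used to reference the table and invalidates only on an exact match. `TableRef` and `NaiveCacheManager` are hypothetical names, and the model is not Spark's actual cache manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical reference to a table, tagged with the implementation used."""
    impl: str  # "v1" or "v2"
    name: str


class NaiveCacheManager:
    """Toy cache that only invalidates on an exact reference match."""

    def __init__(self):
        self.entries = {}  # TableRef -> snapshot of rows

    def cache(self, ref, rows):
        self.entries[ref] = list(rows)  # store a snapshot, not a live view

    def on_write(self, ref):
        # Drops only entries whose reference equals the write target, so a
        # V1 write never matches an entry cached through V2 (and vice versa).
        self.entries.pop(ref, None)

    def lookup(self, ref):
        return self.entries.get(ref)


storage = {"t": [1, 2]}
mgr = NaiveCacheManager()
mgr.cache(TableRef("v2", "t"), storage["t"])  # cached via the V2 path
storage["t"].append(3)                         # written via the V1 path
mgr.on_write(TableRef("v1", "t"))              # misses the V2 entry
stale = mgr.lookup(TableRef("v2", "t"))        # still the old snapshot
```

Here `stale` is `[1, 2]` even though the table now holds `[1, 2, 3]`.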
*Proposed Fix:*
Enhance the cache invalidation logic to detect writes that occur through a
different table implementation (V1 ↔ V2) and trigger recaching accordingly.
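One way to picture the proposed fix, as a sketch under stated assumptions: cache entries are keyed by an implementation-independent table identity, so a write through either path matches the entry and refreshes it. `TableRef.identity()` and `RecachingCacheManager` are hypothetical. A real key would likely need the full catalog and namespace rather than a bare name, and this does not describe how Spark will actually implement the change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical reference to a table, tagged with the implementation used."""
    impl: str  # "v1" or "v2"
    name: str

    def identity(self):
        # Implementation-independent key: the same table resolves to the
        # same key whether it is accessed through V1 or V2.
        return self.name


class RecachingCacheManager:
    """Toy cache keyed by table identity that recaches on any write."""

    def __init__(self, storage):
        self.storage = storage  # table name -> current rows
        self.entries = {}       # identity -> snapshot of rows

    def cache(self, ref):
        self.entries[ref.identity()] = list(self.storage[ref.name])

    def on_write(self, ref):
        key = ref.identity()
        if key in self.entries:
            # Refresh the snapshot from the current contents, regardless of
            # which implementation performed the write.
            self.entries[key] = list(self.storage[ref.name])

    def lookup(self, ref):
        return self.entries.get(ref.identity())


storage = {"t": [1, 2]}
mgr = RecachingCacheManager(storage)
mgr.cache(TableRef("v2", "t"))           # cached via the V2 path
storage["t"].append(3)                   # written via the V1 path
mgr.on_write(TableRef("v1", "t"))        # matches by identity, recaches
fresh = mgr.lookup(TableRef("v2", "t"))  # reflects the V1 write
```

With the same sequence of operations as the naive model, `fresh` is `[1, 2, 3]`. The symmetric case (cache via V1, write via V2) works the same way because both references map to one key.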
*Expected Outcome:*
* Cached data remains up to date when a table is written through either V1 or
V2 paths.
* Both logical-plan-based and file-path-based recaching continue to work as
expected for both V1 and V2 connectors.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)