[
https://issues.apache.org/jira/browse/SPARK-55795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55795:
-----------------------------------
Labels: pull-request-available (was: )
> Add automatic V1 to V2 offset log upgrade for streaming queries with named
> sources
> ----------------------------------------------------------------------------------
>
> Key: SPARK-55795
> URL: https://issues.apache.org/jira/browse/SPARK-55795
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.2.0
> Reporter: Eric Marnadi
> Priority: Major
> Labels: pull-request-available
>
> Introduce an automatic offset log upgrade mechanism that allows streaming
> queries to migrate from V1 (positional) offset tracking to V2 (named) offset
> tracking when users add {{.name()}} to their streaming sources.
> Currently, users who want to migrate from V1 (positional) to V2 (named)
> offset tracking must:
> # Delete their checkpoint directory (losing all state)
> # Restart the query from an empty checkpoint
> This is problematic because:
> * {*}State loss{*}: All stateful operators (aggregations, joins,
> deduplication) lose their state
> * {*}Data reprocessing{*}: The query must reprocess all historical data from
> the beginning
> * {*}Downtime{*}: The migration requires stopping the query and carefully
> coordinating the restart
> With this change, users can safely migrate existing V1 offset logs to V2
> format by:
> # Adding {{.name()}} to all streaming sources
> # Setting {{spark.sql.streaming.offsetLog.formatVersion=2}}
> # Setting {{spark.sql.streaming.offsetLog.v1ToV2.autoUpgrade.enabled=true}}
> # Restarting the query
> The upgrade preserves all state and offset positions, enabling a seamless
> transition to the more flexible V2 format, which supports source evolution
> (adding or removing sources by name).
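
The difference between positional and named offset tracking can be illustrated with a minimal sketch. This is a conceptual model only, not Spark's implementation or its on-disk offset log format: the helper `upgrade_v1_to_v2`, the source names, and the offset strings are all hypothetical, and the sketch assumes that position i in a V1 entry belongs to the i-th named source of the query.

```python
from typing import Dict, List, Optional


def upgrade_v1_to_v2(
    v1_offsets: List[Optional[str]],
    source_names: List[str],
) -> Dict[str, Optional[str]]:
    """Map positional (V1-style) offsets to named (V2-style) offsets.

    Hypothetical helper: position i in v1_offsets is assumed to belong to
    source_names[i], the name given to the i-th source of the query.
    """
    # Every positional offset needs exactly one name to be keyed by.
    if len(v1_offsets) != len(source_names):
        raise ValueError("every source needs exactly one name")
    # Duplicate names would make the named lookup ambiguous.
    if len(set(source_names)) != len(source_names):
        raise ValueError("source names must be unique")
    return dict(zip(source_names, v1_offsets))


# Made-up offsets for a query with two sources, tracked by position.
v1_entry = ["offset-42", "offset-7"]

# After naming the sources, the same offsets are tracked by name.
v2_entry = upgrade_v1_to_v2(v1_entry, ["orders", "payments"])
```

Because the named entry is keyed by source name rather than list position, a source can in principle be added or removed without shifting the offsets of the others, which is the source-evolution property the description refers to.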