ericm-db opened a new pull request, #56015:
URL: https://github.com/apache/spark/pull/56015

   ### What changes were proposed in this pull request?
   
   When the streaming source evolution flag 
(`spark.sql.streaming.queryEvolution.enableSourceEvolution`) is set to `true`, 
force the offset log format to `VERSION_2` for new streaming queries.
   
   In `MicroBatchExecution.initializeExecution`, the offset log format version 
selection now takes `max(STREAMING_OFFSET_LOG_FORMAT_VERSION, 
minRequiredVersion)`, where `minRequiredVersion` is `VERSION_2` when source 
evolution is enabled and `VERSION_1` otherwise. Existing queries continue to 
use whatever version is already written in their offset log (read from 
`latestStartedBatch`), so this only affects new queries.
   
   The `testWithSourceEvolution` helper in `StreamingSourceEvolutionSuite` was 
updated to no longer set the offset log version explicitly, since it is now 
selected automatically.
   
   ### Why are the changes needed?
   
   Streaming source evolution relies on the `OffsetMap` (sourceId -> offset) 
format, which is only available in offset log `VERSION_2`. Previously, users 
had to remember to set `spark.sql.streaming.offsetLog.formatVersion=2` 
alongside enabling source evolution; otherwise the format would default to 
`VERSION_1` (sequence-based) and the named-source tracking required by source 
evolution would not function properly. Coupling the two configs eliminates a 
footgun.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   The change only affects new streaming queries that explicitly enable the 
internal `spark.sql.streaming.queryEvolution.enableSourceEvolution` flag. For 
such queries, the offset log will now use `VERSION_2` automatically. Users who 
manually set the offset log version remain in control: the final version is 
`max(configuredVersion, minRequiredVersion)`, so a user-configured `VERSION_2` 
keeps working unchanged.
   
   ### How was this patch tested?
   
   - Added `offset log uses VERSION_2 when source evolution is enabled` test in 
`StreamingSourceEvolutionSuite`.
   - Existing `StreamingSourceEvolutionSuite` tests pass after dropping the 
explicit offset log version from `testWithSourceEvolution` (19/19).
   - `OffsetSeqLogSuite` continues to pass (19/19).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (claude-opus-4-7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to