[ https://issues.apache.org/jira/browse/SPARK-45672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-45672:
---------------------------------
        Parent:     (was: SPARK-45511)
    Issue Type: Improvement  (was: Sub-task)

> Provide a unified user-facing schema for state format versions in state data 
> source - reader
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45672
>                 URL: https://issues.apache.org/jira/browse/SPARK-45672
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> Currently, except for stream-stream join with the joinSide option specified, 
> the state data source provides the state "as is" from the state store. This 
> means the state data source provides a different schema for operators that 
> have multiple state format versions.
> From the users' perspective, they do not care about the state format version, 
> and hence may be confused if the state data source produces a different schema.
> Therefore, we could consider defining and providing the same user-facing 
> schema for each operator, regardless of its state format version.
> *Note that this would need further discussion* before writing any code, 
> because there is a clear trade-off: it creates strong coupling between the 
> state data source and the implementation of stateful operators. Also, 
> regarding the argument about an unpredictable output schema, users can call 
> printSchema() to see the output schema prior to running the query.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
