[ https://issues.apache.org/jira/browse/SPARK-45672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated SPARK-45672:
---------------------------------
    Parent:     (was: SPARK-45511)
    Issue Type: Improvement  (was: Sub-task)

> Provide a unified user-facing schema for state format versions in state data source - reader
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45672
>                 URL: https://issues.apache.org/jira/browse/SPARK-45672
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> Currently, except for stream-stream join with the joinSide option specified,
> the state data source provides the state "as it is" in the state store. This
> means the state data source provides a different schema per state format
> version for operators that have multiple state format versions.
> From the users' perspective, they do not care about the state format version,
> and hence may be confused if the state data source produces different schemas.
> That said, we could probably consider defining and providing the same
> user-facing schema for each operator.
> *Note that this would need further discussion* before coming up with code,
> because there is a clear trade-off: it creates a strong coupling between the
> state data source and the implementation of stateful operators. Also,
> regarding the argument about non-predictable output schema, users can call
> printSchema() to see the output schema prior to running the query.
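To make the trade-off in the issue description concrete, here is a minimal pure-Python sketch of what "one user-facing schema per operator" could mean. Everything in it is hypothetical: the function name `to_unified_row`, the dict-based rows, and the two assumed layouts (version 1 repeating the key columns inside the value, version 2 storing only non-key columns) are illustrations, not Spark's API or its actual state layout. It also shows where the coupling comes from: the reader has to know each format version's layout.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_unified_row(raw: Row, format_version: int, key_columns: List[str]) -> Row:
    """Map a raw state row to one user-facing shape, whatever the format version.

    Assumed layouts (illustration only, not taken from Spark):
      version 1: "value" repeats the key columns next to the other columns
      version 2: "value" holds only the non-key columns
    """
    key = dict(raw["key"])
    value = dict(raw["value"])
    if format_version == 1:
        # Drop the duplicated key columns so both versions look alike.
        value = {k: v for k, v in value.items() if k not in key_columns}
    elif format_version != 2:
        raise ValueError(f"unsupported state format version: {format_version}")
    return {"key": key, "value": value}


# The same logical state, stored under the two assumed layouts.
v1_row = {"key": {"user": "a"}, "value": {"user": "a", "count": 3}}
v2_row = {"key": {"user": "a"}, "value": {"count": 3}}

unified_v1 = to_unified_row(v1_row, 1, ["user"])
unified_v2 = to_unified_row(v2_row, 2, ["user"])
```

Both calls return the same shape, which is the user-visible benefit. The cost is the version-specific branch: every new state format version in a stateful operator would require a matching change in the reader.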