[ https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251353#comment-17251353 ]
Apache Spark commented on SPARK-33827:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30827

> Unload State Store asap once it becomes inactive
> ------------------------------------------------
>
>                 Key: SPARK-33827
>                 URL: https://issues.apache.org/jira/browse/SPARK-33827
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Priority: Major
>
> SS maintains state stores in executors across batches. Due to the nature
> of Spark scheduling, a state store might be allocated on another executor
> in the next batch, so the state store from the previous batch becomes
> inactive.
>
> Currently we run a maintenance task periodically to unload inactive state
> stores, so there is some delay between when a state store becomes inactive
> and when it is unloaded.
>
> Per the discussion on https://github.com/apache/spark/pull/30770 with
> [~kabhwan], the preference is to unload an inactive state store asap.
>
> However, we can force Spark to always allocate a state store to the same
> executor by using the task locality configuration. This reduces the
> chance of having inactive state stores at all.
>
> With the locality configuration, we would generally not see inactive
> state stores. There is still a chance that an executor fails and is
> reallocated, but in that case the inactive state store is lost as well,
> so it is not an issue.
>
> So unloading inactive stores asap is only useful when we don't use task
> locality to force state store locality across batches.
>
> The change required to make state store management bi-directional between
> the driver and executors looks non-trivial. If we can already reduce the
> possibility of inactive stores, is that non-trivial change still worth
> making?
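
As a minimal sketch (not part of this ticket or the linked PR), the two knobs
discussed above can be set when building the session:
spark.sql.streaming.stateStore.maintenanceInterval bounds how long an inactive
store lingers before the maintenance task unloads it, and spark.locality.wait
makes the scheduler prefer the executor that already holds the loaded store.
The interval values below are illustrative assumptions, not recommendations.

    import org.apache.spark.sql.SparkSession

    object StateStoreLocalitySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("state-store-locality-sketch")
          // Run the state store maintenance task more often than the 60s
          // default, so inactive stores are unloaded sooner.
          .config("spark.sql.streaming.stateStore.maintenanceInterval", "10s")
          // Wait longer before giving up on a locality-preferred executor,
          // making it more likely a stateful task lands on the executor
          // that already holds its state store (illustrative value).
          .config("spark.locality.wait", "10s")
          .getOrCreate()

        // ... a stateful streaming query would go here ...

        spark.stop()
      }
    }

Raising spark.locality.wait trades scheduling latency for state store
stickiness, which is the trade-off the discussion above is weighing against
the cost of a bi-directional driver-executor unload protocol.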