Tathagata Das created SPARK-21145:
-------------------------------------

             Summary: Restarted queries reuse same StateStoreProvider, causing 
multiple concurrent tasks to update same StateStore
                 Key: SPARK-21145
                 URL: https://issues.apache.org/jira/browse/SPARK-21145
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.2.0
            Reporter: Tathagata Das
            Assignee: Tathagata Das


StateStoreProvider instances are loaded on demand in an executor when a query is 
started. When a query is restarted, the already-loaded provider instance is 
reused. There is a non-trivial chance that a task from the previous query run 
is still running when the tasks of the restarted run have started. So for a 
stateful partition, there may be two concurrent tasks operating on the same 
partition, and therefore using the same provider instance. This can lead to 
inconsistent results and possibly random failures, as state store 
implementations are not designed to be thread-safe.
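
To make the reuse concrete, here is a minimal Python sketch of the caching 
behavior described above. It is a toy model, not Spark code: ProviderCache, 
its get() method, and the include_run_id flag are hypothetical names invented 
for illustration. It assumes loaded providers are looked up by operator and 
partition only, and contrasts that with one possible alternative in which the 
lookup key also carries a per-run identifier.

```python
import uuid


class StateStoreProvider:
    """Stand-in for a per-partition provider (illustrative only, not thread-safe)."""

    def __init__(self, operator_id, partition_id):
        self.operator_id = operator_id
        self.partition_id = partition_id


class ProviderCache:
    """Hypothetical executor-side cache of loaded providers."""

    def __init__(self, include_run_id):
        # Assumption: the only difference between the two modes is the cache key.
        self.include_run_id = include_run_id
        self._loaded = {}

    def get(self, operator_id, partition_id, run_id):
        if self.include_run_id:
            key = (operator_id, partition_id, run_id)
        else:
            key = (operator_id, partition_id)
        # Load on demand; later lookups with the same key reuse the instance.
        if key not in self._loaded:
            self._loaded[key] = StateStoreProvider(operator_id, partition_id)
        return self._loaded[key]


# Two runs of the same query: the original run and the restarted one.
first_run, restarted_run = uuid.uuid4(), uuid.uuid4()

# Keyed without the run id: both runs get the very same provider object.
shared = ProviderCache(include_run_id=False)
reused = shared.get(0, 3, first_run) is shared.get(0, 3, restarted_run)

# Keyed with the run id: each run gets its own provider object.
isolated = ProviderCache(include_run_id=True)
separate = isolated.get(0, 3, first_run) is not isolated.get(0, 3, restarted_run)

print(reused, separate)
```

In the first mode, a lingering task from the old run and a task from the 
restarted run would both hold the same object for partition 3, which is the 
situation reported here. The second mode is only one way to avoid the sharing 
and is not presented as the resolution of this issue.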
