Mukul Murthy created SPARK-25399:
------------------------------------

             Summary: Reusing execution threads from continuous processing for 
microbatch streaming can result in correctness issues
                 Key: SPARK-25399
                 URL: https://issues.apache.org/jira/browse/SPARK-25399
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Mukul Murthy


Continuous processing sets some thread-local variables. If a thread running a 
microbatch stream later reads them, it may read incorrect previous state, or 
no previous state at all, and produce wrong answers. This was caught by a job 
running the StreamSuite tests, and it reproduces only occasionally, when the 
same threads happen to be reused.

The issue is in StateStoreRDD.compute: when we compute currentVersion, we read 
from a thread-local variable that is set by continuous processing threads. If 
this value is still set on a reused thread, the microbatch query wrongly 
concludes that it is on a different state version.
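
The failure mode can be illustrated outside Spark with a minimal sketch. This 
is not Spark's code: the class, the CURRENT_EPOCH thread-local, the 
resolveVersion helper, and the values 7 and 3 are all hypothetical stand-ins. 
It only assumes what is described above, namely that a version lookup prefers 
a thread-local value when one is set, and that the same pooled thread runs a 
"continuous" task followed by a "microbatch" task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakSketch {
    // Hypothetical stand-in for the thread-local set by continuous processing.
    private static final ThreadLocal<Long> CURRENT_EPOCH = new ThreadLocal<>();

    // Hypothetical stand-in for the version lookup: a thread-local value,
    // if present, wins over the version the caller actually asked for.
    public static long resolveVersion(long storeVersion) {
        Long epoch = CURRENT_EPOCH.get();
        return epoch != null ? epoch : storeVersion;
    }

    // Runs a "continuous" task and then a "microbatch" task on one pooled
    // thread and returns the version the microbatch task ends up using.
    public static long runBothModesOnOneThread(boolean clearAfterContinuous) {
        // A single-thread pool guarantees both tasks share the same thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable continuousTask = () -> {
                try {
                    CURRENT_EPOCH.set(7L); // continuous processing sets its epoch
                } finally {
                    if (clearAfterContinuous) {
                        CURRENT_EPOCH.remove(); // one possible mitigation
                    }
                }
            };
            // The microbatch task expects to use state version 3.
            Callable<Long> microbatchTask = () -> resolveVersion(3L);

            pool.submit(continuousTask).get();
            return pool.submit(microbatchTask).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("stale thread-local:   " + runBothModesOnOneThread(false));
        System.out.println("cleared thread-local: " + runBothModesOnOneThread(true));
    }
}
```

In this sketch the microbatch task silently picks up version 7 instead of 3 
when the thread-local is left set, and gets 3 once the value is cleared. 
Clearing in a finally block is only one way to avoid the leak; not consulting 
the thread-local at all on the microbatch path would be another.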

I imagine very few people, if any, would run into this bug, because you'd have 
to use continuous processing and then microbatch processing in the same 
cluster. However, it can result in silent correctness issues, and it would be 
very difficult for someone to tell if they were impacted by this or not.



