Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20622

@jose-torres I had a long offline chat with @zsxwing, kudos to him for catching a corner case in the current solution. The following sequence of events may occur.

- In the query thread, the epoch tracking thread is started.
- Before the query thread actually starts the Spark job, the epoch tracking thread may detect some sort of reconfiguration and attempt to cancelJob, even though the query thread has not started any Spark jobs yet.
- The query thread starts the Spark job, gets blocked, and never terminates.

Fundamentally, it's not a great setup that one thread is starting the jobs and another thread is canceling them. Because of the async nature, we have no way of reasoning about which attempt wins, starting or cancelling. Rather, let's make sure that we start and cancel in the same thread (then we can do some reasoning).

Here is an alternate solution.

- The epoch thread ONLY interrupts the query thread. It's not responsible for any Spark state management (other than the enum state).
- The query thread cancels jobs and stops sources in the `finally` clause.

Race conditions that end up not canceling the Spark job are less likely, because a single thread (the query thread) is responsible for all Spark state management.
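To make the alternate solution concrete, here is a minimal sketch of the threading pattern in plain Java, with no Spark dependency. Everything in it is hypothetical: the class name `QueryThreadOwnership`, the method `runOnce`, and the event strings `startJob`, `cancelJobs`, and `stopSources` are placeholders, not Spark APIs. It also assumes that the blocking call that runs the job responds to thread interruption, which is modeled here with `Thread.sleep`. The sketch deliberately forces the corner case above, so the interrupt arrives before the job is started.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the epoch thread only interrupts the query thread,
// and the query thread alone starts and cancels the (simulated) job.
final class QueryThreadOwnership {

    // Runs one simulated query and returns the order of state changes.
    static List<String> runOnce() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        Thread queryThread = new Thread(() -> {
            Thread self = Thread.currentThread();
            // The epoch thread does no state management; it only signals.
            Thread epochThread = new Thread(self::interrupt, "epoch-thread");
            epochThread.start();
            try {
                // Force the corner case: wait until the interrupt has already
                // arrived, i.e. before the job has been started.
                while (!self.isInterrupted()) {
                    Thread.yield();
                }
                events.add("startJob");
                // Stand-in for the blocking job (assumed to be interruptible).
                // The pending interrupt makes this throw immediately.
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Reconfiguration requested; fall through to cleanup.
            } finally {
                // Same thread that started the job also cleans up.
                events.add("cancelJobs");
                events.add("stopSources");
            }
        }, "query-thread");

        queryThread.start();
        try {
            queryThread.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints [startJob, cancelJobs, stopSources]
    }
}
```

Because the interrupt is recorded as a flag on the query thread rather than acted on by the epoch thread, an early signal is not lost: the query thread observes it at its next blocking call and runs the `finally` cleanup itself. Whether the real job-running call in Spark behaves like `Thread.sleep` under interruption is an assumption of this sketch.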