[ https://issues.apache.org/jira/browse/SPARK-51596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008169#comment-18008169 ]

Anish Shrigondekar commented on SPARK-51596:
--------------------------------------------

PR merged here -> [GitHub Pull Request #51565|https://github.com/apache/spark/pull/51565]

> Fix concurrent StateStoreProvider maintenance and closing
> ---------------------------------------------------------
>
>                 Key: SPARK-51596
>                 URL: https://issues.apache.org/jira/browse/SPARK-51596
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Livia Zhu
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, both the task thread and the maintenance thread can call unload()
> on a provider. This leads to a race condition in which the maintenance thread
> can be running maintenance on a provider while the task thread is closing it,
> leading to unexpected behavior.
> We want to guarantee that a provider is never closed or closing while
> maintenance runs on it. The easiest way to do this is to move the unload
> operation into the maintenance thread. To keep unloading as soon as possible
> (rather than potentially waiting for the maintenance interval), as was
> introduced by https://issues.apache.org/jira/browse/SPARK-33827, we should
> immediately trigger a maintenance thread to do the unload.
> This has the extra benefit that unloading other providers no longer blocks
> the task thread. To capitalize on this, unload() should not hold the
> loadedProviders lock the entire time (which would block other task threads).
> Instead, it should release the lock once it has removed the unloading
> providers from the map, and then close those providers without the lock held.
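
For readers skimming the thread, the locking pattern described in the issue can be sketched roughly as follows. This is a minimal illustration in Java, not the code from the PR: ProviderRegistry, Provider, requestUnload and requestMaintenance are hypothetical names, and a single-threaded executor stands in for the maintenance thread (an assumption made so that maintenance and close cannot overlap). Only unload() and loadedProviders are names taken from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical registry; a sketch of the pattern, not Spark's StateStore code.
class ProviderRegistry {

    // Hypothetical stand-in for a state store provider.
    interface Provider {
        void doMaintenance();
        void close();
    }

    private final Map<String, Provider> loadedProviders = new HashMap<>();

    // Assumption: one maintenance thread, so maintenance and close are serialized.
    private final ExecutorService maintenanceThread =
            Executors.newSingleThreadExecutor();

    void load(String id, Provider provider) {
        synchronized (loadedProviders) {
            loadedProviders.put(id, provider);
        }
    }

    boolean isLoaded(String id) {
        synchronized (loadedProviders) {
            return loadedProviders.containsKey(id);
        }
    }

    // Called from a task thread: hand the unload to the maintenance thread
    // immediately instead of closing the providers on the task thread.
    Future<?> requestUnload(List<String> ids) {
        return maintenanceThread.submit(() -> unload(ids));
    }

    // Periodic maintenance also runs on the maintenance thread.
    Future<?> requestMaintenance() {
        return maintenanceThread.submit(() -> {
            List<Provider> snapshot;
            synchronized (loadedProviders) {
                snapshot = new ArrayList<>(loadedProviders.values());
            }
            for (Provider p : snapshot) {
                p.doMaintenance();
            }
        });
    }

    // Runs only on the maintenance thread.
    private void unload(List<String> ids) {
        List<Provider> toClose = new ArrayList<>();
        // Hold the lock only long enough to remove the entries from the map.
        synchronized (loadedProviders) {
            for (String id : ids) {
                Provider p = loadedProviders.remove(id);
                if (p != null) {
                    toClose.add(p);
                }
            }
        }
        // Close without the lock held, so task threads are not blocked.
        for (Provider p : toClose) {
            p.close();
        }
    }

    void shutdown() {
        maintenanceThread.shutdown();
    }
}
```

In this sketch a task thread only pays for enqueueing the unload, and other task threads can still take the loadedProviders lock while a slow close() is in progress.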