[ https://issues.apache.org/jira/browse/SPARK-47253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
TakawaAkirayo updated SPARK-47253:
----------------------------------
    Summary: Allow LiveEventBus to stop without the completely draining of event queue  (was: Allow LiveEventBus to stop without the completly draining of event queue)

> Allow LiveEventBus to stop without the completely draining of event queue
> -------------------------------------------------------------------------
>
>                 Key: SPARK-47253
>                 URL: https://issues.apache.org/jira/browse/SPARK-47253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: TakawaAkirayo
>            Priority: Minor
>
> #Problem statement:
> SparkContext.stop() hangs for a long time in LiveEventBus.stop() when
> listeners are slow.
>
> #User scenarios:
> We have a centralized service with multiple instances that regularly executes
> users' scheduled tasks.
> For each user task within one service instance, the process is as follows:
> 1. Create a Spark session directly within the service process, using an
> account defined in the task.
> 2. Instantiate listeners by class name and register them with the
> SparkContext. The JARs containing the listener classes are uploaded to the
> service by the user.
> 3. Prepare resources.
> 4. Run user logic (Spark SQL).
> 5. Stop the Spark session by invoking SparkSession.stop().
>
> Step 5 waits for the LiveEventBus to stop, which requires each listener to
> completely drain its remaining events.
> Since the listeners are implemented by users, we cannot prevent a listener
> from doing heavy work on each event. In some cases a single heavy job has
> over 30,000 tasks, and it can take 30 minutes for the listener to process all
> the remaining events, because the listener acquires a coarse-grained global
> lock and updates its internal status in a remote database.
> This kind of delay affects other user tasks in the queue.
> Therefore, from the server-side perspective, we need a guarantee that the
> stop operation finishes quickly.
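
The stop behaviour described in the report can be illustrated with a small toy model. This is only a sketch and not Spark code: ToyEventQueue, drain_timeout and run_demo are hypothetical names invented for this example, and the sketch makes no claim about how Spark implements or fixes this. With drain_timeout=None, stop() waits until the slow listener has handled every queued event, which is the situation the report complains about. With a finite timeout, stop() waits that long and then discards whatever is still queued.

```python
import queue
import threading
import time


class ToyEventQueue:
    """Toy model of an asynchronous listener queue. Not Spark code."""

    _STOP = object()  # sentinel that marks the end of the queue

    def __init__(self, listener):
        self._listener = listener
        self._events = queue.Queue()
        self._stopped = False
        self._abandon = threading.Event()
        self.dropped = 0
        self._thread = threading.Thread(target=self._dispatch, daemon=True)
        self._thread.start()

    def post(self, event):
        # Events posted after stop() has been called are ignored.
        if not self._stopped:
            self._events.put(event)

    def _dispatch(self):
        while True:
            event = self._events.get()
            if event is self._STOP:
                return
            if self._abandon.is_set():
                self.dropped += 1  # discard instead of calling the listener
            else:
                self._listener(event)

    def stop(self, drain_timeout=None):
        """Stop the queue and return the number of dropped events.

        drain_timeout=None waits for a full drain. A number waits at most
        that many seconds, then drops the events that are still queued.
        """
        self._stopped = True
        self._events.put(self._STOP)
        self._thread.join(drain_timeout)
        if self._thread.is_alive():
            self._abandon.set()
            # Still waits for the single listener call already in flight.
            self._thread.join()
        return self.dropped


def run_demo(n_events, drain_timeout, delay=0.01):
    """Post n_events to a slow listener, stop, return (processed, dropped)."""
    processed = []

    def slow_listener(event):
        time.sleep(delay)  # stands in for the lock and remote database update
        processed.append(event)

    bus = ToyEventQueue(slow_listener)
    for i in range(n_events):
        bus.post(i)
    dropped = bus.stop(drain_timeout)
    return len(processed), dropped
```

Calling run_demo(200, None) blocks for roughly 200 times the listener delay, while run_demo(200, 0.05) returns shortly after the timeout and reports the unprocessed events as dropped. The trade-off is that dropped events never reach the listener, so a bounded stop only suits listeners that can tolerate losing trailing events.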