TakawaAkirayo created SPARK-47253:
-------------------------------------
Summary: Allow LiveEventBus to stop without completely draining
the event queue
Key: SPARK-47253
URL: https://issues.apache.org/jira/browse/SPARK-47253
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.5.0
Reporter: TakawaAkirayo
#Problem statement:
SparkContext.stop() hangs for a long time in LiveEventBus.stop() when listeners
are slow.
#User scenarios:
We have a centralized service with multiple instances that regularly executes
users' scheduled tasks.
For each user task within one service instance, the process is as follows:
1. Create a Spark session directly within the service process with an account
defined in the task.
2. Instantiate listeners by class names and register them with the SparkContext.
The JARs containing the listener classes are uploaded to the service by the
user.
3. Prepare resources.
4. Run user logic (Spark SQL).
5. Stop the Spark session by invoking SparkSession.stop().
In step 5, the call waits for the LiveEventBus to stop, which requires each
listener to completely drain the remaining events.
Since the listeners are implemented by users, we cannot prevent them from doing
heavy work on each event. In some cases a single heavy job has over 30,000
tasks, and the listener can take 30 minutes to process all the remaining
events, because on each event it acquires a coarse-grained global lock and
writes its internal status to a remote database.
This kind of delay affects other user tasks in the queue. Therefore, from the
server-side perspective, we need a guarantee that the stop operation finishes
quickly.
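
To illustrate the requested behavior, here is a minimal sketch of a bus whose
stop call waits only a bounded time for the queue to drain and then drops
whatever is left. It uses only the Java standard library. The class
BoundedStopBus, its methods, and the timeout parameter are hypothetical names
made up for this example; this is not Spark's implementation or API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for an asynchronous event bus; NOT Spark's code.
final class BoundedStopBus {
    // Sentinel compared by identity, so no real event can collide with it.
    private static final String STOP_MARKER = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread dispatcher;
    private volatile boolean stopped = false;

    BoundedStopBus(Consumer<String> listener) {
        dispatcher = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();
                    if (event == STOP_MARKER) return; // queue fully drained
                    listener.accept(event);
                }
            } catch (InterruptedException e) {
                // stop() gave up waiting: exit without draining the rest.
            }
        }, "bus-dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    void post(String event) {
        if (!stopped) queue.add(event); // events posted after stop are ignored
    }

    /**
     * Waits at most timeoutMs (must be > 0, since join(0) waits forever) for
     * the listener to drain the queue. Returns the approximate number of
     * events that were dropped instead of being delivered.
     */
    int stop(long timeoutMs) {
        stopped = true;
        queue.add(STOP_MARKER);
        try {
            dispatcher.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!dispatcher.isAlive()) return 0; // drained within the timeout
        dispatcher.interrupt();              // ask the slow listener to give up
        int dropped = 0;
        String event;
        while ((event = queue.poll()) != null) {
            if (event != STOP_MARKER) dropped++;
        }
        return dropped;
    }

    public static void main(String[] args) {
        BoundedStopBus bus = new BoundedStopBus(event -> {
            try {
                Thread.sleep(50); // simulate a heavy user listener
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < 100; i++) bus.post("event-" + i);
        System.out.println("dropped events: " + bus.stop(200));
    }
}
```

The trade-off in this sketch is explicit: the caller gets an upper bound on how
long stop takes, at the cost of listeners not seeing the remaining events. A
listener that ignores interruption would still finish its current event in the
background, which is why the dispatcher is a daemon thread here.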