[ 
https://issues.apache.org/jira/browse/SPARK-47253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-47253.
-----------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45367
[https://github.com/apache/spark/pull/45367]

> Allow LiveEventBus to stop without the completely draining of event queue
> -------------------------------------------------------------------------
>
>                 Key: SPARK-47253
>                 URL: https://issues.apache.org/jira/browse/SPARK-47253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: TakawaAkirayo
>            Assignee: TakawaAkirayo
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> #Problem statement:
> SparkContext.stop() hangs for a long time in LiveEventBus.stop() when 
> listeners are slow.
> #User scenarios:
> We have a centralized service with multiple instances that regularly executes 
> users' scheduled tasks.
> For each user task within one service instance, the process is as follows:
> 1. Create a Spark session directly within the service process with an account 
> defined in the task.
> 2. Instantiate listeners by class name and register them with the 
> SparkContext. The JARs containing the listener classes are uploaded to the 
> service by the user.
> 3. Prepare resources.
> 4. Run user logic (Spark SQL).
> 5. Stop the Spark session by invoking SparkSession.stop().
> In step 5, the call waits for the LiveEventBus to stop, which requires each 
> listener to drain all of its remaining events.
> Because the listeners are implemented by users, we cannot prevent them from 
> doing heavy work on each event. In some cases a single heavy job has over 
> 30,000 tasks, and the listener can take 30 minutes to process all the 
> remaining events, because the listener acquires a coarse-grained global lock 
> and writes its internal status to a remote database.
> This kind of delay affects other user tasks in the queue. Therefore, from the 
> server-side perspective, we need a guarantee that the stop operation 
> finishes quickly.
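
The trade-off described above can be illustrated with a small stand-alone
sketch. It is not Spark code and says nothing about how pull request 45367
implements the change. TinyEventBus, slow_listener and the timeout parameter
are hypothetical names chosen for illustration, and Python's standard library
stands in for the JVM. The sketch only shows the general idea: a stop call
that waits a bounded time for the queue to drain and then discards whatever
is still queued, instead of waiting for a slow listener to see every event.

```python
import queue
import threading
import time


class TinyEventBus:
    """Toy single-listener event bus (hypothetical; not Spark's implementation)."""

    _POISON = object()  # sentinel that tells the dispatch thread to exit

    def __init__(self, listener):
        self._listener = listener
        self._events = queue.Queue()
        self._abandon = threading.Event()
        self.processed = 0  # events delivered to the listener
        self.dropped = 0    # events discarded after stop() gave up waiting
        self._thread = threading.Thread(target=self._dispatch, daemon=True)
        self._thread.start()

    def post(self, event):
        self._events.put(event)

    def _dispatch(self):
        while True:
            event = self._events.get()
            if event is self._POISON:
                return
            if self._abandon.is_set():
                self.dropped += 1  # skip the listener, just empty the queue
                continue
            self._listener(event)
            self.processed += 1

    def stop(self, timeout=None):
        """Wait up to `timeout` seconds for the queue to drain.

        timeout=None waits for a full drain, which is the blocking case
        from the problem statement. Returns True if no event was dropped.
        """
        self._events.put(self._POISON)
        self._thread.join(timeout)
        if self._thread.is_alive():
            self._abandon.set()  # remaining events are discarded
            self._thread.join()  # returns once the in-flight event finishes
        return self.dropped == 0


def slow_listener(event):
    # Stands in for a global lock plus a remote database write.
    time.sleep(0.01)


bus = TinyEventBus(slow_listener)
for i in range(200):  # a full drain would take roughly 2 seconds
    bus.post(i)

started = time.monotonic()
drained = bus.stop(timeout=0.1)
elapsed = time.monotonic() - started
print(f"drained={drained} processed={bus.processed} "
      f"dropped={bus.dropped} elapsed={elapsed:.2f}s")
```

Calling bus.stop() with no timeout in this sketch waits for all 200 events
and so reproduces the slow shutdown. The bounded variant returns sooner at
the cost of the listener never seeing the dropped events, which is a
trade-off the operator has to accept deliberately.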


