Rather than using a separate thread pool, perhaps we can just move the prep
code to the call site thread?
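
Something along these lines (just a sketch; PreparedJob, JobReadyToExecute and
the method names are made up for illustration, not the actual DAGScheduler
internals):

    // Sketch: the submitting (call-site) thread does the expensive
    // preparation, and the event loop only receives the prepared result.
    import java.util.concurrent.LinkedBlockingQueue

    case class PreparedJob(jobId: Int, numPartitions: Int)
    case class JobReadyToExecute(prepared: PreparedJob)

    class Scheduler {
      private val events = new LinkedBlockingQueue[JobReadyToExecute]()

      // runs on whichever thread calls submitJob, not on the event loop
      def submitJob(jobId: Int): Unit = {
        val prepared = prepareJob(jobId)          // expensive: partitions, stage deps
        events.put(JobReadyToExecute(prepared))   // cheap: just enqueue
      }

      private def prepareJob(jobId: Int): PreparedJob =
        PreparedJob(jobId, numPartitions = 200)   // stands in for createResultStage-style work
    }

The trade-off is that the submitting thread is then blocked for the duration
of the prep, rather than the event loop.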


On Sun, Mar 4, 2018 at 11:15 PM, Ajith shetty <ajith.she...@huawei.com>
wrote:

> DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted
> events have to be processed, because DAGSchedulerEventProcessLoop is
> single-threaded and blocks the other events in its queue, such as
> TaskCompletion.
>
> Processing a JobSubmitted event can be time consuming depending on the
> nature of the job (for example, calculating parent stage dependencies,
> shuffle dependencies and partitions), and while it runs it blocks all
> other events from being processed.
>
>
>
> I see multiple JIRAs referring to this behavior:
>
> https://issues.apache.org/jira/browse/SPARK-2647
>
> https://issues.apache.org/jira/browse/SPARK-4961
>
>
>
> Similarly, in my cluster, partition calculation for some jobs is time
> consuming (similar to the stack in SPARK-2647), which slows down the
> DAGSchedulerEventProcessLoop. This in turn slows down user jobs, even when
> their tasks finish within seconds, because TaskCompletion events are
> processed at a slower rate due to the blockage.
>
>
>
> I think we can split the JobSubmitted event into 2 events:
>
> Step 1. JobSubmittedPreparation - runs in a separate thread on job
> submission; this covers the work in
> org.apache.spark.scheduler.DAGScheduler#createResultStage.
>
> Step 2. JobSubmittedExecution - if Step 1 succeeds, fire an event to
> DAGSchedulerEventProcessLoop and let it process the output of
> org.apache.spark.scheduler.DAGScheduler#createResultStage.
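>
> A rough sketch of the idea (names like ResultStageInfo and prepPool are
> made up for illustration; this is not the existing DAGScheduler API):
>
>     // Step 1 runs on a separate pool; Step 2 is the cheap event handed
>     // back to the single-threaded loop only if preparation succeeded.
>     import java.util.concurrent.Executors
>     import scala.concurrent.{ExecutionContext, Future}
>
>     case class ResultStageInfo(stageId: Int)                 // stands in for createResultStage output
>     case class JobSubmittedExecution(stage: ResultStageInfo) // cheap event for the loop
>
>     implicit val prepPool: ExecutionContext =
>       ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))
>
>     def handleJobSubmission(jobId: Int, post: JobSubmittedExecution => Unit): Unit = {
>       // Step 1: JobSubmittedPreparation - expensive prep off the event loop
>       Future {
>         ResultStageInfo(stageId = jobId)      // placeholder for createResultStage(...)
>       }.foreach { stage =>
>         // Step 2: only on success, post the execution event to the loop
>         post(JobSubmittedExecution(stage))
>       }
>     }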
>
>
>
> One effect of doing this is that job submissions may no longer be processed
> in FIFO order, depending on how long Step 1 takes for each job.
>
>
>
> Does the above solution suffice for the problem described? Are there any
> other side effects of this approach?
>
>
>
> Regards
>
> Ajith
>
