[jira] [Created] (SPARK-24474) Cores are left idle when there are a lot of stages to run

Al M (JIRA) Wed, 06 Jun 2018 05:29:18 -0700

Al M created SPARK-24474:
----------------------------

             Summary: Cores are left idle when there are a lot of stages to run
                 Key: SPARK-24474
                 URL: https://issues.apache.org/jira/browse/SPARK-24474
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.2.0
            Reporter: Al M



I've observed an issue happening consistently when:
* A job contains a join of two datasets
* One dataset is much larger than the other
* Both datasets require some processing before they are joined

What I have observed is:
* 2 stages are initially active to run processing on the two datasets
** These stages are run in parallel
** One stage has significantly more tasks than the other (e.g. one has 30k 
tasks and the other has 2k tasks)
** Spark allocates a similar (though not exactly equal) number of cores to each 
stage
* First stage completes (for the smaller dataset)
** Now there is only one stage running
** It still has many tasks left (usually > 20k tasks)
** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 103)
** This continues until the second stage completes
* Second stage completes, and third begins (the stage that actually joins the 
data)
** This stage works fine, no cores are idle (e.g. Total Cores = 200, active 
tasks = 200)

Other interesting things about this:
* It seems that when we have multiple stages active, and one of them finishes, 
it does not actually release any cores to other stages
* I can't reproduce this locally on my machine, only on a cluster with YARN 
enabled



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-24474) Cores are left idle when there are a lot of stages to run

Reply via email to