YanTang Zhai created SPARK-4962:
-----------------------------------

             Summary: Put TaskScheduler.start back in SparkContext to shorten 
cluster resources occupation period
                 Key: SPARK-4962
                 URL: https://issues.apache.org/jira/browse/SPARK-4962
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: YanTang Zhai
            Priority: Minor


When a SparkContext object is instantiated, the TaskScheduler is started and
some resources are allocated from the cluster. However, these resources may not
be used right away, for example while DAGScheduler.JobSubmitted is still being
processed, so they are wasted during this period. We therefore want to postpone
TaskScheduler.start in order to shorten the period during which cluster
resources are occupied, which matters especially on a busy cluster.
The TaskScheduler could be started just before the stages are run.
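
The idea of deferring the start can be sketched outside of Spark as follows.
This is only an illustration under stated assumptions: LazyStartScheduler,
ensure_started and run_stages are hypothetical names, not Spark APIs, and the
"resource allocation" is simulated by appending to a list.

```python
import threading


class LazyStartScheduler:
    """Hypothetical wrapper that allocates resources on first use,
    not when the surrounding context object is constructed."""

    def __init__(self, start_fn):
        self._start_fn = start_fn      # callable that acquires cluster resources
        self._started = False
        self._lock = threading.Lock()  # guard against concurrent first use

    def ensure_started(self):
        # Start at most once, and only when somebody actually needs it.
        with self._lock:
            if not self._started:
                self._start_fn()
                self._started = True

    def run_stages(self, stages):
        # Resources are requested just before the stages run.
        self.ensure_started()
        return [stage() for stage in stages]


events = []
scheduler = LazyStartScheduler(lambda: events.append("resources allocated"))
events.append("context created")   # nothing is allocated yet
events.append("job planned")       # planning work happens without resources
result = scheduler.run_stages([lambda: 1, lambda: 2])
```

In this sketch the allocation is recorded only after the planning step, which
is the ordering the proposal aims for.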

We can analyse and compare the resource occupation period before and after the
optimization.

TaskScheduler.start execution time: [time1__]
DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or 
TaskScheduler.start) execution time: [time2_]
HadoopRDD.getPartitions execution time: [time3___]
Stages execution time: [time4_____]

The cluster resource occupation period before the optimization is
[time2_][time3___][time4_____].
The cluster resource occupation period after the optimization is
........[time3___][time4_____].
In summary, the cluster resource occupation period after the optimization is
shorter than before.
If HadoopRDD.getPartitions could also be moved earlier (SPARK-4961), the period
could be shortened further, to [time4_____].
These resource savings are important for a busy cluster.
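
The comparison above can be written out as simple arithmetic. The sketch below
is illustrative only: occupation_periods is a hypothetical helper, and the
durations passed to it are made-up numbers, not measurements.

```python
def occupation_periods(time2, time3, time4):
    """Return the resource occupation period for the three cases in the text.

    time2: DAGScheduler.JobSubmitted time (excluding getPartitions and start)
    time3: HadoopRDD.getPartitions time
    time4: stages execution time
    """
    return {
        "before": time2 + time3 + time4,   # resources held from context creation
        "after": time3 + time4,            # TaskScheduler.start postponed
        "after_with_spark_4961": time4,    # getPartitions also moved earlier
    }


# Hypothetical durations in seconds, chosen only for illustration.
periods = occupation_periods(time2=2, time3=5, time4=30)
for name, seconds in periods.items():
    print(name, seconds)
```

With any non-negative durations the three values are ordered
before >= after >= after_with_spark_4961.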



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

