YanTang Zhai created SPARK-4962:
-----------------------------------

Summary: Put TaskScheduler.start back in SparkContext to shorten cluster resources occupation period
Key: SPARK-4962
URL: https://issues.apache.org/jira/browse/SPARK-4962
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor
When a SparkContext object is instantiated, the TaskScheduler is started and some resources are allocated from the cluster. However, these resources may not be used right away, for example while DAGScheduler is still processing the JobSubmitted event. The resources are wasted during this period. Thus, we want to defer TaskScheduler.start to shorten the cluster resource occupation period, especially on a busy cluster. The TaskScheduler could be started just before running stages.

We can analyse and compare the resource occupation period before and after the optimization using the following execution times:

- TaskScheduler.start execution time: [time1__]
- DAGScheduler.JobSubmitted execution time (excluding HadoopRDD.getPartitions and TaskScheduler.start): [time2_]
- HadoopRDD.getPartitions execution time: [time3___]
- Stages execution time: [time4_____]

Before the optimization, cluster resources are occupied for [time2_] + [time3___] + [time4_____].
After the optimization, they are occupied for only [time3___] + [time4_____].

In summary, the occupation period after the optimization is shorter than before by [time2_]. If HadoopRDD.getPartitions could also be moved earlier (SPARK-4961), the period could be shortened further, to just [time4_____]. These resource savings are important for a busy cluster.
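The before/after comparison can be made concrete with a small model. This is a minimal Python sketch, not Spark code: the function `occupation_period`, the phase labels and the durations are all hypothetical. It assumes, as the issue's timeline does, that resources are held from the point where TaskScheduler.start has completed until the last stage finishes, measured from the start of JobSubmitted processing (time1 itself is not counted). It also assumes that SPARK-4961 would move HadoopRDD.getPartitions ahead of JobSubmitted processing.

```python
from typing import Sequence, Tuple

# (phase name, duration in seconds); names and numbers are hypothetical.
Phase = Tuple[str, float]


def occupation_period(phases: Sequence[Phase], start_before: str) -> float:
    """Return how long cluster resources are held if TaskScheduler.start
    completes just before the phase named `start_before`: every phase from
    that point to the end of the job keeps the resources allocated."""
    names = [name for name, _ in phases]
    first = names.index(start_before)  # raises ValueError for an unknown phase
    return sum(duration for _, duration in phases[first:])


# Made-up durations standing in for time2, time3 and time4 in the issue.
job_submitted, get_partitions, stages = 2.0, 5.0, 30.0

# Current behaviour: the scheduler is started in the SparkContext constructor,
# so resources are already held when JobSubmitted is processed.
current = [("JobSubmitted", job_submitted),
           ("HadoopRDD.getPartitions", get_partitions),
           ("stages", stages)]
before = occupation_period(current, "JobSubmitted")            # time2 + time3 + time4

# Proposed: defer TaskScheduler.start; per the issue's timeline,
# getPartitions still falls inside the occupation period.
after = occupation_period(current, "HadoopRDD.getPartitions")   # time3 + time4

# Assumed effect of SPARK-4961: getPartitions runs before JobSubmitted
# processing, so only the stages remain inside the occupation period.
with_4961 = [("HadoopRDD.getPartitions", get_partitions),
             ("JobSubmitted", job_submitted),
             ("stages", stages)]
after_4961 = occupation_period(with_4961, "stages")             # time4

print(before, after, after_4961)  # 37.0 35.0 30.0
```

Modelling the change as "moving the start point later in an ordered list of phases" keeps the sketch close to the issue's argument: the saving from deferring the start is exactly the duration of the phases skipped, here [time2_].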