[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049940#comment-16049940
 ] 

Saisai Shao commented on SPARK-21082:
-------------------------------------

A fast node is effectively the same as an idle node: because a fast node 
executes tasks more efficiently, it has more idle time in which to accept new 
tasks. The scheduler may not know which node is the fast one, but it will 
always schedule tasks onto idle nodes (regardless of locality wait), so as a 
result the fast node will execute more tasks. 

By "fast node" I don't only mean a node with a much stronger CPU; it may also 
be fast IO. Normally tasks should be distributed relatively equally, so if you 
see one node getting far more tasks than the others, you'd better find out 
what makes that node different, from several angles. Changing the scheduler is 
not the first choice, after all.
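The effect described above can be demonstrated with a small standalone simulation (hypothetical code, not Spark's scheduler): a greedy "assign to whichever executor is idle first" policy, with no knowledge of node speed, naturally routes more tasks to the faster executor. The executor names and durations here are made up for illustration.

```scala
// Simulation: two executors, one 3x faster. The scheduler only ever picks
// the executor that becomes idle earliest; it never inspects speed directly.
object IdleFirstSim {
  def main(args: Array[String]): Unit = {
    val duration = Map("fast" -> 1, "slow" -> 3) // time units per task
    var idleAt = Map("fast" -> 0, "slow" -> 0)   // when each executor is next idle
    var counts = Map("fast" -> 0, "slow" -> 0)   // tasks assigned so far
    for (_ <- 1 to 100) {
      val exec = idleAt.minBy(_._2)._1           // pick the first-idle executor
      idleAt = idleAt.updated(exec, idleAt(exec) + duration(exec))
      counts = counts.updated(exec, counts(exec) + 1)
    }
    // The fast executor ends up with roughly 3x as many tasks.
    println(counts)
  }
}
```

So "fast node gets more tasks" falls out of idle-first scheduling by itself, without the scheduler ever identifying which node is fast.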

> Consider Executor's memory usage when scheduling task 
> ------------------------------------------------------
>
>                 Key: SPARK-21082
>                 URL: https://issues.apache.org/jira/browse/SPARK-21082
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. 
> This can sometimes lead to Executor OOM when an RDD is cached, because Spark 
> cannot estimate the memory usage well enough (especially when the RDD type is 
> not flattened), so the scheduler may dispatch too many tasks onto one 
> Executor.
> We can offer a configuration for the user to decide whether the scheduler 
> should consider memory usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
