[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049906#comment-16049906 ]

Saisai Shao commented on SPARK-21082:
-------------------------------------

Is this due to a fast-node/slow-node problem? Ideally, if all the nodes have 
equal computation power, the cached memory usage should be even. From your 
description this looks more like a fast-node/slow-node situation: a fast node 
processes and caches more data, so it is reasonable to schedule more tasks 
onto the fast node.

Scheduling tasks based on free memory and OOM risk is quite scenario 
dependent AFAIK; we may have other ways to tune the cluster instead of 
changing the code, and this scenario is not generic enough to justify 
changing the scheduler. If you want to improve the scheduler, I would suggest 
doing a careful and generic design first.
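
For example (a minimal sketch, not a recommendation: the input path and the 
fraction values below are placeholders), the storage pool can be shrunk and a 
disk-spilling storage level used so that cached blocks put less pressure on 
the heap, without any scheduler change:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("cache-tuning-sketch")
      // Size of the unified execution + storage region (default 0.6 of heap).
      .set("spark.memory.fraction", "0.5")
      // Portion of that region shielded from eviction for cached blocks
      // (default 0.5); lowering it lets execution reclaim more of the cache.
      .set("spark.memory.storageFraction", "0.3")

    val spark = SparkSession.builder.config(conf).getOrCreate()
    val sc = spark.sparkContext

    // MEMORY_AND_DISK keeps partitions that do not fit in memory on disk,
    // reducing heap pressure compared with the MEMORY_ONLY level that the
    // plain rdd.cache() uses.
    val cached = sc.textFile("hdfs:///some/path")  // placeholder path
      .persist(StorageLevel.MEMORY_AND_DISK)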

> Consider Executor's memory usage when scheduling task 
> ------------------------------------------------------
>
>                 Key: SPARK-21082
>                 URL: https://issues.apache.org/jira/browse/SPARK-21082
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: DjvuLee
>
> The Spark scheduler does not consider memory usage when dispatching tasks. 
> If an RDD is cached, this can sometimes lead to executor OOM, because Spark 
> cannot estimate the memory usage well enough (especially when the RDD 
> element type is not flat), so the scheduler may dispatch too many tasks 
> onto one executor.
> We can offer a configuration letting the user decide whether the scheduler 
> should consider memory usage.


