[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049933#comment-16049933
 ] 

DjvuLee edited comment on SPARK-21082 at 6/15/17 2:47 AM:
----------------------------------------------------------

This is not really a fast-node versus slow-node problem.

Even when all the nodes have equal computational power, many factors affect how 
much data each Executor caches, such as the data locality of each task's input, 
the network, the scheduling order, etc.

`it is reasonable to schedule more tasks on to fast node.`
In fact, more tasks are scheduled onto idle Executors. The Scheduler has no 
notion of fast or slow for each Executor; it mostly considers locality and 
idleness.

I agree that it is better not to change the code, but I cannot find any 
configuration that solves the problem.
Is there any good solution to keep the used memory balanced across Executors? 
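To illustrate the kind of memory-aware placement being asked for, here is a 
minimal sketch in Scala. All names (`ExecutorOffer`, `pickExecutor`) are 
hypothetical and are not part of Spark's scheduler API; the idea is simply: 
among the executors offering resources, prefer the one with the most free 
storage memory.

```scala
// Hypothetical sketch of memory-aware executor selection.
// ExecutorOffer is an illustrative stand-in for a scheduler resource offer.
case class ExecutorOffer(id: String, usedMemoryMB: Long, maxMemoryMB: Long) {
  def freeMemoryMB: Long = maxMemoryMB - usedMemoryMB
}

object MemoryAwareChoice {
  // Prefer the executor with the most free memory among the given offers;
  // a real scheduler would combine this with locality preferences.
  def pickExecutor(offers: Seq[ExecutorOffer]): Option[ExecutorOffer] =
    if (offers.isEmpty) None else Some(offers.maxBy(_.freeMemoryMB))
}
```

In practice such a tie-breaker would only be one factor alongside locality 
levels, which is presumably why a configuration switch was proposed rather 
than an unconditional behavior change.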



> Consider Executor's memory usage when scheduling task 
> ------------------------------------------------------
>
>                 Key: SPARK-21082
>                 URL: https://issues.apache.org/jira/browse/SPARK-21082
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: DjvuLee
>
>  Spark's Scheduler does not consider memory usage when dispatching tasks. 
> This can lead to Executor OOM when RDDs are cached, because Spark cannot 
> estimate memory usage well enough (especially when the RDD's type is not 
> flat), so the scheduler may dispatch too many tasks to one Executor.
> We can offer a configuration that lets the user decide whether the scheduler 
> should consider memory usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org