[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049965#comment-16049965 ]
DjvuLee commented on SPARK-21082:
---------------------------------

Data locality, per-task input size, and scheduling order all have a large effect, even when every node has the same computation capacity. Suppose there are two Executors with the same computation capacity and four tasks with input sizes of 10GB, 3GB, 10GB, and 20GB. Under the current scheduling policy there is a chance that one Executor ends up caching 30GB while the other caches only 13GB. If each Executor has only 25GB of memory for storage, then not all of the data can be cached in memory. If this proposal seems reasonable, I will write up a more detailed description. (A toy sketch of this arithmetic follows the quoted issue description below.)

> Consider Executor's memory usage when scheduling task
> ------------------------------------------------------
>
>                 Key: SPARK-21082
>                 URL: https://issues.apache.org/jira/browse/SPARK-21082
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: DjvuLee
>
> The Spark scheduler does not consider memory usage when dispatching tasks. If an RDD is cached, this can sometimes lead to Executor OOM, because Spark cannot estimate memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one Executor.
> We can offer a configuration that lets users decide whether the scheduler should take memory usage into account.
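
To make the arithmetic in the comment concrete, here is a minimal, self-contained sketch in plain Scala (not Spark scheduler code). The executor count, the 25GB storage limit, and the task input sizes come from the example above; the locality preferences and the best-fit heuristic are my own illustrative assumptions, just one hypothetical way a memory-aware placement could behave.

{code:scala}
object SchedulingSketch {

  final case class Exec(id: Int, var cachedGB: Long = 0L)

  val storageLimitGB = 25L

  // (input size in GB, executor holding the task's input data) -- the locality
  // preferences are assumed so that the totals match the comment's example.
  val tasks = Seq((10L, 1), (3L, 2), (10L, 2), (20L, 1))

  // Roughly the current behaviour: follow data locality, ignore cached memory.
  def localityOnly(execs: Map[Int, Exec]): Unit =
    tasks.foreach { case (gb, pref) => execs(pref).cachedGB += gb }
  // -> executor 1 caches 10 + 20 = 30GB (over the 25GB limit), executor 2 caches 13GB

  // One hypothetical memory-aware heuristic (best fit): among executors whose
  // storage can still hold the task's input, pick the fullest; else the emptiest.
  def memoryAware(execs: Seq[Exec]): Unit =
    tasks.foreach { case (gb, _) =>
      val fitting = execs.filter(e => e.cachedGB + gb <= storageLimitGB)
      val target  = if (fitting.nonEmpty) fitting.maxBy(_.cachedGB) else execs.minBy(_.cachedGB)
      target.cachedGB += gb
    }
  // -> executor 1 caches 10 + 3 + 10 = 23GB, executor 2 caches 20GB; both fit

  def main(args: Array[String]): Unit = {
    val current = Map(1 -> Exec(1), 2 -> Exec(2))
    localityOnly(current)
    println(current.values.map(e => s"exec ${e.id}: ${e.cachedGB}GB cached").mkString(", "))

    val proposed = Seq(Exec(1), Exec(2))
    memoryAware(proposed)
    println(proposed.map(e => s"exec ${e.id}: ${e.cachedGB}GB cached").mkString(", "))
  }
}
{code}

As the issue description suggests, any such memory-aware placement would presumably sit behind a user-facing configuration, so the current locality-driven behaviour stays the default.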