[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048835#comment-16048835 ]
DjvuLee commented on SPARK-21082:
---------------------------------

Yes, one reason Spark does not balance tasks well enough is data locality. Data locality is beneficial in most cases, but when we want to cache an RDD and run many analyses over it, memory balance matters more than preserving data locality while loading the data. Since we cannot guarantee every consideration at once, offering a configuration to users is valuable when dealing with memory. I will open a pull request soon if this suggestion is not defective at first sight.

> Consider Executor's memory usage when scheduling task
> ------------------------------------------------------
>
>                 Key: SPARK-21082
>                 URL: https://issues.apache.org/jira/browse/SPARK-21082
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: DjvuLee
>
> The Spark scheduler does not consider memory usage when dispatching tasks.
> This can lead to executor OOM when an RDD is cached, because Spark cannot
> estimate memory usage well enough (especially when the RDD type is not flat),
> so the scheduler may dispatch too many tasks onto one executor.
> We can offer a configuration letting users decide whether the scheduler
> should consider memory usage.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
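The locality-vs-memory trade-off the proposal describes can be sketched in standalone Python. This is not Spark's actual scheduler code; the `Executor` fields, the `consider_memory` flag, and the tie-breaking rule are all hypothetical illustrations of the suggested configurable behaviour:

```python
# Hypothetical sketch of memory-aware task placement (illustrative only,
# not Spark's real TaskScheduler). With consider_memory=False we mimic a
# pure locality preference; with consider_memory=True, ties among
# data-local executors are broken by free memory, so cached partitions
# spread more evenly instead of piling onto one executor.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Executor:
    exec_id: str
    host: str
    free_memory_mb: int


def choose_executor(executors: List[Executor],
                    preferred_host: str,
                    consider_memory: bool = False) -> Optional[Executor]:
    """Pick an executor for one task.

    preferred_host is where the task's input data lives. When
    consider_memory is off, the first data-local executor wins (falling
    back to the first executor overall). When it is on, we still prefer
    local executors but choose the one with the most free memory.
    """
    if not executors:
        return None
    local = [e for e in executors if e.host == preferred_host]
    if not consider_memory:
        return local[0] if local else executors[0]
    candidates = local if local else executors
    return max(candidates, key=lambda e: e.free_memory_mb)
```

For example, with two executors on the data-local host, the flag changes the choice: pure locality keeps picking the first local executor even if it is nearly full, while the memory-aware variant routes the task to the local executor with the most headroom.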