[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997761#comment-13997761 ]
Aaron Davidson edited comment on SPARK-1767 at 7/21/14 7:46 PM:
----------------------------------------------------------------

-One simple workaround to this is to just make sure that partitions that are in memory are ordered first in the list of partitions, as Spark will try to place executors based on the order in this list.- This is, of course, not a complete solution, as we would not utilize the locality-wait logic within Spark and would immediately fall back to a non-cached node if the cached node was busy, rather than waiting for some period of time for the cached node to become available.

Edit: I was wrong about how Spark schedules partitions -- ordering is not sufficient.


was (Author: ilikerps):
One simple workaround to this is to just make sure that partitions that are in memory are ordered first in the list of partitions, as Spark will try to place executors based on the order in this list. This is, of course, not a complete solution, as we would not utilize the locality-wait logic within Spark and would immediately fall back to a non-cached node if the cached node was busy, rather than waiting for some period of time for the cached node to become available.

> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-1767
>                 URL: https://issues.apache.org/jira/browse/SPARK-1767
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
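The locality-wait (delay-scheduling) behavior discussed in the comment can be illustrated with a toy model. This is a minimal Python sketch, not Spark's actual TaskSetManager logic; the function `pick_host` and its parameters are hypothetical names chosen for illustration:

```python
def pick_host(cached_host, free_hosts, elapsed_s, locality_wait_s=3.0):
    """Toy model of delay scheduling: prefer the host that caches the
    partition; only after `locality_wait_s` seconds have elapsed with
    the cached host still busy do we accept a non-local host.
    Returns (host, locality_level), or (None, None) to keep waiting.
    """
    if cached_host in free_hosts:
        # Cached node is free: schedule there immediately.
        return cached_host, "NODE_LOCAL"
    if elapsed_s >= locality_wait_s and free_hosts:
        # Locality wait expired: fall back to any free host.
        return sorted(free_hosts)[0], "ANY"
    # Cached node busy but the wait has not expired: hold the task.
    return None, None

# Cached node free -> immediate node-local placement.
print(pick_host("cache-node", {"cache-node", "other"}, elapsed_s=0.0))
# Cached node busy, wait not yet expired -> keep waiting.
print(pick_host("cache-node", {"other"}, elapsed_s=1.0))
# Wait expired -> fall back to a non-cached node.
print(pick_host("cache-node", {"other"}, elapsed_s=3.5))
```

In real Spark the wait is governed by the `spark.locality.wait` configuration (with per-level variants); the sketch only shows why plain partition ordering cannot reproduce this behavior, since ordering alone gives no way to wait before falling back.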