[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997761#comment-13997761 ]
Aaron Davidson edited comment on SPARK-1767 at 7/21/14 7:46 PM:
----------------------------------------------------------------

-One simple workaround to this is to just make sure that partitions that are in memory are ordered first in the list of partitions, as Spark will try to place executors based on the order in this list.- This is, of course, not a complete solution, as we would not utilize the locality-wait logic within Spark and would immediately fall back to a non-cached node if the cached node was busy, rather than waiting for some period of time for the cached node to become available.

Edit: I was wrong about how Spark schedules partitions -- ordering is not sufficient.


was (Author: ilikerps):
One simple workaround to this is to just make sure that partitions that are in memory are ordered first in the list of partitions, as Spark will try to place executors based on the order in this list. This is, of course, not a complete solution, as we would not utilize the locality-wait logic within Spark and would immediately fall back to a non-cached node if the cached node was busy, rather than waiting for some period of time for the cached node to become available.

> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-1767
>                 URL: https://issues.apache.org/jira/browse/SPARK-1767
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
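The locality-wait (delay-scheduling) behavior discussed in the comment can be illustrated with a toy model. This is a minimal Python sketch, not Spark's actual TaskSetManager logic; the function `pick_host` and its parameters are hypothetical names chosen for illustration:

```python
def pick_host(cached_host, free_hosts, elapsed_s, locality_wait_s=3.0):
    """Toy model of delay scheduling: prefer the host that caches the
    partition; only after `locality_wait_s` seconds have elapsed with
    the cached host still busy do we accept a non-local host.
    Returns (host, locality_level), or (None, None) to keep waiting.
    """
    if cached_host in free_hosts:
        # Cached node is free: schedule there immediately.
        return cached_host, "NODE_LOCAL"
    if elapsed_s >= locality_wait_s and free_hosts:
        # Locality wait expired: fall back to any free host.
        return sorted(free_hosts)[0], "ANY"
    # Cached node busy but the wait has not expired: hold the task.
    return None, None

# Cached node free -> immediate node-local placement.
print(pick_host("cache-node", {"cache-node", "other"}, elapsed_s=0.0))
# Cached node busy, wait not yet expired -> keep waiting.
print(pick_host("cache-node", {"other"}, elapsed_s=1.0))
# Wait expired -> fall back to a non-cached node.
print(pick_host("cache-node", {"other"}, elapsed_s=3.5))
```

In real Spark the wait is governed by the `spark.locality.wait` configuration (with per-level variants); the sketch only shows why plain partition ordering cannot reproduce this behavior, since ordering alone gives no way to wait before falling back.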