[ https://issues.apache.org/jira/browse/SPARK-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-11212:
---------------------------------
    Description:
This patch includes the following changes:

1. Add a new preferred location format, executor_<host>_<executorID> (e.g., "executor_localhost_2"), to support specifying executor locations for an RDD.
2. Use the new preferred location format in ReceiverTracker to optimize the startup time of Receivers when there are multiple executors on a host.

The goal of this patch is to enable the streaming scheduler to place receivers (which run as tasks) on specific executors. Basically, I want more control over the placement of the receivers so that they are evenly distributed among the executors. We tried to do this without changing the core scheduling logic, but that logic only allows a preferred location at the host level, not for a particular executor. So if there are two executors on the same host and I want two receivers to run on them (one on each executor), I cannot specify that. The current code specifies only the host as the preference, which may end up launching both receivers on the same executor. We tried to work around this by restarting a receiver when it does not launch on the desired executor and hoping that it starts on the right one next time. But that causes lots of restarts and delays in correctly launching the receiver.

This change would allow the streaming scheduler to specify the exact executor as the preferred location. It is not exposed to the user; only the streaming scheduler uses it.
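To make the proposed format concrete, here is a minimal Python sketch of how an executor-level location string could be built and parsed, and how receivers could be spread evenly over executors. It is illustrative only and is not the patch's code: the names ExecutorLocation, parse_preferred_location and assign_receivers are hypothetical, the round-robin policy is an assumption, and splitting at the first underscore after the prefix assumes host names contain no underscore.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Prefix taken from the format described above: executor_<host>_<executorID>
EXECUTOR_PREFIX = "executor_"


@dataclass(frozen=True)
class ExecutorLocation:
    """Hypothetical holder for an executor-level preferred location."""
    host: str
    executor_id: str

    def to_preferred_location(self) -> str:
        # e.g. ExecutorLocation("localhost", "2") -> "executor_localhost_2"
        return f"{EXECUTOR_PREFIX}{self.host}_{self.executor_id}"


def parse_preferred_location(location: str) -> Union[ExecutorLocation, str]:
    """Return an ExecutorLocation for executor-level strings.

    Any string without the prefix is treated as a plain host-level
    preference and returned unchanged.
    """
    if not location.startswith(EXECUTOR_PREFIX):
        return location
    rest = location[len(EXECUTOR_PREFIX):]
    # Assumption: the host contains no underscore, so the first
    # underscore separates the host from the executor ID.
    host, sep, executor_id = rest.partition("_")
    if not sep or not host or not executor_id:
        raise ValueError(f"Malformed executor location: {location!r}")
    return ExecutorLocation(host, executor_id)


def assign_receivers(
    receiver_ids: List[int], executors: List[ExecutorLocation]
) -> Dict[int, str]:
    """Spread receivers over executors round-robin (assumed policy)."""
    if not executors:
        raise ValueError("No executors available")
    return {
        receiver_id: executors[i % len(executors)].to_preferred_location()
        for i, receiver_id in enumerate(receiver_ids)
    }
```

With two executors on the same host, assign_receivers([0, 1], [ExecutorLocation("host1", "1"), ExecutorLocation("host1", "2")]) gives each receiver a distinct executor-level preference, which a host-only preference cannot express.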
> Make RDD's preferred locations support the executor location and fix ReceiverTracker for multiple executors in a host
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11212
>                 URL: https://issues.apache.org/jira/browse/SPARK-11212
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Streaming
>            Reporter: Shixiong Zhu