[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977969#comment-14977969 ]

Saisai Shao commented on SPARK-2089:
------------------------------------

Hi [~pwendell], [~mridulm80], [~sandyr] and [~lianhuiwang], I still think 
enabling this feature would be quite useful. For many workloads, such as batch 
jobs where dynamic allocation is not necessarily used, there is currently no 
way to provide locality hints for the AM to use when allocating containers, 
and this can hurt performance a lot when data has to be fetched remotely, 
especially in a large YARN cluster. So I'm inclined to revive this feature, 
but perhaps in a different way (the current approach is very Hadoop-specific 
and does not work in yarn-client mode); the locality computation could also 
reuse the implementation from SPARK-4352. I'd like to spend some time taking a 
crack at this issue. What are your opinions and concerns? Is it still worth 
addressing, given that it has been broken for many versions? Any comments are 
greatly appreciated, thanks a lot.
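
For reference, here is a rough sketch (based on the Spark 1.x era API; the 
exact constructor and InputFormatInfo signatures may differ slightly) of how 
an application was meant to pass locality hints to the SparkContext:

{code}
// Rough sketch only: compute locality hints from the job's Hadoop input and
// pass them to the SparkContext constructor overload that takes
// preferredNodeLocationData (Spark 1.x era API; signatures approximate).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.InputFormatInfo

object LocalityHintsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("locality-hints-example")

    // host -> splits information for the input data, so the AM could request
    // containers on the nodes that actually hold the blocks.
    // "hdfs:///path/to/input" is a placeholder path.
    val preferredLocations = InputFormatInfo.computePreferredLocations(
      Seq(new InputFormatInfo(new Configuration(), classOf[TextInputFormat],
        "hdfs:///path/to/input")))

    // This is the constructor whose hints end up being ignored because of the
    // race described in this issue.
    val sc = new SparkContext(conf, preferredLocations)
    sc.stop()
  }
}
{code}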

> With YARN, preferredNodeLocalityData isn't honored 
> ---------------------------------------------------
>
>                 Key: SPARK-2089
>                 URL: https://issues.apache.org/jira/browse/SPARK-2089
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when 
> constructing a Spark context that will dictate where to request executor 
> containers.
> This is currently broken because of a race condition.  The Spark-YARN code 
> runs the user class and waits for it to start up a SparkContext.  During its 
> initialization, the SparkContext will create a YarnClusterScheduler, which 
> notifies a monitor in the Spark-YARN code that the SparkContext has started.
> The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and 
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, 
> setting preferredNodeLocationData comes after the rest of the initialization, 
> so, if the Spark-YARN code comes around quickly enough after being notified, 
> the data that's fetched is the empty, unset version.  This occurred during 
> all of my runs.
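
For anyone following this thread, here is a minimal, self-contained sketch of 
the ordering problem described in the quoted report (class, field, and thread 
names are illustrative stand-ins, not the actual Spark-YARN code):

{code}
// Illustrative stand-in for the race: the waiting "AM" thread is woken up
// before the locality data has been assigned, so it reads the empty default.
import java.util.concurrent.CountDownLatch

object RaceConditionSketch {
  @volatile var preferredNodeLocationData: Map[String, Set[String]] = Map.empty
  val contextStarted = new CountDownLatch(1)

  def main(args: Array[String]): Unit = {
    // Stand-in for the Spark-YARN allocator thread: waits to be told the
    // context has started, then immediately reads the locality hints.
    val allocator = new Thread(new Runnable {
      override def run(): Unit = {
        contextStarted.await()
        // Typically prints Map() because the assignment below hasn't run yet.
        println(s"AM sees locality hints: $preferredNodeLocationData")
      }
    })
    allocator.start()

    // Stand-in for the SparkContext constructor: it signals "started" first...
    contextStarted.countDown()
    Thread.sleep(50) // ...does the rest of its initialization...
    // ...and only then assigns the locality data, which the AM already missed.
    preferredNodeLocationData = Map("host1" -> Set("split-0"))

    allocator.join()
  }
}
{code}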


