[ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097826#comment-14097826
 ] 

Sandy Ryza edited comment on SPARK-2089 at 8/14/14 10:41 PM:
-------------------------------------------------------------

Hmmmm, it's true that my suggestion would require us to serialize and then 
immediately deserialize a possibly huge string.  How about Spark conf 
properties that just specify the input file and input format? We would handle 
all the logic for converting this to location preferences on the other side.  
This would also simplify things for the users (just need to set properties, not 
call any methods).
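
A minimal sketch of the idea above, in Python for brevity rather than Spark's own Scala. The two property names and the hosts_for_splits lookup are hypothetical stand-ins; nothing here is an existing Spark configuration key or API. It only shows the shape: the user sets two properties, and the receiving side turns them into per-host preference counts.

```python
from collections import Counter

# Hypothetical property names, used for illustration only.
INPUT_PATH_KEY = "spark.yarn.locality.inputPath"
INPUT_FORMAT_KEY = "spark.yarn.locality.inputFormat"


def location_preferences(conf, hosts_for_splits):
    """Turn the two conf properties into a {host: split_count} mapping.

    conf is a plain dict of properties. hosts_for_splits is an assumed
    callable (path, input_format) -> list of host lists, one per split.
    """
    path = conf.get(INPUT_PATH_KEY)
    input_format = conf.get(INPUT_FORMAT_KEY)
    if path is None or input_format is None:
        # Nothing configured, so there are no preferences to request.
        return {}
    counts = Counter()
    for split_hosts in hosts_for_splits(path, input_format):
        counts.update(split_hosts)
    return dict(counts)
```

The user-facing part is then only the two property settings; the split lookup runs on the other side, so no large string has to be serialized and deserialized.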


was (Author: sandyr):
Hmmmm, it's true that my suggestion would require us to serialize and then 
immediately deserialize a possibly huge string.  How about Spark conf 
properties that just specify the input file and input format, and handles all 
the logic for converting this to location preferences on the other side.  This 
would also simplify things for the users (just need to set properties, not call 
any methods).

> With YARN, preferredNodeLocalityData isn't honored 
> ---------------------------------------------------
>
>                 Key: SPARK-2089
>                 URL: https://issues.apache.org/jira/browse/SPARK-2089
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when 
> constructing a Spark context that will dictate where to request executor 
> containers.
> This is currently broken because of a race condition.  The Spark-YARN code 
> runs the user class and waits for it to start up a SparkContext.  During its 
> initialization, the SparkContext will create a YarnClusterScheduler, which 
> notifies a monitor in the Spark-YARN code.  The Spark-YARN code then 
> immediately fetches the preferredNodeLocationData from the SparkContext and 
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, 
> setting preferredNodeLocationData comes after the rest of the initialization, 
> so, if the Spark-YARN code comes around quickly enough after being notified, 
> the data that's fetched is the empty unset version.  This occurred during all 
> of my runs.
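
The ordering problem described above can be reproduced with a toy model. This is a sketch in Python, not the Spark-YARN code: ToyContext, run, and the two events are hypothetical names, and the second event deliberately forces the unlucky interleaving so the result is deterministic. The set_early flag shows one possible ordering fix (assign the data before anything is notified); it is an illustration, not the fix adopted for the issue.

```python
import threading


class ToyContext:
    """Stand-in for a context whose constructor notifies a waiter too early."""

    def __init__(self, preferred, on_scheduler_created, set_early=False):
        self.preferred = {}  # the "empty unset version"
        if set_early:
            # Possible fix: set the data before anyone can be notified.
            self.preferred = preferred
        # Stands in for scheduler creation notifying the monitor.
        on_scheduler_created(self)
        # Buggy order: the data is set only after the rest of initialization.
        self.preferred = preferred


def run(set_early):
    ready = threading.Event()
    fetched = threading.Event()
    holder, seen = {}, {}

    def on_scheduler_created(ctx):
        holder["ctx"] = ctx
        ready.set()
        # Force the race outcome: let the launcher read before init finishes.
        fetched.wait(timeout=5)

    def launcher():
        # Stands in for the code that waits, then fetches the data at once.
        ready.wait(timeout=5)
        seen["preferred"] = dict(holder["ctx"].preferred)
        fetched.set()

    t = threading.Thread(target=launcher)
    t.start()
    ToyContext({"host1": 2}, on_scheduler_created, set_early)
    t.join()
    return seen["preferred"]
```

With the original ordering, run(False) returns the empty mapping; with the data assigned first, run(True) returns the preferences that were passed in.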



--
This message was sent by Atlassian JIRA
(v6.2#6252)
