[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097826#comment-14097826 ]
Sandy Ryza edited comment on SPARK-2089 at 8/14/14 10:41 PM:
-------------------------------------------------------------

Hmmmm, it's true that my suggestion would require us to serialize and then immediately deserialize a possibly huge string. How about Spark conf properties that just specify the input file and input format? We would handle all the logic for converting this to location preferences on the other side. This would also simplify things for the users (just need to set properties, not call any methods).

was (Author: sandyr):
Hmmmm, it's true that my suggestion would require us to serialize and then immediately deserialize a possibly huge string. How about Spark conf properties that just specify the input file and input format, and handles all the logic for converting this to location preferences on the other side. This would also simplify things for the users (just need to set properties, not call any methods).

> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when
> constructing a Spark context that will dictate where to request executor
> containers.
> This is currently broken because of a race condition. The Spark-YARN code
> runs the user class and waits for it to start up a SparkContext. During its
> initialization, the SparkContext will create a YarnClusterScheduler, which
> notifies a monitor in the Spark-YARN code. The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData,
> setting preferredNodeLocationData comes after the rest of the initialization,
> so, if the Spark-YARN code comes around quickly enough after being notified,
> the data that's fetched is the empty unset version. This occurred during all
> of my runs.
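
The race described in the issue can be sketched in a few lines. This is a hedged illustration in Python, not Spark's actual code: `Monitor` and `Context` are hypothetical stand-ins for the Spark-YARN side and the SparkContext, and the `fetched` wait exists only to force the losing interleaving deterministically so the effect is visible on every run.

```python
import threading


class Monitor:
    """Hypothetical stand-in for the Spark-YARN code that waits for the context."""

    def __init__(self):
        self.ready = threading.Event()
        self.fetched = threading.Event()
        self.context = None
        self.seen = None

    def notify(self, context):
        # Called from inside the context's constructor (the scheduler's role).
        self.context = context
        self.ready.set()

    def run(self):
        self.ready.wait()
        # Fetch the locality data immediately after being notified.
        self.seen = dict(self.context.preferred_locations)
        self.fetched.set()


class Context:
    """Hypothetical stand-in for a context that assigns the locality data last."""

    def __init__(self, monitor, preferred_locations):
        self.preferred_locations = {}      # empty, unset default
        monitor.notify(self)               # notification happens mid-initialization
        monitor.fetched.wait(timeout=5)    # demo only: let the monitor win the race
        self.preferred_locations = preferred_locations  # assigned too late


def demo():
    monitor = Monitor()
    worker = threading.Thread(target=monitor.run)
    worker.start()
    ctx = Context(monitor, {"host1": ["split0"]})
    worker.join()
    # What the monitor fetched versus what the context holds afterwards.
    return monitor.seen, ctx.preferred_locations
```

In this sketch the monitor reads the empty default even though the finished context holds the caller's data. Assigning the field before the notification, or passing the data through configuration as suggested in the comment, would remove the window.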