[ 
https://issues.apache.org/jira/browse/SPARK-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942823#comment-14942823
 ] 

Daljeet Virdi commented on SPARK-10921:
---------------------------------------

Sorry, I'm new to this project. Thanks for clarifying.

> Completely remove the use of SparkContext.preferredNodeLocationData
> -------------------------------------------------------------------
>
>                 Key: SPARK-10921
>                 URL: https://issues.apache.org/jira/browse/SPARK-10921
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.5.1
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> SPARK-8949 obsoleted the use of {{SparkContext.preferredNodeLocationData}},
> yet the code makes this less than obvious, because the comment still says (see
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L93-L96):
> {code}
>   // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
>   // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
>   // contains a map from hostname to a list of input format splits on the host.
>   private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
> {code}
> It turns out that there are still places where the field is initialized,
> which only adds to the confusion.
> When you search for the use of {{SparkContext.preferredNodeLocationData}},
> you'll find 3 places: one constructor marked {{@deprecated}}, another with a
> {{logWarning}} telling us that _"Passing in preferred locations has no
> effect at all, see SPARK-8949"_, and the
> {{org.apache.spark.deploy.yarn.ApplicationMaster.registerAM}} method.
> There is no consistent approach to dealing with it, even though it is, in
> theory, no longer used.
> The
> [org.apache.spark.deploy.yarn.ApplicationMaster.registerAM|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L234-L265]
> method caught my eye, and I found that it passes the following to
> {{client.register}}:
> {code}
> if (sc != null) sc.preferredNodeLocationData else Map()
> {code}
> However, {{client.register}} [ignores the input parameter
> completely|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L47-L78],
> even though its scaladoc still documents it (note the {{preferredNodeLocations}} param):
> {code}
>   /**
>    * Registers the application master with the RM.
>    *
>    * @param conf The Yarn configuration.
>    * @param sparkConf The Spark configuration.
>    * @param preferredNodeLocations Map with hints about where to allocate containers.
>    * @param uiAddress Address of the SparkUI.
>    * @param uiHistoryAddress Address of the application on the History Server.
>    */
> {code}
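
The inconsistency reported above can be illustrated with a minimal Scala sketch. Every name in it (`DeadParameterSketch`, `RegisterBefore`, `RegisterAfter`, the simplified `SplitInfo`, the `"registered:"` return value) is a hypothetical stand-in chosen for illustration, not the actual Spark code; it only shows the general pattern of a documented parameter that is never read, and what removing it might look like.

```scala
// Hypothetical stand-ins only; none of these are the real Spark classes or signatures.
object DeadParameterSketch {
  // Simplified placeholder for Spark's SplitInfo.
  final case class SplitInfo(path: String)

  // "Before": the parameter is accepted (and documented) but never read.
  object RegisterBefore {
    def register(
        uiAddress: String,
        preferredNodeLocations: Map[String, Set[SplitInfo]]): String = {
      // preferredNodeLocations is deliberately unused, mirroring the report.
      s"registered:$uiAddress"
    }
  }

  // "After": the dead parameter is removed, so callers can no longer pass
  // hints that would be silently dropped.
  object RegisterAfter {
    def register(uiAddress: String): String = s"registered:$uiAddress"
  }

  def main(args: Array[String]): Unit = {
    val hints = Map("host1" -> Set(SplitInfo("/data/part-0")))
    // The result is the same whether or not hints are supplied.
    val withHints = RegisterBefore.register("http://ui:4040", hints)
    val withoutHints = RegisterBefore.register("http://ui:4040", Map.empty)
    assert(withHints == withoutHints)
    assert(RegisterAfter.register("http://ui:4040") == withHints)
    println(withHints)
  }
}
```

Under these assumptions, dropping the parameter turns a misleading call site into a compile error for any caller that still passes location hints, which is one way to make the obsolescence explicit.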



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

