[ https://issues.apache.org/jira/browse/SPARK-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942823#comment-14942823 ]
Daljeet Virdi commented on SPARK-10921:
---------------------------------------

Sorry, I'm new to this project. Thanks for clarifying.

> Completely remove the use of SparkContext.preferredNodeLocationData
> -------------------------------------------------------------------
>
>                 Key: SPARK-10921
>                 URL: https://issues.apache.org/jira/browse/SPARK-10921
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.5.1
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> SPARK-8949 obsoleted the use of {{SparkContext.preferredNodeLocationData}},
> yet the code makes that less obvious, as the comment on the field still says (see
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L93-L96):
> {code}
> // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
> // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
> // contains a map from hostname to a list of input format splits on the host.
> private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
> {code}
> It turns out that there are places where the initialization does take place,
> which only adds to the confusion.
> When you search for the use of {{SparkContext.preferredNodeLocationData}},
> you'll find 3 places: one constructor marked {{@deprecated}}, another with a
> {{logWarning}} telling us that _"Passing in preferred locations has no
> effect at all, see SPARK-8949"_, and the
> {{org.apache.spark.deploy.yarn.ApplicationMaster.registerAM}} method.
> There is no consistent approach to dealing with it, given that it's no longer used in
> theory.
> The
> [org.apache.spark.deploy.yarn.ApplicationMaster.registerAM|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L234-L265]
> method caught my eye, and I found that it does the following in its call to
> {{client.register}}:
> {code}
> if (sc != null) sc.preferredNodeLocationData else Map()
> {code}
> However, {{client.register}} [ignores the input parameter
> completely|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L47-L78],
> even though its scaladoc says (note the {{preferredNodeLocations}} param):
> {code}
> /**
>  * Registers the application master with the RM.
>  *
>  * @param conf The Yarn configuration.
>  * @param sparkConf The Spark configuration.
>  * @param preferredNodeLocations Map with hints about where to allocate containers.
>  * @param uiAddress Address of the SparkUI.
>  * @param uiHistoryAddress Address of the application on the History Server.
>  */
> {code}