Hi,

You're right: it is unused, but the code still does some (very little) initialization as if it were really needed, and that sows confusion.
I filed https://issues.apache.org/jira/browse/SPARK-10921 to track it.

The other reason I brought it up was to help myself (and hopefully others) who read the code and are constantly distracted by things that look important but turn out not to be so whatsoever. I spent a couple of hours yesterday reading the sources for its uses, as I initially thought this YARN-specific feature in Spark was really important (it caught my attention and I kept digging deeper) until I found it is a leftover.

Read the comment in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L93-L96:

    // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
    // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
    // contains a map from hostname to a list of input format splits on the host.
    private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()

What would you think about the var? I was convinced it's important for Spark on YARN. Would "Removing the internal field and one usage of it seems OK, though I don't think it would help much of anything." still hold? I don't think so, hence the issue I reported.

Regards,
Jacek

--
Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Sun, Oct 4, 2015 at 5:50 AM, Sean Owen <so...@cloudera.com> wrote:
> I think it's unused as the JIRA says, but removing it from the
> constructors would change the API, so that's why it stays in the
> signature. Removing the internal field and one usage of it seems OK,
> though I don't think it would help much of anything.
>
> On Sun, Oct 4, 2015 at 4:36 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>> Hi,
>>
>> I've been reviewing SparkContext and found preferredNodeLocationData
>> that was made obsolete by SPARK-8949 [1].
>>
>> When you search where SparkContext.preferredNodeLocationData is used,
>> you find 3 places - one constructor marked @deprecated, the other with
>> logWarning telling us that "Passing in preferred locations has no
>> effect at all, see SPARK-8949", and the
>> org.apache.spark.deploy.yarn.ApplicationMaster.registerAM method.
>>
>> The org.apache.spark.deploy.yarn.ApplicationMaster.registerAM method
>> caught my eye and I found that it passes the following to
>> client.register:
>>
>>     if (sc != null) sc.preferredNodeLocationData else Map()
>>
>> However, AFAIU client.register ignores the input parameter completely
>> (!). It's not used in the body of the method and seems to be a leftover.
>> The input parameter should be removed and so should the above line.
>>
>> What do you think? Should I report an issue and clean it up via a pull req?
>>
>> BTW, what do you think about removing
>> SparkContext.preferredNodeLocationData as part of the cleanup?
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-8949
>>
>> Regards,
>> Jacek
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
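
For readers following the thread, the cleanup under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Spark code: `Context`, `Log`, `Registration`, `registerBefore`, and the simplified `SplitInfo` are hypothetical stand-ins, and the `"x.y.z"` version string is a placeholder. It shows the two halves of the argument: a public constructor keeps its ignored parameter (removing it would change the API), while an internal method can simply drop a parameter it never reads.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for Spark's SplitInfo; only the shape matters here.
final case class SplitInfo(hostLocation: String, path: String)

// Hypothetical logger that records warnings so they can be inspected later.
object Log {
  val warnings: ListBuffer[String] = ListBuffer.empty[String]
  def warn(msg: String): Unit = { warnings += msg; () }
}

// Hypothetical context class (NOT the real SparkContext).
class Context(val appName: String) {
  // Public API: the signature stays for compatibility, but the second
  // argument is ignored and no internal field is kept for it.
  // "x.y.z" is a placeholder, not a real release number.
  @deprecated("Passing in preferred locations has no effect at all, see SPARK-8949", "x.y.z")
  def this(appName: String, preferredNodeLocationData: Map[String, Set[SplitInfo]]) = {
    this(appName)
    Log.warn("Passing in preferred locations has no effect at all, see SPARK-8949")
  }
}

// Hypothetical internal registration helper (NOT the real YARN client).
object Registration {
  // Before the cleanup: `preferred` is accepted but never read in the body.
  def registerBefore(host: String, preferred: Map[String, Set[SplitInfo]]): String =
    s"registered:$host"

  // After the cleanup: same behaviour, without the dead parameter.
  def register(host: String): String =
    s"registered:$host"
}
```

Because the dead parameter never influenced the result, callers of the internal method lose nothing when it is dropped, whereas the deprecated constructor still compiles for existing users and merely emits a warning.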