Hi,

You're right - it is unused, but the code still does some (very little)
initialization as if it were really needed, which sows confusion.

I filed https://issues.apache.org/jira/browse/SPARK-10921 to track it.

The other reason I brought it up was to help myself (and hopefully
others) who read the code and are constantly distracted by things that
look important but turn out not to be so at all. I spent a couple
of hours yesterday reading the sources for its uses, as I
initially thought this YARN-specific feature in Spark was really
important (it caught my attention and I kept digging
deeper) until I found it was a leftover.

Read the comment in
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L93-L96:

  // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
  // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
  // contains a map from hostname to a list of input format splits on the host.
  private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()

What do you think about the var? After reading that comment I was
convinced it's important for Spark on YARN. Would "Removing the internal
field and one usage of it seems OK, though I don't think it would help
much of anything." still hold? I don't think so, hence the issue I reported.
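
To make the pattern concrete, here is a minimal sketch of what I mean. All
names below (Context, Client, Demo, and the simplified SplitInfo) are
hypothetical stand-ins, not the actual Spark classes, and the bodies are
assumptions made only for illustration: a field is initialized and passed
along, yet the receiving method never reads it.

```scala
// Hypothetical stand-ins for illustration only; NOT the real Spark classes.
final case class SplitInfo(path: String)

class Context {
  // Initialized as if it mattered, which is what misleads the reader.
  var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
}

object Client {
  // The second parameter is accepted but never read in the body.
  def register(appName: String,
               preferredLocations: Map[String, Set[SplitInfo]]): String =
    s"registered $appName"
}

object Demo {
  // Mirrors the shape of the call site: the value is computed, then ignored.
  def registerAM(sc: Context): String = {
    val locations =
      if (sc != null) sc.preferredNodeLocationData
      else Map.empty[String, Set[SplitInfo]]
    Client.register("demo", locations)
  }

  def main(args: Array[String]): Unit = {
    val sc = new Context
    val withEmpty = registerAM(sc)
    sc.preferredNodeLocationData = Map("host1" -> Set(SplitInfo("/data/part-0")))
    val withData = registerAM(sc)
    // The result is identical whether or not the map holds any data.
    assert(withEmpty == withData)
    println(withData)
  }
}
```

In this sketch, dropping both the parameter and the field changes no
behaviour; it only removes what a reader would otherwise have to chase down.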

Regards,
Jacek

--
Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski


On Sun, Oct 4, 2015 at 5:50 AM, Sean Owen <so...@cloudera.com> wrote:
> I think it's unused as the JIRA says, but removing it from the
> constructors would change the API, so that's why it stays in the
> signature. Removing the internal field and one usage of it seems OK,
> though I don't think it would help much of anything.
>
> On Sun, Oct 4, 2015 at 4:36 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>> Hi,
>>
>> I've been reviewing SparkContext and found preferredNodeLocationData
>> that was made obsolete by SPARK-8949 [1].
>>
>> When you search where SparkContext.preferredNodeLocationData is used,
>> you find 3 places - one constructor marked @deprecated, the other with
>> logWarning telling us that "Passing in preferred locations has no
>> effect at all, see SPARK-8949", and in
>> org.apache.spark.deploy.yarn.ApplicationMaster.registerAM method.
>>
>> org.apache.spark.deploy.yarn.ApplicationMaster.registerAM method
>> caught my eye and I found that it does the following in
>> client.register:
>>
>> if (sc != null) sc.preferredNodeLocationData else Map()
>>
>> However, AFAIU client.register ignores the input parameter completely
>> (!) It's not used in the body of the method and seems a leftover. The
>> input parameter should be removed and so should the above line.
>>
>> What do you think? Should I report an issue and clean it up via a pull req?
>>
>> BTW, What do you think about removing
>> SparkContext.preferredNodeLocationData as part of the cleanup?
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-8949
>>
>> Regards,
>> Jacek
>>
>> --
>> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>

