and even the same process where the data might be cached.
These are the different locality levels:

  PROCESS_LOCAL
  NODE_LOCAL
  RACK_LOCAL
  ANY

Relevant code:
https://github.com/apache/spark/blob/7712e724ad69dd0b83754e938e9799d13a4d43b9/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala#L150
https://github.com/apache/spark/blob/63bdb1f41b4895e3a9444f7938094438a94d3007/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L250

Relevant docs: see the spark.locality configuration attributes here:
https://spark.apache.org/docs/latest/configuration.html

(A small configuration sketch follows the quoted thread below.)

On Tue, Jul 8, 2014 at 1:13 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi Anish,
>
> Spark, like MapReduce, makes an effort to schedule tasks on the same nodes
> and racks that the input blocks reside on.
>
> -Sandy
>
>
> On Tue, Jul 8, 2014 at 12:27 PM, anishs...@yahoo.co.in <
> anishs...@yahoo.co.in> wrote:
>
> > Hi All
> >
> > My apologies for very basic question, do we have full support of data
> > locality in Spark MapReduce.
> >
> > Please suggest.
> >
> > --
> > Anish Sneh
> > "Experience is the best teacher."
> > http://in.linkedin.com/in/anishsneh
> >
> >
>
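For completeness, here is a minimal Scala sketch of the spark.locality.* attributes from the configuration page linked above, together with a cached RDD whose second pass can be scheduled PROCESS_LOCAL. The input path, app name, and the "3s" values are placeholders, not recommendations, and the accepted value format (milliseconds vs. time strings like "3s") depends on the Spark version; the master is assumed to be supplied by spark-submit.

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalityWaitSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("locality-wait-sketch")  // placeholder app name
          // How long the scheduler waits for a slot at a given locality level
          // before falling back to the next, less local, level.
          .set("spark.locality.wait", "3s")          // base wait for all levels
          .set("spark.locality.wait.process", "3s")  // wait for PROCESS_LOCAL
          .set("spark.locality.wait.node", "3s")     // wait for NODE_LOCAL
          .set("spark.locality.wait.rack", "3s")     // wait for RACK_LOCAL

        val sc = new SparkContext(conf)

        // Caching gives the scheduler PROCESS_LOCAL preferences: tasks of the
        // second count() can run in the executors that already hold the cached
        // partitions, which should show up as PROCESS_LOCAL in the web UI.
        val data = sc.textFile("hdfs:///tmp/input").cache()  // hypothetical path
        data.count()  // first pass reads HDFS blocks (NODE_LOCAL / RACK_LOCAL / ANY)
        data.count()  // second pass can be PROCESS_LOCAL against the cached blocks

        sc.stop()
      }
    }

Raising the spark.locality.wait values makes the scheduler hold out longer for a more local slot; lowering them (or setting them to 0) trades locality for faster task launch.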