and even the same process where the data might be cached.
These are the different locality levels:

  PROCESS_LOCAL
  NODE_LOCAL
  RACK_LOCAL
  ANY

Relevant code:
https://github.com/apache/spark/blob/7712e724ad69dd0b83754e938e9799d13a4d43b9/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala#L150
https://github.com/apache/spark/blob/63bdb1f41b4895e3a9444f7938094438a94d3007/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L250

Relevant docs: see the spark.locality configuration attributes here:
https://spark.apache.org/docs/latest/configuration.html

(A small configuration sketch follows the quoted thread below.)

On Tue, Jul 8, 2014 at 1:13 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi Anish,
>
> Spark, like MapReduce, makes an effort to schedule tasks on the same nodes
> and racks that the input blocks reside on.
>
> -Sandy
>
>
> On Tue, Jul 8, 2014 at 12:27 PM, anishs...@yahoo.co.in <
> anishs...@yahoo.co.in> wrote:
>
> > Hi All
> >
> > My apologies for very basic question, do we have full support of data
> > locality in Spark MapReduce.
> >
> > Please suggest.
> >
> > --
> > Anish Sneh
> > "Experience is the best teacher."
> > http://in.linkedin.com/in/anishsneh
> >
> >
>
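For completeness, here is a minimal Scala sketch of the spark.locality.* attributes from the configuration page linked above, together with a cached RDD whose second pass can be scheduled PROCESS_LOCAL. The input path, app name, and the "3s" values are placeholders, not recommendations, and the accepted value format (milliseconds vs. time strings like "3s") depends on the Spark version; the master is assumed to be supplied by spark-submit.

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalityWaitSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("locality-wait-sketch")  // placeholder app name
          // How long the scheduler waits for a slot at a given locality level
          // before falling back to the next, less local, level.
          .set("spark.locality.wait", "3s")          // base wait for all levels
          .set("spark.locality.wait.process", "3s")  // wait for PROCESS_LOCAL
          .set("spark.locality.wait.node", "3s")     // wait for NODE_LOCAL
          .set("spark.locality.wait.rack", "3s")     // wait for RACK_LOCAL

        val sc = new SparkContext(conf)

        // Caching gives the scheduler PROCESS_LOCAL preferences: tasks of the
        // second count() can run in the executors that already hold the cached
        // partitions, which should show up as PROCESS_LOCAL in the web UI.
        val data = sc.textFile("hdfs:///tmp/input").cache()  // hypothetical path
        data.count()  // first pass reads HDFS blocks (NODE_LOCAL / RACK_LOCAL / ANY)
        data.count()  // second pass can be PROCESS_LOCAL against the cached blocks

        sc.stop()
      }
    }

Raising the spark.locality.wait values makes the scheduler hold out longer for a more local slot; lowering them (or setting them to 0) trades locality for faster task launch.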