Although the Spark task scheduler is aware of rack-level data locality, it 
seems that only YARN implements support for it. However, node-level locality 
still works in standalone mode.
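
For what it's worth, how long the scheduler waits for a better locality level 
before falling back is governed by the spark.locality.wait settings. A minimal 
sketch of what they look like in conf/spark-defaults.conf (the values shown 
are just the defaults, not recommendations):

    # conf/spark-defaults.conf -- locality scheduling delays
    spark.locality.wait        3s   # base wait before dropping to a less local level
    spark.locality.wait.node   3s   # how long to wait for a NODE_LOCAL slot
    spark.locality.wait.rack   3s   # only meaningful where rack info exists (i.e. YARN)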

It is not necessary to copy the Hadoop config files into the Spark conf 
directory. Instead, set HADOOP_CONF_DIR to point to your Hadoop configuration 
directory.
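
For example, in conf/spark-env.sh on the machines running the driver and the 
workers (the path below is just a placeholder for your installation):

    # conf/spark-env.sh
    # point Spark at the directory that holds core-site.xml and hdfs-site.xml
    export HADOOP_CONF_DIR=/path/to/hadoop/etc/hadoop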

Data locality involves both task data locality and executor data locality. 
Executor data locality is only supported on YARN with dynamic executor 
allocation enabled. In standalone mode, by default, a Spark application 
acquires all available cores in the cluster, which generally means there is 
at least one executor on each node. In that case task data locality can still 
work, because a task can be dispatched to an executor on any of its preferred 
nodes.
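
As a rough sketch (master URL, class name and paths are placeholders), a 
submission against a standalone master that does not cap the total cores 
would look like this, and should leave an executor on every worker node:

    spark-submit \
      --master spark://your-master:7077 \
      --class com.example.YourApp \
      your-app.jar hdfs:///path/to/input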

In your case, have you set spark.cores.max to limit the number of cores to 
acquire? That would mean executors are only available on a subset of the 
cluster nodes.
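
A quick way to check (assuming the setting would live in spark-defaults.conf 
or be passed as --total-executor-cores on the command line) is something like:

    # look for an explicit core cap in the Spark configuration
    grep -R "cores.max\|total-executor-cores" "$SPARK_HOME"/conf/

The Environment tab of the running application's web UI also shows the 
effective value of spark.cores.max, if any.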

> On Dec 27, 2016, at 01:39, Karamba <phantom...@web.de> wrote:
> 
> Hi,
> 
> I am running a couple of Docker hosts, each with an HDFS node and a Spark
> worker in a Spark standalone cluster.
> In order to get data locality awareness, I would like to configure racks
> for each host, so that a Spark worker container knows from which HDFS
> node container it should load its data. Does this make sense?
> 
> I configured HDFS container nodes via the core-site.xml in
> $HADOOP_HOME/etc and this works. hdfs dfsadmin -printTopology shows my
> setup.
> 
> I configured Spark the same way: I placed core-site.xml and
> hdfs-site.xml in SPARK_CONF_DIR ... BUT this has no effect.
> 
> Submitting a Spark job via spark-submit to the spark-master that loads
> from HDFS only ever shows data locality ANY.
> 
> It would be great if anybody could help me get the right configuration!
> 
> Thanks and best regards,
> on
> 



---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
