Hi,

I am running a couple of Docker hosts, each with an HDFS node and a Spark
worker in a Spark standalone cluster.
To get data locality awareness, I would like to configure racks for each
host, so that a Spark worker container knows from which HDFS node
container it should load its data. Does this make sense?

I configured the rack mapping for the HDFS container nodes via
core-site.xml in $HADOOP_HOME/etc, and this works: hdfs dfsadmin
-printTopology shows my setup.
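
For reference, what I have is essentially the standard topology-script
setup; the script path, IPs and rack names below are just placeholders
for my actual values:

  <!-- core-site.xml -->
  <property>
    <name>net.topology.script.file.name</name>
    <value>/opt/hadoop/etc/hadoop/topology.sh</value>
  </property>

  #!/bin/bash
  # topology.sh: maps each datanode IP passed as an argument to a rack;
  # unknown hosts fall back to /default-rack.
  while [ $# -gt 0 ]; do
    case "$1" in
      172.18.0.2) echo -n "/rack1 " ;;
      172.18.0.3) echo -n "/rack2 " ;;
      *)          echo -n "/default-rack " ;;
    esac
    shift
  done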

I configured Spark the same way: I placed core-site.xml and
hdfs-site.xml in SPARK_CONF_DIR ... but this has no effect.
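
Or is the intended way to point Spark at the Hadoop config directory
via spark-env.sh instead of copying the files? Something like this
(the path is a placeholder for my actual Hadoop conf dir):

  # $SPARK_CONF_DIR/spark-env.sh
  # let Spark pick up core-site.xml / hdfs-site.xml from the Hadoop conf dir
  export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop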

When I submit a Spark job that loads data from HDFS via spark-submit to
the Spark master, the tasks only ever show data locality ANY.
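
For context, the submit is essentially this (master URL, class and jar
are placeholders):

  spark-submit \
    --master spark://spark-master:7077 \
    --class com.example.MyJob \
    my-job.jar hdfs://namenode:8020/data/input

Do I also need to tune something like spark.locality.wait, or is the
problem that Spark never learns the rack topology in the first place?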

It would be great if anybody could help me find the right configuration!

Thanks and best regards,
on
