Hi, I am running a couple of Docker hosts, each with an HDFS node and a Spark worker belonging to a Spark standalone cluster. To get data-locality awareness, I would like to configure a rack for each host, so that a Spark worker container knows which HDFS node container it should load its data from. Does this make sense?
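Concretely, by "configuring racks" I mean the standard script-based rack mapping in core-site.xml, roughly like this (only a sketch; the script path, hostnames and rack names are placeholders for my actual setup):

  <!-- core-site.xml (inside <configuration>): point Hadoop at a rack mapping script -->
  <property>
    <name>net.topology.script.file.name</name>
    <value>/opt/hadoop/etc/hadoop/rack-topology.sh</value>
  </property>

  #!/bin/bash
  # rack-topology.sh: prints one rack path per host/IP argument passed in by Hadoop
  for host in "$@"; do
    case "$host" in
      hdfs-node-1) echo "/rack1" ;;
      hdfs-node-2) echo "/rack2" ;;
      *)           echo "/default-rack" ;;
    esac
  done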
I configured the HDFS node containers this way via core-site.xml in $HADOOP_HOME/etc (roughly as sketched above), and it works: hdfs dfsadmin -printTopology shows my setup. I then configured Spark the same way, placing core-site.xml and hdfs-site.xml in SPARK_CONF_DIR, BUT this has no effect: submitting a Spark job via spark-submit to the Spark master that loads from HDFS only ever shows data locality ANY.

It would be great if anybody could help me find the right configuration!

Thanks and best regards,
on
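P.S. In case it helps, the submission looks roughly like this (master URL, class, jar and HDFS path are placeholders for my real job):

  # job reads its input from HDFS; locality for those stages shows up as ANY in the Spark UI
  spark-submit \
    --master spark://spark-master:7077 \
    --class com.example.MyJob \
    my-job.jar hdfs://namenode:8020/data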