It turned out the issue was with my environment, not Spark. In case anyone
else runs into this: the Spark workers were not using the machine hostname
by default. Setting the following environment variable on each worker fixed
it: SPARK_LOCAL_HOSTNAME="worker1" (and likewise for each of the other workers).
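For anyone hitting the same thing, a minimal sketch of where the variable goes: Spark reads conf/spark-env.sh on each node at startup, so the setting can live there (the hostname values here assume the worker names used in this thread; adjust per machine):

```shell
# conf/spark-env.sh on worker1 -- make Spark advertise the machine's
# known hostname instead of whatever address it resolves by default.
# Use the matching name on each node (worker2, worker3, ...).
export SPARK_LOCAL_HOSTNAME=worker1
```

After editing the file, restart the worker so the new environment takes effect.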
Hi Guys,
I asked this on Stack Overflow here:
https://stackoverflow.com/questions/63535720/why-would-preferredlocations-not-be-enforced-on-an-empty-spark-cluster
but am hoping there is further help here.
I have a 4-node standalone cluster with workers named worker1, worker2,
and worker3, and a