Hi all, my HDFS has a replication factor of 3. I ran my experiments on a cluster of 3 datanodes, and have now added a fourth node that holds no data. When I rerun the experiments, Spark launches non-local tasks on the new node, and the job takes longer than it did on the 3-node cluster.
I think delay scheduling falls through here because of the parameter spark.locality.wait.node, which defaults to 3 seconds: once that wait expires, Spark launches "ANY"-level tasks on the newly added datanode. I would like to increase this parameter in the interactive shell. How do I do that? What variable should I set so that the value is passed on to the SparkContext in the interactive shell? Thanks.
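
For concreteness, here is a rough sketch of the kind of thing I am after, typed into the shell. It assumes that stopping the shell's built-in `sc` and creating a replacement context is acceptable, that a fresh `SparkConf` picks up the master URL the shell was started with, and that a plain number is read as milliseconds. The value 10000 and the app name are made up for illustration, and I have not confirmed that this is the recommended way.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Stop the context that the shell created at startup, since its
// configuration is already fixed.
sc.stop()

// Build a new configuration with a longer node-level locality wait.
// Assumption: a bare number is interpreted as milliseconds (10000 = 10 s).
val conf = new SparkConf()
  .setAppName("locality-wait-experiment") // hypothetical name
  .set("spark.locality.wait.node", "10000")

// Create a replacement context that uses the new setting.
val sc2 = new SparkContext(conf)

// Check which value the new context actually sees.
println(sc2.getConf.get("spark.locality.wait.node"))
```

If the shell can take the setting at launch time instead, that would avoid recreating the context, which is really what I am asking about.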