I wanted to ask a general question about Hadoop/YARN and Apache Spark integration. I know that Hadoop on a physical cluster has rack awareness, i.e. it attempts to minimise cross-rack network traffic by taking the rack topology into account when placing replicated blocks.
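For context, Hadoop learns which rack each node belongs to from an administrator-supplied script named by the `net.topology.script.file.name` property in `core-site.xml`. Hadoop calls the script with one or more hostnames/IPs as arguments and reads back one rack path per argument. Below is a minimal sketch of such a script. It assumes a hypothetical static host-to-rack table; the addresses and rack names are illustrative only.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: maps hosts/IPs to rack paths."""
import sys

# Hypothetical host-to-rack table; a real cluster would load this from
# its own inventory. Addresses and rack names are illustrative only.
HOST_TO_RACK = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

# Conventional fallback rack for nodes with no known mapping.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per input host, in the same order."""
    return [HOST_TO_RACK.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts/IPs as arguments and reads the rack paths
    # back, whitespace-separated, in the same order.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

For example, running it as `./topology.py 10.0.1.11 10.0.2.22 10.9.9.9` would print `/dc1/rack1 /dc1/rack2 /default-rack`.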
I wondered whether Spark, when configured to use YARN as its cluster manager, is able to use this feature to minimise network traffic as well. Sorry if this question is not quite accurate, but I think you can see what I mean?