Hi all,

I am evaluating Spark for use here at work. We have an existing Hadoop 1.x install which I am planning to upgrade to Hadoop 2.3.
I am trying to work out whether I should install YARN or simply set up a Spark standalone cluster. We already use ZooKeeper, so setting up HA isn't a problem. I am puzzled, however, about how the Spark nodes coordinate on data locality: assuming I install the Spark workers on the same machines as the DFS data nodes, how does Spark work out which nodes should get which splits of a job?

Anyway, my bigger question remains: YARN or standalone? Which is currently the more stable option? Which is the more future-proof option?

Thanks,
Ishaaq

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/standalone-vs-YARN-tp4271.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
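[For context on the data-locality question above: the general idea is that the input layer reports which hosts hold each block's replicas, and the scheduler prefers to run each task on one of those hosts. A minimal conceptual sketch of that matching, with hypothetical function and variable names (this is not Spark's actual scheduler API, just an illustration of the principle):]

```python
# Conceptual sketch (NOT Spark's real API): locality-aware split assignment.
# Each input split knows which hosts hold replicas of its block; the
# scheduler prefers assigning a split to a worker co-located with a replica.

def assign_splits(splits, workers):
    """splits: dict of split_id -> list of hosts holding that block's replicas.
    workers: hostnames running both a compute worker and a DFS data node.
    Returns dict of worker -> list of split_ids, preferring node-local reads."""
    assignment = {w: [] for w in workers}
    remote = []
    for split_id, hosts in splits.items():
        local = [h for h in hosts if h in assignment]
        if local:
            # A replica lives on a worker node: pick the least-loaded one
            # so the task reads its block from the local disk.
            target = min(local, key=lambda h: len(assignment[h]))
            assignment[target].append(split_id)
        else:
            remote.append(split_id)  # no co-located replica; place it anywhere
    for split_id in remote:
        # Fall back to balancing load; these tasks read over the network.
        target = min(assignment, key=lambda h: len(assignment[h]))
        assignment[target].append(split_id)
    return assignment

splits = {
    "part-0": ["node1", "node2"],  # replicas on two worker nodes
    "part-1": ["node2", "node3"],
    "part-2": ["node9"],           # replica on a non-worker host in this sketch
}
workers = ["node1", "node2", "node3"]
print(assign_splits(splits, workers))
```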