Hi all,
I am evaluating Spark for use here at my work.

We have an existing Hadoop 1.x install which I am planning to upgrade to
Hadoop 2.3.

I am trying to work out whether I should install YARN or simply set up a
Spark standalone cluster. We already use ZooKeeper, so it isn't a problem to
set up HA. I am puzzled, however, as to how the Spark nodes coordinate on
data locality - i.e., assuming I install the Spark workers on the same
machines as the HDFS data nodes, how does Spark work out which nodes should
get which splits of a job?
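
From skimming the API docs, I gather the relevant hook might be
RDD.preferredLocations: for HDFS-backed RDDs each partition seems to carry
the host list of its block, which the scheduler then tries to honour. Here
is a minimal sketch of how I'd inspect this in spark-shell (the HDFS path
is just a placeholder):

// run in spark-shell, where sc is the SparkContext the shell provides
val rdd = sc.textFile("hdfs:///some/path") // placeholder path
rdd.partitions.foreach { p =>
  // the hosts the scheduler will prefer when assigning this partition
  println(s"partition ${p.index}: " + rdd.preferredLocations(p).mkString(", "))
}

Is that roughly the mechanism, and does it work the same under both
standalone and YARN?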

Anyway, my bigger question remains: YARN or standalone? Which is the more
stable option currently? Which is the more future-proof option?
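
For what it's worth, my understanding is that the application code stays
the same either way and only the master URL changes, e.g. (hostname and app
name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-app") // placeholder
  // standalone master; "yarn-client" instead to run against YARN
  .setMaster("spark://master-host:7077")
val sc = new SparkContext(conf)

So I'm hoping the choice is mostly operational rather than something that
forces code changes - please correct me if that's wrong.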

Thanks,
Ishaaq 


