Hi list,

We (Scrapinghub) are planning to deploy Spark on a 10+ node cluster, mainly for processing data in HDFS and for Kafka streaming. We are thinking of using Mesos instead of YARN as the cluster resource manager so that we can use Docker containers as the executors, which makes deployment easier. But there is one important thing to settle before making the decision: data locality.
If we run Spark on Mesos, can it achieve good data locality when processing HDFS data? I think Spark on YARN achieves that out of the box, but I'm not sure whether Spark on Mesos can do the same. I've searched the list archive but haven't found a helpful answer yet.

Any reply is appreciated.

Regards,
Shuai