How did you run this benchmark, and is there an open version I can try it with?
And what are your configuration settings, such as spark.locality.wait?

Tim

On Thu, Jan 8, 2015 at 11:44 AM, mvle <m...@us.ibm.com> wrote:
> Hi,
>
> I've noticed that running Spark apps on Mesos is significantly slower
> than stand-alone or Spark on YARN.
> I don't think this should be the case, so I am posting the problem here
> in case someone has an explanation or can point me to configuration
> options I've missed.
>
> I'm running the LinearRegression benchmark with a dataset of 48.8GB.
> On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
> I can finish the workload in about 5min (I don't remember exactly).
> The data is loaded into HDFS spanning the same 10-node cluster.
> There are 6 worker instances per node.
>
> However, when running the same workload on the same cluster but now with
> Spark on Mesos (coarse-grained mode), the execution time is somewhere
> around 15min. I also tried fine-grained mode, giving each Mesos node 6
> VCPUs (to hopefully get 6 executors like the stand-alone test), and I
> still get roughly 15min.
>
> I've noticed that when Spark is running on Mesos, almost all tasks
> execute with locality NODE_LOCAL (even on Mesos in coarse-grained mode).
> On stand-alone, the locality is mostly PROCESS_LOCAL.
>
> I think this locality issue might be the reason for the slowdown, but I
> can't figure out why, especially for coarse-grained mode, as the
> executors supposedly do not go away until job completion.
>
> Any ideas?
>
> Thanks,
> Mike
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
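
For reference, the settings in question could be reported as a spark-defaults.conf fragment along these lines. This is only a sketch of what such a report might look like, not the configuration used in the runs above. The property names are standard Spark settings, but every value here is an assumption, and the master host and port are hypothetical placeholders.

```properties
# Hypothetical Mesos master URL; replace with the real host and port.
spark.master            mesos://mesos-master.example.com:5050

# true = coarse-grained mode, false = fine-grained mode.
spark.mesos.coarse      true

# How long the scheduler waits for a more data-local slot before falling
# back to a less local one. Spark 1.x reads this value in milliseconds;
# 3000 is the documented default, shown here only as an example.
spark.locality.wait     3000

# Assumed cap on total cores claimed across the cluster (10 nodes x 4 cores).
spark.cores.max         40
```

Listing the values actually in effect (they also appear on the Environment tab of the Spark web UI) would make it easier to compare the stand-alone and Mesos runs.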