How did you run this benchmark, and is there an open version I can try it
with?

And what are your configuration settings, such as spark.locality.wait?
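
If it helps to report them, here is a minimal sketch of how the locality-related settings could be collected and passed explicitly via spark-submit --conf. The values are placeholders rather than recommendations, the jar name is hypothetical, and other flags a real run needs (such as --master and --class) are left out:

```python
import shlex

# Locality-related settings worth reporting. The values are placeholders
# (assumed for illustration), not tuning recommendations.
locality_conf = {
    "spark.locality.wait": "3000",          # base wait before falling back to a less-local level
    "spark.locality.wait.process": "3000",  # wait for a PROCESS_LOCAL slot
    "spark.locality.wait.node": "3000",     # wait for a NODE_LOCAL slot
    "spark.mesos.coarse": "true",           # coarse-grained Mesos mode
}


def spark_submit_args(conf, app):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# "linear_regression_benchmark.jar" is a hypothetical application jar name.
cmd = spark_submit_args(locality_conf, "linear_regression_benchmark.jar")
print(" ".join(shlex.quote(a) for a in cmd))
```

Setting these explicitly, instead of relying on defaults, makes it easier to compare the stand-alone and Mesos runs side by side.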

Tim

On Thu, Jan 8, 2015 at 11:44 AM, mvle <m...@us.ibm.com> wrote:

> Hi,
>
> I've noticed that running Spark apps on Mesos is significantly slower than
> stand-alone or Spark on YARN.
> I don't think this should be the case, so I am posting the problem here in
> case someone has an explanation
> or can point me to configuration options I've missed.
>
> I'm running the LinearRegression benchmark with a dataset of 48.8GB.
> On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
> I can finish the workload in about 5min (I don't remember exactly).
> The data is loaded into HDFS spanning the same 10-node cluster.
> There are 6 worker instances per node.
>
> However, when running the same workload on the same cluster but now with
> Spark on Mesos (coarse-grained mode), the execution time is somewhere around
> 15min. I also tried fine-grained mode, giving each Mesos node 6 VCPUs
> (to hopefully get 6 executors like the stand-alone test), and I still get
> roughly 15min.
>
> I've noticed that when Spark is running on Mesos, almost all tasks execute
> with locality NODE_LOCAL (even in Mesos in coarse-grained mode). On
> stand-alone, the locality is mostly PROCESS_LOCAL.
>
> I think this locality issue might be the reason for the slowdown, but I
> can't figure out why, especially for coarse-grained mode, as the executors
> supposedly do not go away until job completion.
>
> Any ideas?
>
> Thanks,
> Mike
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
