Re: Data locality running Spark on Mesos

2015-01-11 Thread Michael V Le

I tried two Spark stand-alone configurations:
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_INSTANCES=6
spark.driver.memory 1g
spark.executor.memory 1g
spark.storage.memoryFraction 0.9
--total-executor-cores 60

In the second configuration (same as first, but):
SPARK_WORKER_CORES=6
SPARK_WORKER_MEMORY=6g
SPARK_WORKER_INSTANCES=1
spark.executor.memory 6g
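For reference, the two stand-alone setups above can be sketched as config
fragments; the master URL and application jar are placeholders, not from the
original runs:

```shell
# Configuration 1 (conf/spark-env.sh): six small workers per node
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=6

# Configuration 2 (conf/spark-env.sh): one large worker per node
# export SPARK_WORKER_CORES=6
# export SPARK_WORKER_MEMORY=6g
# export SPARK_WORKER_INSTANCES=1

# Submission (spark.executor.memory is 1g for config 1, 6g for config 2)
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 60 \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=1g \
  --conf spark.storage.memoryFraction=0.9 \
  benchmark.jar
```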

Runs using the first configuration had faster execution times than Spark runs
on my configuration of Mesos (both coarse-grained and fine-grained).
Runs using the second configuration had about the same execution time as with
Mesos.
Looking at the logs again, the locality info for the stand-alone and Mesos
coarse-grained modes is very similar.
I must have been mistaken earlier in thinking the data locality information
was different.

So this whole thing might simply be due to the fact that it is not currently
possible in Mesos to set the number of executors.
Even in fine-grained mode, there seems to be just one executor per node (I
had thought differently in my previous message).
The workloads I've tried apparently perform better with many executors per
node than with a single powerful executor per node.

It would be really useful once the feature you mentioned,
https://issues.apache.org/jira/browse/SPARK-5095,
is implemented.
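Speculating on what SPARK-5095 might eventually provide: a launch like the
following could cap each coarse-grained Mesos executor at one core, so a
6-core node hosts several executors. The spark.executor.cores setting here is
a guess based on the JIRA discussion, not a released option in this Spark
version:

```shell
# Hypothetical once SPARK-5095 lands: many small executors per node
# under Mesos coarse-grained mode (option name is assumed).
spark-submit \
  --master mesos://master:5050 \
  --total-executor-cores 60 \
  --conf spark.mesos.coarse=true \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=1g \
  benchmark.jar
```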

Spark on Mesos fine-grained configuration:
driver memory = 1G
spark.executor.memory 6g  (also tried with 1g; still one executor per node,
with roughly the same execution time)
spark.storage.memoryFraction 0.9


Mike





Re: Data locality running Spark on Mesos

2015-01-10 Thread Timothy Chen
Hi Michael,

I see you capped the cores at 60.

I wonder what settings you used for the standalone mode that you compared
with?

I can try to run an MLlib workload on both to compare.

Tim 



Re: Data locality running Spark on Mesos

2015-01-09 Thread Michael V Le

Hi Tim,

Thanks for your response.

The benchmark I used just reads data in from HDFS and builds a linear
regression model using methods from MLlib.
Unfortunately, for various reasons, I can't open the source code for the
benchmark at this time.
I will try to replicate the problem using some sample benchmarks provided
by the vanilla Spark distribution.
It is very possible that I have something very screwy in my workload or
setup.

The parameters I used for Spark on Mesos are the following:
driver memory = 1G
total-executor-cores = 60
spark.executor.memory 6g
spark.storage.memoryFraction 0.9
spark.mesos.coarse = true

The rest are default values, so spark.locality.wait should just be 3000ms.
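Put together, the coarse-grained run above would look roughly like this
spark-submit invocation; the master URL and application jar are placeholders:

```shell
spark-submit \
  --master mesos://master:5050 \
  --driver-memory 1g \
  --total-executor-cores 60 \
  --conf spark.mesos.coarse=true \
  --conf spark.executor.memory=6g \
  --conf spark.storage.memoryFraction=0.9 \
  benchmark.jar
```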

I launched the Spark job on a separate node from the 10-node cluster using
spark-submit.

With regards to Mesos in fine-grained mode, do you have a feel for the
overhead of launching executors for every task? Of course, any perceived
slowdown will probably be very dependent on the workload; I just want a feel
for the possible overhead (e.g., a factor of 2 or 3 slowdown?).
If it is not a data locality issue, perhaps this overhead is a factor in the
slowdown I observed, at least in the fine-grained case.

BTW: I'm using Spark 1.1.0 and Mesos 0.20.0.

Thanks,
Mike







Data locality running Spark on Mesos

2015-01-08 Thread mvle
Hi,

I've noticed that running Spark apps on Mesos is significantly slower than
stand-alone or Spark on YARN.
I don't think this should be the case, so I am posting the problem here in
case someone has an explanation
or can point me to some configuration options I've missed.

I'm running the LinearRegression benchmark with a dataset of 48.8GB.
On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
I can finish the workload in about 5min (I don't remember exactly).
The data is loaded into HDFS spanning the same 10-node cluster.
There are 6 worker instances per node.

However, when running the same workload on the same cluster but now with
Spark on Mesos (coarse-grained mode), the execution time is somewhere around
15min. I also tried fine-grained mode, giving each Mesos node 6 VCPUs (to
hopefully get 6 executors like the stand-alone test), and I still got
roughly 15min.

I've noticed that when Spark is running on Mesos, almost all tasks execute
with locality NODE_LOCAL (even in Mesos in coarse-grained mode). On
stand-alone, the locality is mostly PROCESS_LOCAL.

I think this locality issue might be the reason for the slowdown, but I
can't figure out why, especially for coarse-grained mode, as the executors
supposedly do not go away until job completion.
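One cheap experiment to test the locality hypothesis: raise
spark.locality.wait (3000 ms by default) so the scheduler waits longer for a
PROCESS_LOCAL slot before falling back to NODE_LOCAL; if the runtime doesn't
change, locality probably isn't the bottleneck. The value, master URL, and
jar name below are illustrative:

```shell
# Wait up to 10 s (instead of the 3 s default) per locality level
# before the scheduler degrades from PROCESS_LOCAL to NODE_LOCAL.
spark-submit \
  --master mesos://master:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.locality.wait=10000 \
  benchmark.jar
```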

Any ideas?

Thanks,
Mike



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Data locality running Spark on Mesos

2015-01-08 Thread Tim Chen
How did you run this benchmark, and is there an open version I can try it
with?

And what are your configurations, like spark.locality.wait, etc.?

Tim
