Re: Data locality running Spark on Mesos
I tried two Spark stand-alone configurations.

First configuration:

  SPARK_WORKER_CORES=1
  SPARK_WORKER_MEMORY=1g
  SPARK_WORKER_INSTANCES=6
  spark.driver.memory 1g
  spark.executor.memory 1g
  spark.storage.memoryFraction 0.9
  --total-executor-cores 60

Second configuration (same as the first, except):

  SPARK_WORKER_CORES=6
  SPARK_WORKER_MEMORY=6g
  SPARK_WORKER_INSTANCES=1
  spark.executor.memory 6g

Runs using the first configuration have faster execution times than Spark runs on my configuration of Mesos (both coarse-grained and fine-grained). Runs using the second configuration had about the same execution time as with Mesos.

Looking at the logs again, the locality info between the stand-alone and Mesos coarse-grained modes is very similar. I must have been hallucinating earlier in thinking the data locality information was different. So this whole thing might simply be due to the fact that it is not possible in Mesos right now to set the number of executors. Even in fine-grained mode, there seems to be just one executor per node (I had thought differently in my previous message). The workloads I've tried apparently perform better with many executors per node than with a single powerful executor per node. It would be really useful once the feature you mentioned, https://issues.apache.org/jira/browse/SPARK-5095, is implemented.

Spark on Mesos fine-grained configuration:

  driver memory = 1G
  spark.executor.memory 6g (tried also with 1g; still one executor per node, and execution time roughly the same)
  spark.storage.memoryFraction 0.9

Mike
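For concreteness, the first stand-alone layout (many small workers per node) can be sketched as the usual conf/spark-env.sh settings plus a spark-submit invocation. This is an illustration only: the master URL, application jar, and class name are placeholders, not values from this thread.

```shell
# conf/spark-env.sh -- first configuration: 6 small workers per node
export SPARK_WORKER_CORES=1       # 1 core per worker daemon
export SPARK_WORKER_MEMORY=1g     # 1 GB per worker daemon
export SPARK_WORKER_INSTANCES=6   # 6 worker daemons per node

# Submission; executor/driver sizing matches the worker sizing.
# (master URL, class, and jar are placeholders)
./bin/spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 60 \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=1g \
  --conf spark.storage.memoryFraction=0.9 \
  --class example.LinearRegressionBenchmark \
  benchmark.jar
```

For the second configuration, only SPARK_WORKER_CORES=6, SPARK_WORKER_MEMORY=6g, SPARK_WORKER_INSTANCES=1, and spark.executor.memory=6g change.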
Re: Data locality running Spark on Mesos
Hi Michael,

I see you capped the cores to 60. I wonder what settings you used for the standalone mode that you compared with? I can try to run an MLlib workload on both to compare.

Tim
Re: Data locality running Spark on Mesos
Hi Tim,

Thanks for your response. The benchmark I used just reads data in from HDFS and builds the linear regression model using methods from MLlib. Unfortunately, for various reasons, I can't open the source code for the benchmark at this time. I will try to replicate the problem using some sample benchmarks provided by the vanilla Spark distribution. It is very possible that I have something very screwy in my workload or setup.

The parameters I used for Spark on Mesos are the following:

  driver memory = 1G
  total-executor-cores = 60
  spark.executor.memory 6g
  spark.storage.memoryFraction 0.9
  spark.mesos.coarse = true

The rest are default values, so spark.locality.wait should just be 3000 ms. I launched the Spark job on a separate node from the 10-node cluster using spark-submit.

With regard to Mesos in fine-grained mode, do you have a feel for the overhead of launching executors for every task? Of course, any perceived slowdown will probably be very dependent on the workload; I just want a feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?). If it is not a data locality issue, perhaps this overhead is a factor in the slowdown I observed, at least in the fine-grained case.

BTW: I'm using Spark version 1.1.0 and Mesos version 0.20.0.

Thanks,
Mike
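Put together, the coarse-grained Mesos run described above would be submitted roughly like this. The properties are the ones listed in the thread; the Mesos master URL, class, and jar names are placeholders.

```shell
# Coarse-grained Spark-on-Mesos submission (Spark 1.1-era property names)
./bin/spark-submit \
  --master mesos://mesos-master:5050 \
  --total-executor-cores 60 \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=6g \
  --conf spark.storage.memoryFraction=0.9 \
  --conf spark.mesos.coarse=true \
  --class example.LinearRegressionBenchmark \
  benchmark.jar
```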
Data locality running Spark on Mesos
Hi,

I've noticed running Spark apps on Mesos is significantly slower compared to stand-alone or Spark on YARN. I don't think it should be the case, so I am posting the problem here in case someone has some explanation or can point me to some configuration options I've missed.

I'm running the LinearRegression benchmark with a dataset of 48.8 GB. On a 10-node stand-alone Spark cluster (each node 4-core, 8 GB of RAM), I can finish the workload in about 5 min (I don't remember exactly). The data is loaded into HDFS spanning the same 10-node cluster. There are 6 worker instances per node.

However, when running the same workload on the same cluster but now with Spark on Mesos (coarse-grained mode), the execution time is somewhere around 15 min. I also tried fine-grained mode, giving each Mesos node 6 VCPUs (to hopefully get 6 executors like the stand-alone test), and I still get roughly 15 min.

I've noticed that when Spark is running on Mesos, almost all tasks execute with locality NODE_LOCAL (even with Mesos in coarse-grained mode). On stand-alone, the locality is mostly PROCESS_LOCAL. I think this locality issue might be the reason for the slowdown, but I can't figure out why, especially for coarse-grained mode, as the executors supposedly do not go away until job completion.

Any ideas?

Thanks,
Mike

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
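The PROCESS_LOCAL vs. NODE_LOCAL split above is governed by Spark's delay scheduling: the scheduler holds out for the best available locality level for a task and only relaxes to the next level after spark.locality.wait (default 3000 ms) passes without a launch. A minimal self-contained sketch of that fallback rule follows; it is an illustration of the idea, not Spark's actual scheduler code, and the timing model is deliberately simplified.

```python
# Hypothetical sketch of delay scheduling: each elapsed multiple of
# spark.locality.wait unlocks one more (less local) level.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

def allowed_level(ms_since_last_launch, locality_wait_ms=3000):
    """Most relaxed locality level the scheduler may use right now."""
    idx = min(ms_since_last_launch // locality_wait_ms, len(LEVELS) - 1)
    return LEVELS[idx]

print(allowed_level(0))      # PROCESS_LOCAL
print(allowed_level(3500))   # NODE_LOCAL
print(allowed_level(9001))   # ANY
```

If executors are not co-located with the HDFS blocks holding the cached partitions, PROCESS_LOCAL never becomes available, so tasks run NODE_LOCAL regardless of how long the scheduler waits, which matches the symptom described here.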
Re: Data locality running Spark on Mesos
How did you run this benchmark, and is there an open version I can try it with? And what are your configurations, like spark.locality.wait, etc.?

Tim