Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
How can I configure the Mesos allocation policy to share resources between all currently running Spark applications? I can't seem to find it in the architecture docs.

Re: Spark job resource allocation best practices

2014-11-04 Thread Akhil Das
You can look at the different run modes here: http://docs.sigmoidanalytics.com/index.php/Spark_On_Mesos#Mesos_Run_Modes These people have a very good tutorial to get you started: http://mesosphere.com/docs/tutorials/run-spark-on-mesos/#overview

Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
I have a single Spark cluster, not multiple frameworks and not multiple versions. Is Mesos relevant for my use case? Where can I find information about exactly how to make Mesos tell Spark how many of the cluster's resources to use, instead of the default of taking them all?

Re: Spark job resource allocation best practices

2014-11-04 Thread Akhil Das
You need to install Mesos on your cluster. Then you run your Spark applications by specifying the Mesos master (mesos://...) instead of the standalone master (spark://...). Spark can run over Mesos in two modes: “*fine-grained*” (the default) and “*coarse-grained*”. In “*fine-grained*” mode, each Spark task runs as a separate Mesos task.
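A minimal sketch of what that looks like from the application side, assuming a hypothetical Mesos master at mesos://mesos-master:5050 (the master URL, app name, and memory value below are placeholders, not from this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")                     // placeholder application name
      .setMaster("mesos://mesos-master:5050")   // Mesos master instead of spark://...
      // Fine-grained is the default; set this to "true" for coarse-grained mode,
      // where Spark holds a fixed set of cores for the application's whole lifetime.
      .set("spark.mesos.coarse", "false")
      .set("spark.executor.memory", "4g")       // executor memory is still reserved up front

    val sc = new SparkContext(conf)

In both modes the Mesos master decides which resource offers reach Spark; the coarse/fine choice only controls whether Spark keeps those cores for the whole run or gives them back between tasks.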

Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
Let's say I run Spark on Mesos in fine-grained mode, and I have 12 cores and 64GB of memory. I run application A on Spark, and some time after that (but before A has finished) application B. How many CPUs will each of them get?

Spark job resource allocation best practices

2014-11-03 Thread Romi Kuntsman
Hello, I have a Spark 1.1.0 standalone cluster with several nodes, and several jobs (applications) being scheduled at the same time. By default, each Spark job takes up all available CPUs, so when more than one job is scheduled, all but the first are stuck in WAITING. On the other hand,
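In standalone mode the usual way to keep the first application from grabbing every core is to cap it explicitly with spark.cores.max; a minimal sketch, with the master URL and the numbers purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("job-a")                    // illustrative name
      .setMaster("spark://master:7077")       // standalone master
      .set("spark.cores.max", "4")            // cap this application at 4 cores instead of all
      .set("spark.executor.memory", "8g")     // memory reserved per executor

    val sc = new SparkContext(conf)

This is static partitioning: each application keeps its slice for its whole lifetime, so the cap has to be chosen up front.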

Re: Spark job resource allocation best practices

2014-11-03 Thread Akhil Das
Have a look at scheduling pools: https://spark.apache.org/docs/latest/job-scheduling.html. If you want more sophisticated resource allocation, then you are better off using a cluster manager like Mesos or YARN.
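A minimal sketch of the fair-scheduler pools that page describes, assuming the competing jobs are submitted from threads inside one SparkContext (the pool name is just an example); pools divide resources between jobs within a single application, not between separate applications:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shared-context")
      .set("spark.scheduler.mode", "FAIR")    // fair scheduling instead of the default FIFO

    val sc = new SparkContext(conf)

    // Jobs submitted from this thread are assigned to the "production" pool.
    sc.setLocalProperty("spark.scheduler.pool", "production")
    sc.parallelize(1 to 1000).count()

    // Clear the property so later jobs from this thread fall back to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null)

For sharing across separate applications, the cluster manager has to do the partitioning, which is where Mesos or YARN come in.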

Re: Spark job resource allocation best practices

2014-11-03 Thread Romi Kuntsman
So, as it says there, static partitioning is used in Spark’s standalone and YARN modes, as well as in the coarse-grained Mesos mode. That leaves us only with Mesos in fine-grained mode, where there is *dynamic sharing* of CPU cores. It says that when an application is not running tasks on a machine, other applications may run tasks on those cores.

Re: Spark job resource allocation best practices

2014-11-03 Thread Akhil Das
Yes, I believe Mesos is the right choice for you. http://mesos.apache.org/documentation/latest/mesos-architecture/