The best way is to measure and record how the performance of your solution
scales as the workload scales - record actual data points - and then do some
time series / statistical analysis and visualization on them.

For example, you can start with a single box with, say, 8 CPU cores.

Use, say, one or two partitions and one executor, which would correspond to one
CPU core (JVM thread) processing your workload. Scale the workload, see how
the performance scales, and record all the data points.
Then repeat the same for more CPU cores, RAM and boxes - you get the idea.
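
As a minimal sketch of one such measurement run (the class name, input path,
memory value and word-count-style job below are placeholders I am assuming,
not anything specific to your application), something like this prints one
data point per run - workload, number of partitions, result size and
wall-clock time:

    import org.apache.spark.{SparkConf, SparkContext}

    object ScalingProbe {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("scaling-probe"))
        val inputPath = args(0)            // workload to process (placeholder)
        val numPartitions = args(1).toInt  // e.g. 1 or 2 for the baseline run

        val start = System.nanoTime()
        val count = sc.textFile(inputPath, numPartitions)
          .flatMap(_.split("\\s+"))        // any representative transformation
          .count()                         // force execution with an action
        val elapsedSec = (System.nanoTime() - start) / 1e9

        // one data point: workload, partitions, result size, wall-clock seconds
        println(s"$inputPath,$numPartitions,$count,$elapsedSec")
        sc.stop()
      }
    }

Submitted with one executor and one core for the baseline, e.g. on YARN (again,
the memory and paths are just example values):

    spark-submit --master yarn --num-executors 1 --executor-cores 1 \
      --executor-memory 2g --class ScalingProbe probe.jar /data/workload_1g 2

Grow the workload (and later the cores, executors and boxes) and append each
printed line to a results file.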

Then analyze your performance datasets in the way explained above.
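
For that analysis step, a bare-bones sketch (assuming the data points were
appended to a file I am calling results.csv) could just compute throughput per
configuration and print it sorted, so you can see where the scaling curve
flattens out:

    import scala.io.Source

    object AnalyzeResults {
      def main(args: Array[String]): Unit = {
        // each line: workload,partitions,records,seconds (as printed by the probe)
        val rows = Source.fromFile("results.csv").getLines().map { line =>
          val Array(workload, parts, records, secs) = line.split(",")
          (workload, parts.toInt, records.toLong, secs.toDouble)
        }.toSeq

        rows.sortBy(r => (r._1, r._2)).foreach { case (workload, parts, records, secs) =>
          val throughput = records / secs  // records per second for this run
          println(f"$workload%-20s partitions=$parts%-4d seconds=$secs%10.1f throughput=$throughput%12.0f")
        }
      }
    }

Plotting time vs. workload size for each hardware configuration then gives you
the scaling curves mentioned above.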

Basically this stuff is known as Performance Engineering and has nothing to do
with any specific product - read up on PE as well.

-----Original Message-----
From: shreesh [mailto:shreesh.la...@mail.com] 
Sent: Tuesday, June 16, 2015 4:22 PM
To: user@spark.apache.org
Subject: Re: How does one decide no of executors/cores/memory allocation?

I realize that there are a lot of ways to configure my application in Spark.
The part that is not clear is how I decide, for example, into how many
partitions I should divide my data, how much RAM I should have, or how many
workers I should initialize.




--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-one-decide-no-of-executors-cores-memory-allocation-tp23326p23339.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


