The best approach is to measure and record how the performance of your solution scales as the workload scales - record actual data points - and then do some time-series statistical analysis and visualization on them.
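To make that concrete, here is a minimal sketch of such a measurement harness (Scala/Spark - the stand-in job, the results.csv file name, and the workload sizes are all illustrative, not from this thread). It pins the run to a single core via local[1] and records one data point per workload size:

import java.io.PrintWriter
import org.apache.spark.{SparkConf, SparkContext}

object ScalingExperiment {
  def main(args: Array[String]): Unit = {
    // local[1] = one executor JVM thread, i.e. the single-core baseline.
    val conf = new SparkConf().setAppName("scaling-experiment").setMaster("local[1]")
    val sc = new SparkContext(conf)

    val out = new PrintWriter("results.csv") // hypothetical output file
    out.println("workload_size,cores,elapsed_ms")
    // Scale the workload and record one data point per run.
    for (workloadSize <- Seq(1000000L, 2000000L, 4000000L, 8000000L)) {
      val start = System.nanoTime()
      // Stand-in workload: substitute your real job here.
      sc.range(0L, workloadSize).map(_ * 2).filter(_ % 3 == 0).count()
      val elapsedMs = (System.nanoTime() - start) / 1000000
      out.println(s"$workloadSize,1,$elapsedMs")
    }
    out.close()
    sc.stop()
  }
}

The same harness can then be re-run with more threads (local[2], local[4], ...) or on a real cluster (e.g. on YARN, varying spark-submit's --num-executors / --executor-cores / --executor-memory), one CSV row per configuration, and the resulting file feeds straight into whatever stats or plotting tool you prefer.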
For example, you can start with a single box with, say, 8 CPU cores. Use 1 or 2 partitions and 1 executor, which corresponds to 1 CPU core (one JVM thread) processing your workload - this is the baseline the sketch above captures. Scale the workload, watch how the performance scales, and record all the data points. Then repeat the same with more CPU cores, more RAM, and more boxes - you get the idea? Then analyze your performance datasets in the way explained above.

Basically this discipline is known as Performance Engineering and has nothing to do with any specific product - it is worth reading up on PE as well.

-----Original Message-----
From: shreesh [mailto:shreesh.la...@mail.com]
Sent: Tuesday, June 16, 2015 4:22 PM
To: user@spark.apache.org
Subject: Re: How does one decide no of executors/cores/memory allocation?

I realize that there are a lot of ways to configure my application in Spark. The part that is not clear is how I decide, say for example, into how many partitions I should divide my data, how much RAM I should have, or how many workers I should initialize.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-does-one-decide-no-of-executors-cores-memory-allocation-tp23326p23339.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org