Hello,

The Spark programming guide (http://spark.apache.org/docs/1.2.0/programming-guide.html) makes a recommendation: "Typically you want 2-4 partitions for each CPU in your cluster."
We have a Spark master and two Spark workers, each with 18 cores and 18 GB of RAM. In our application we use JdbcRDD to load data from a DB and then cache it. We load entities from a single table; we currently have 76 million entities (each about 160 bytes in memory). We call count() during application startup to force the entities to load.

Here are our measurements for the count() operation (cores x partitions = time):

36 x 36  = 6.5 min
36 x 72  = 7.7 min
36 x 108 = 9.4 min

So, contrary to the recommendation, the most efficient setup for us is one partition per core. What is the reason for the above recommendation?

Java 8, Apache Spark 1.1.0

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Tuning-number-of-partitions-per-CPU-tp21642.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
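For reference, the loading step described above could be sketched roughly as follows. This is a minimal illustration, not our actual code: the JDBC URL, table name, column names, and id range are hypothetical placeholders; the key point is the numPartitions argument of JdbcRDD, which is what we varied in the measurements.

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

object LoadEntities {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("spark://master:7077", "load-entities")

    // JdbcRDD requires the query to contain exactly two '?' placeholders;
    // it substitutes per-partition bounds on the partitioning column.
    val entities = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:postgresql://db/entities"), // hypothetical URL
      "SELECT id, payload FROM entity WHERE id >= ? AND id <= ?",
      1L,            // lowerBound of the id range (hypothetical)
      76000000L,     // upperBound: ~76 million entities
      36,            // numPartitions: e.g. one per core; 72 or 108 for 2x/3x per core
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2))
    )

    entities.cache()
    entities.count() // forces the load into the cache at startup
  }
}
```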