Hello,

In the Spark programming guide
(http://spark.apache.org/docs/1.2.0/programming-guide.html) there is a
recommendation:
Typically you want 2-4 partitions for each CPU in your cluster.

We have a Spark master and two Spark workers, each with 18 cores and
18 GB of RAM.
In our application we use JdbcRDD to load data from a DB and then cache it.
We load entities from a single table; currently we have 76 million entities
(each entity takes about 160 bytes in memory). We call count() during
application startup to force loading of the entities. Here are our
measurements for the count() operation (cores x partitions = time):
36 x 36  = 6.5 min
36 x 72  = 7.7 min
36 x 108 = 9.4 min

So despite the recommendation, the most efficient setup for us is one
partition per core. What is the reason for the above recommendation?
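
For completeness, the loading is roughly equivalent to the simplified Scala
sketch below (the connection URL, table, key range and row mapping are
illustrative, not our real schema):

  import java.sql.{DriverManager, ResultSet}
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.rdd.JdbcRDD

  val sc = new SparkContext(new SparkConf().setAppName("entity-load"))

  // numPartitions is what we varied between 36, 72 and 108 in the runs above
  val numPartitions = 36

  val entities = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:postgresql://dbhost/mydb"), // illustrative URL
    "SELECT * FROM entities WHERE id >= ? AND id <= ?", // JdbcRDD binds the two '?' to per-partition bounds
    1L, 76000000L, numPartitions,                       // illustrative key range of the table
    (rs: ResultSet) => rs.getString(1)                  // illustrative row mapping
  ).cache()

  entities.count()  // forces the load at application startup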

Java 8, Apache Spark 1.1.0