Thanks for your reply, Vipin!

I am using the spark-perf benchmark. The RDD is created with:

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.random.RandomRDDs
    import org.apache.spark.rdd.RDD
    val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, m, n, numPartitions, seed)

After I set numPartitions (say, 40 partitions), I believe the Spark core
code allocates those partitions to executors, but I do not know how or
where Spark does that. My impression is that it fills executors up in
order: if executor 1 has 10 cores, it assigns 10 partitions to executor 1,
then the next 10 to executor 2, and so on. So Spark core will not try to
distribute the 40 partitions evenly across my 8 nodes, right? (See the
small check below that I am using to watch where the partitions land.)
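
For reference, here is the quick sketch I use to see which host each
partition of the "data" RDD above ends up on (the hostname lookup is just
my way of identifying the executor's node):

    val placement = data.mapPartitionsWithIndex { (idx, iter) =>
      // record the partition index, the node it runs on, and the row count
      Iterator((idx, java.net.InetAddress.getLocalHost.getHostName, iter.size))
    }.collect()
    placement.foreach { case (idx, host, count) =>
      println(s"partition $idx -> $host ($count rows)")
    }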

I also did not understand your suggestion to hash a key from my dataset.
Could you explain a bit more? Is it something like the sketch below?
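
This is my best guess at what you meant (assuming each row is keyed by its
index and the pairs are then repartitioned with a HashPartitioner; the 40
is just my partition count):

    import org.apache.spark.HashPartitioner

    // key each vector by its index, then hash-partition the pairs
    val keyed = data.zipWithIndex().map { case (v, i) => (i, v) }
    val rebalanced = keyed.partitionBy(new HashPartitioner(40)).values

Is that the idea?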
Thanks,



