Thanks for your reply, Vipin! I am using the spark-perf benchmark. The command to create the RDD is:

    val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, m, n, numPartitions, seed)
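For reference, here is a minimal, self-contained sketch of that setup, plus one way to inspect which executor each partition lands on. This is an illustration only, not something from spark-perf itself: the `SparkEnv.get.executorId` lookup inside `mapPartitionsWithIndex` is my assumption of a reasonable way to check placement, and the values of `m`, `n`, and `seed` are made up.

```scala
// Sketch (assumes a running cluster): build the RDD as in spark-perf,
// then record which executor evaluates each partition.
import org.apache.spark.{SparkConf, SparkContext, SparkEnv}
import org.apache.spark.mllib.random.RandomRDDs
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

object PartitionPlacement {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-placement"))
    // Illustrative sizes; replace with the real benchmark parameters.
    val m = 100000L
    val n = 10
    val numPartitions = 40
    val seed = 42L

    val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, m, n, numPartitions, seed)

    // For each partition, emit (partitionIndex, executorId) and collect to the driver.
    val placement = data
      .mapPartitionsWithIndex { (idx, _) => Iterator((idx, SparkEnv.get.executorId)) }
      .collect()

    // Group by executor to see how many partitions each one actually got.
    placement.groupBy(_._2).toSeq.sortBy(_._1).foreach { case (exec, parts) =>
      println(s"executor $exec holds ${parts.length} partitions: ${parts.map(_._1).sorted.mkString(", ")}")
    }
    sc.stop()
  }
}
```

Note that the placement printed here reflects where the tasks were scheduled on this particular run; it can change between runs depending on scheduling and locality settings.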
After I set numPartitions (for example, 40 partitions), I believe Spark core will allocate those partitions to executors, but I do not know how or where Spark does that. My impression is that it works like this: if executor 1 has 10 cores, Spark will assign 10 partitions to executor 1, then 10 to executor 2, and so on. Spark core will not try to distribute the 40 partitions evenly across my 8 nodes, right?

I also do not understand what you mean by hashing a key from my dataset. Could you explain more?

Thanks,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/distribute-partitions-evenly-to-my-cluster-tp27998p28031.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.