If I have a cluster with 7 nodes, each with the same number of cores, and create an RDD with sc.parallelize(), it looks as if Spark always tries to distribute the partitions evenly across the nodes.
Questions:
(1) Is that something I can rely on?
(2) Can I rely on sc.parallelize() assigning partitions to as many executors as possible? That is, if I request 7 partitions, is each node guaranteed to get exactly one of them? If I request 14 partitions, is each node guaranteed to get exactly two?

P.S.: I am aware that for other cases, such as sc.hadoopFile(), this may depend on the actual storage location of the data. I am asking only about the sc.parallelize() case.
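
For reference, below is a minimal sketch of how one can observe the placement empirically rather than rely on guarantees. It assumes the app is submitted to the cluster (e.g. via spark-submit, so the master URL comes from the environment); the app name and element count are arbitrary. Each partition reports the hostname of the executor JVM that processed it.

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionPlacement {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("partition-placement")
        val sc = new SparkContext(conf)

        // Request exactly one partition per node (7 here).
        val rdd = sc.parallelize(1 to 700, numSlices = 7)

        // Record which host processes each partition index.
        val placement = rdd
          .mapPartitionsWithIndex { (idx, iter) =>
            val host = java.net.InetAddress.getLocalHost.getHostName
            Iterator((idx, host, iter.size))
          }
          .collect()

        placement.foreach { case (idx, host, n) =>
          println(s"partition $idx -> $host ($n elements)")
        }

        sc.stop()
      }
    }

Running this a few times shows whether the 7 (or 14) partitions actually spread across all 7 nodes in practice, though an even spread observed this way is of course not the same as a guarantee.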