I want to apply the following transformations to 60 GB of data on 7 nodes, each with 10 GB of memory. I am wondering: does groupByKey() return an RDD with a single partition for each key? If so, what happens if the partition for a key doesn't fit into the memory of the node holding it?
rdd = sc.textFile("hdfs://.....").map(parserFunc).groupByKey()

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupByKey-returns-a-single-partition-in-a-RDD-tp4264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
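To make the partitioning question concrete, here is a minimal pure-Python sketch (an assumption about the mechanism, not Spark's actual implementation) of how a hash partitioner assigns keys to a fixed number of partitions. The point it illustrates: each partition typically holds many keys, rather than one partition per key, so partition count is independent of key count. The names `partition_for`, `pairs`, and `num_partitions` are hypothetical, chosen for this example.

```python
from collections import defaultdict

def partition_for(key, num_partitions):
    # Stand-in for a hash partitioner: bucket = hash(key) mod #partitions.
    # Integer keys are used so Python's hash() is deterministic here.
    return hash(key) % num_partitions

pairs = [(10, "x"), (11, "y"), (10, "z"), (12, "w")]
num_partitions = 2

# First, group all values by key (the groupByKey-like step).
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)

# Then place each key's group into its hash bucket; several keys
# can land in the same partition.
partitions = defaultdict(dict)
for k, values in grouped.items():
    partitions[partition_for(k, num_partitions)][k] = values

# Keys 10 and 12 share partition 0; key 11 lands in partition 1.
```

Under this model, a single key's group is never split across partitions, which is why one very hot key can still produce a partition larger than a node's memory even though partitions themselves are not one-per-key.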