I want to apply the following transformations to 60 GB of data on 7 nodes with
10 GB of memory each. I am wondering: does groupByKey() return an RDD with a
single partition for each key? If so, what happens when a partition is too
large to fit in the memory of the node it lands on?

rdd = sc.textFile("hdfs://.....").map(parserFunc).groupByKey()
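
In case it helps, here is a fuller sketch of what I mean (PySpark assumed;
parserFunc, the appName, the partition count 200, and the HDFS path are all
placeholders, not my real job). getNumPartitions() and glom() are what I'd
use to inspect how the groups end up spread across partitions:

    from pyspark import SparkContext

    sc = SparkContext(appName="groupByKeyCheck")

    # Hypothetical parser for illustration: one (key, value) pair per line.
    def parserFunc(line):
        key, value = line.split("\t", 1)
        return (key, value)

    # groupByKey also accepts an explicit partition count for the shuffle;
    # 200 here is a placeholder, not a recommendation.
    rdd = sc.textFile("hdfs://.....").map(parserFunc).groupByKey(200)

    print(rdd.getNumPartitions())          # partitions after the shuffle
    print(rdd.glom().map(len).collect())   # keys per partition (small inputs only)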


