Hi, 

In the talk "A Deeper Understanding of Spark Internals", it was mentioned
that for some operators Spark can spill to disk across keys (in 1.1:
.groupByKey(), .reduceByKey(), .sortByKey()), but that, as a limitation of
the shuffle at the time, each single key-value pair had to fit in memory.
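
For concreteness, here is a minimal sketch of the situation I mean (the
app name, master, and skewed data are all made up). After .groupByKey(),
everything for one hot key becomes a single (key, values) pair, which is
what I understand must fit in memory:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair RDD functions in 1.x

val sc = new SparkContext(
  new SparkConf().setAppName("skew-sketch").setMaster("local[*]"))

// Made-up skewed data: almost every value lands on key 0.
val pairs = sc.parallelize(1 to 1000000).map { i =>
  val key = if (i % 100 == 0) i else 0
  (key, i.toLong)
}

// Spilling can happen across keys here, but the grouped record for
// key 0 is one (Int, Iterable[Long]) pair holding ~990k values; as I
// understand it, that single pair has to fit in memory.
val grouped = pairs.groupByKey()
grouped.mapValues(_.size).collect().foreach(println)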

1) Now that the shuffle is sort-based rather than hash-based, does each pair
still need to fit in memory for the shuffle?

2) Do other operators, such as .cogroup(), spill to disk as well? Or must
their inputs fit in memory for the operator to work?
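
For example (again with made-up data, reusing the sc from the sketch
above):

import org.apache.spark.rdd.RDD

// Two small pair RDDs; imagine key 1 having millions of entries on
// each side instead.
val users:  RDD[(Int, String)] = sc.parallelize(Seq((1, "a"), (2, "b")))
val events: RDD[(Int, Long)]   = sc.parallelize(Seq((1, 10L), (1, 11L), (2, 20L)))

// Per key, cogroup yields (key, (Iterable[String], Iterable[Long])).
// Must that combined pair fit in memory, or can it spill the way
// groupByKey does?
val cg = users.cogroup(events)
cg.collect().foreach(println)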

thanks, 
ds
