Hi,

In the talk "A Deeper Understanding of Spark Internals", it was mentioned that some operators (in 1.1: .groupByKey(), .reduceByKey(), .sortByKey()) can spill to disk across keys, but that, as a limitation of the shuffle at that time, each single key-value pair must fit in memory.
1) Now that the shuffle is sort-based rather than hash-based, does each pair still need to fit in memory for the shuffle?

2) Do other operators, such as .cogroup(), also spill to disk, or must their data fit in memory for the operator to work?

thanks,
ds
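To make the question concrete, here is a plain-Python sketch (no Spark involved) of the semantics I understand these operators to have. The point is that groupByKey materializes every value for a key into one collection, so the resulting single key-value pair (key -> all its values) must fit in memory even if the shuffle itself spills across keys, whereas reduceByKey only keeps a running result per key. The function names here are just illustrative stand-ins, not Spark APIs:

```python
from collections import defaultdict

def group_by_key(pairs):
    """Sketch of RDD.groupByKey() semantics: buffers ALL values per key."""
    out = defaultdict(list)
    for k, v in pairs:
        out[k].append(v)  # the whole value list for one key lives in memory
    return dict(out)

def reduce_by_key(pairs, f):
    """Sketch of RDD.reduceByKey() semantics: keeps one running result per key."""
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

pairs = [("a", 1), ("b", 2), ("a", 3)]
print(group_by_key(pairs))                       # {'a': [1, 3], 'b': [2]}
print(reduce_by_key(pairs, lambda x, y: x + y))  # {'a': 4, 'b': 2}
```

If that understanding is right, my question 1) amounts to asking whether the grouped value list for one key still has to fit in memory under the sort-based shuffle.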