These messages are actually not about spilling the RDD; they're about spilling the intermediate state of a reduceByKey, groupBy, or other aggregation whose state doesn't fit in memory. Spark has to spill in those cases to avoid running out of memory. You can minimize spilling by using more reduce tasks, which means less data per task.
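As a minimal sketch of the suggestion above: the pair-RDD operations like reduceByKey take an optional numPartitions argument that controls how many reduce tasks the shuffle uses. The input path and the partition count of 200 below are arbitrary assumptions for illustration, not values from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SpillDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SpillDemo"))

    // Build a (word, 1) pair RDD from some text input (path is assumed).
    val pairs = sc.textFile("hdfs:///input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))

    // By default reduceByKey inherits the parent RDD's partition count.
    // Passing an explicit count (200 here, an arbitrary choice) spreads the
    // shuffle state over more, smaller reduce tasks, so each task's
    // in-memory map is smaller and ExternalAppendOnlyMap is less likely
    // to spill to disk.
    val counts = pairs.reduceByKey(_ + _, 200)

    counts.saveAsTextFile("hdfs:///counts")  // assumed output path
    sc.stop()
  }
}
```

A similar effect can be had cluster-wide by raising spark.default.parallelism, which sets the default partition count for shuffle operations that don't specify one.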
Matei

On Jul 26, 2014, at 1:22 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Hello,
>
> I am running the SparkPageRank example, which uses the cache() API for
> persistence. This, AFAIK, uses the MEMORY_ONLY storage level. But even in
> this setup, I see a lot of "WARN ExternalAppendOnlyMap: Spilling in-memory
> map of...." messages in the log. Why is it so? I thought that MEMORY_ONLY
> means kick out the RDD if there isn't enough memory available.
>
> Thanks,
> Lokesh
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spilling-in-memory-messages-in-log-even-with-MEMORY-ONLY-tp10723.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.