These messages are not about spilling the RDD itself; they're about spilling 
intermediate state for a reduceByKey, groupBy, or other shuffle operation 
whose state doesn't fit in memory. We have to spill in those cases to avoid 
running out of memory. You can minimize spilling by using more reduce tasks, 
though, which means less data per task.
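
As a rough sketch (the input path, output path, and partition count of 400 
here are just placeholders, not values from your job), passing an explicit 
partition count to reduceByKey looks like this:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair RDD implicits (Spark 1.x)

    object MoreReduceTasks {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MoreReduceTasks"))

        // Build (key, count) pairs from a hypothetical input file
        val pairs = sc.textFile("hdfs:///data/edges")
          .map { line =>
            val fields = line.split("\\s+")
            (fields(0), 1L)
          }

        // With the default partition count, each reduce task may hold too
        // much map state and spill. An explicit, larger count (400 here,
        // chosen arbitrarily) spreads the same data over more tasks, so
        // each task's in-memory map stays smaller.
        val counts = pairs.reduceByKey(_ + _, 400)

        counts.saveAsTextFile("hdfs:///data/counts")  // hypothetical output
        sc.stop()
      }
    }

Setting spark.default.parallelism has a similar effect for operations where 
you don't pass an explicit count.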

Matei

On Jul 26, 2014, at 1:22 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Hello,
> 
> I am running the SparkPageRank example, which uses the cache() API for
> persistence. This, AFAIK, uses the MEMORY_ONLY storage level. But even in
> this setup, I see a lot of "WARN ExternalAppendOnlyMap: Spilling in-memory
> map of...." messages in the log. Why is that? I thought MEMORY_ONLY meant
> the RDD is simply kicked out if there isn't enough memory available.
> 
> 
> Thanks,
> Lokesh
