By default Spark does not keep the data at all; it only stores the lineage,
i.e. "how" to recreate the data from its source.
The programmer can, however, choose to keep the data once it has been
materialized by calling /.persist()/ or /.cache()/ on the RDD.
/.cache()/ stores the data in memory only (MEMORY_ONLY); partitions that do
not fit are simply not cached and are recomputed when needed.
/.persist()/ without arguments is equivalent to /.cache()/ for an RDD.
/.persist(StorageLevel.MEMORY_AND_DISK)/ uses memory but spills to disk if
needed, and /.persist(StorageLevel.DISK_ONLY)/ writes it all to disk (no
in-memory overhead).

See:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
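
For concreteness, here is a minimal sketch of the options above, assuming an
existing SparkContext `sc` and a made-up input path:

    import org.apache.spark.storage.StorageLevel

    // Nothing is computed yet: the RDD only records the lineage
    // (read the file, then split each line).
    val parsed = sc.textFile("hdfs:///data/events.log")  // made-up path
                   .map(_.split(","))

    parsed.cache()                                  // same as persist(StorageLevel.MEMORY_ONLY)
    // parsed.persist(StorageLevel.MEMORY_AND_DISK) // spill partitions to disk if memory is tight
    // parsed.persist(StorageLevel.DISK_ONLY)       // no in-memory copy at all

    // The first action materializes (and caches) the data; later actions reuse it.
    println(parsed.count())
    println(parsed.count())  // served from the cache, the file is not re-read

(The alternatives are commented out because an RDD can only have one storage
level; calling persist with a different level on an already-persisted RDD
throws an exception.)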

In addition, you can define your own StorageLevel, so if you have both
magnetic and SSD disks you can choose the storage tier you want (depending on
how "hot" you consider the data); see the sketch below.
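
If none of the built-in constants (MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY,
...) fits, the StorageLevel companion object has a factory method for custom
combinations. A minimal sketch, again assuming an existing SparkContext `sc`;
the replication factor of 2 is picked purely for illustration:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(1 to 1000000)

    // StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication):
    // on disk only, serialized, with each block replicated to two nodes.
    val diskOnlyReplicated = StorageLevel(true, false, false, false, 2)

    rdd.persist(diskOnlyReplicated)
    rdd.count()  // materializes the data at that level

Note that, as far as I know, which physical directories (and therefore which
disks) back the on-disk blocks is controlled by spark.local.dir on each
worker rather than by the StorageLevel itself.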

Essentially, you have full freedom to do what you will with the data in
Spark :)  

Hope this helps. 


