Meaning of persistence levels -- setting persistence causing out of memory errors with pyspark

2014-10-27 Thread Eric Jonas
I'm running Spark locally on my laptop to explore how persistence impacts memory use. I'm generating 80 MB matrices in numpy and then simply adding them as an example problem. No matter what I set NUM or the persistence level to in the code below, I get out of memory errors.
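
The original code is not quoted in this excerpt; the following is a minimal sketch of the kind of job described, where NUM, the matrix shape, and the MEMORY_AND_DISK level are assumptions rather than the poster's actual settings:

    from pyspark import SparkContext, StorageLevel
    import numpy as np

    # NUM and the 3200 x 3200 shape are assumptions; a 3200 x 3200 float64
    # matrix is roughly 80 MB (3200 * 3200 * 8 bytes).
    NUM = 20

    sc = SparkContext("local[4]", "persistence-test")

    def make_matrix(i):
        # one ~80 MB dense matrix per RDD element
        return np.ones((3200, 3200)) * i

    rdd = sc.parallelize(range(NUM), NUM).map(make_matrix)
    rdd.persist(StorageLevel.MEMORY_AND_DISK)  # persistence level under test

    # elementwise sum of all the matrices
    total = rdd.reduce(lambda a, b: a + b)
    print(total[0, 0])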

disk-backing pyspark rdds?

2014-10-21 Thread Eric Jonas
Is there a way to disk-back them (something analogous to mmap?) so that they don't create memory pressure in the system at all? With compute taking this long, the added overhead of disk and network IO is quite minimal. Thanks! ...Eric Jonas
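
A rough sketch of the closest built-in option is persisting with StorageLevel.DISK_ONLY; note this spills serialized partitions to local disk rather than mmap'ing them, and each partition still has to fit in worker memory while it is being computed. The matrix shape and partition count below are illustrative assumptions:

    from pyspark import SparkContext, StorageLevel
    import numpy as np

    sc = SparkContext("local[2]", "disk-backed-rdd")

    # ~80 MB per partition; shape and partition count are illustrative
    rdd = (sc.parallelize(range(8), 8)
             .map(lambda i: np.ones((3200, 3200)) * i))

    # DISK_ONLY writes serialized partitions to local disk instead of caching
    # them in memory; they are re-read from disk on later passes over the RDD.
    rdd.persist(StorageLevel.DISK_ONLY)

    print(rdd.map(lambda m: float(m.sum())).sum())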