Meaning of persistence levels -- setting persistence causes out-of-memory errors with PySpark
I'm running Spark locally on my laptop to explore how persistence impacts memory use. As an example problem, I generate 80 MB matrices in NumPy and then simply add them. No matter what I set NUM or the persistence level to in the code below, I get out-of-memory errors.
Disk-backing PySpark RDDs?
Is there a way to disk-back the RDDs (something analogous to mmap?) so that they don't create memory pressure in the system at all? With the compute taking this long, the added overhead of disk and network IO would be quite minimal. Thanks! ...Eric Jonas