>From the docs, https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence:
Storage LevelMeaningMEMORY_ONLYStore RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.MEMORY_AND_DISKStore RDD as *deserialized* Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed. MEMORY_ONLY_SERStore RDD as *serialized* Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer <https://spark.apache.org/docs/latest/tuning.html>, but more CPU-intensive to read.MEMORY_AND_DISK_SERSimilar to *MEMORY_ONLY_SER*, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed. On Thu, May 21, 2015 at 3:52 AM, Jiang, Zhipeng <zhipeng.ji...@intel.com> wrote: > Hi there, > > > > This question may seem to be kind of naïve, but what’s the difference > between *MEMORY_AND_DISK* and *MEMORY_AND_DISK_SER*? > > > > If I call *rdd.persist(StorageLevel.MEMORY_AND_DISK)*, the BlockManager > won’t serialize the *rdd*? > > > > Thanks, > > Zhipeng >