Re: Question about Serialization in Storage Level

Todd Nist Thu, 21 May 2015 05:50:54 -0700

>From the docs,
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence:

Storage LevelMeaningMEMORY_ONLYStore RDD as deserialized Java objects in
the JVM. If the RDD does not fit in memory, some partitions will not be
cached and will be recomputed on the fly each time they're needed. This is
the default level.MEMORY_AND_DISKStore RDD as *deserialized* Java objects
in the JVM. If the RDD does not fit in memory, store the partitions that
don't fit on disk, and read them from there when they're needed.
MEMORY_ONLY_SERStore RDD as *serialized* Java objects (one byte array per
partition). This is generally more space-efficient than deserialized
objects, especially when using a fast serializer
<https://spark.apache.org/docs/latest/tuning.html>, but more CPU-intensive
to read.MEMORY_AND_DISK_SERSimilar to *MEMORY_ONLY_SER*, but spill
partitions that don't fit in memory to disk instead of recomputing them on
the fly each time they're needed.

On Thu, May 21, 2015 at 3:52 AM, Jiang, Zhipeng <zhipeng.ji...@intel.com>
wrote:

>  Hi there,
>
>
>
> This question may seem to be kind of naïve, but what’s the difference
> between *MEMORY_AND_DISK* and *MEMORY_AND_DISK_SER*?
>
>
>
> If I call *rdd.persist(StorageLevel.MEMORY_AND_DISK)*, the BlockManager
> won’t serialize the *rdd*?
>
>
>
> Thanks,
>
> Zhipeng
>

Re: Question about Serialization in Storage Level

Reply via email to