MEMORY_ONLY_SER question

2014-11-04 Thread Mohit Jaggi
Folks, If I have an RDD persisted in MEMORY_ONLY_SER mode and then it is needed for a transformation/action later, is the whole partition of the RDD deserialized into Java objects first before my transform/action code works on it? Or is it deserialized in a streaming manner as the iterator moves

Re: MEMORY_ONLY_SER question

2014-11-04 Thread Tathagata Das
It is deserialized in a streaming manner as the iterator moves over the partition. This is functionality of core Spark, and Spark Streaming just uses it as is. What would you like to customize? On Tue, Nov 4, 2014 at 9:22 AM, Mohit Jaggi mohitja...@gmail.com wrote: Folks, If I have an RDD
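To illustrate what "deserialized in a streaming manner as the iterator moves" means, here is a minimal sketch in plain Python. It is an analogy, not Spark's code: `pickle` stands in for whichever serializer Spark is configured with, and the helper names `serialize_partition` and `iter_deserialized` are hypothetical. The point is only that records are decoded one at a time as the consumer pulls them, rather than the whole buffer being turned into objects up front.

```python
import io
import pickle


def serialize_partition(records):
    """Pack all records into one byte buffer (stand-in for a serialized cached partition)."""
    buf = io.BytesIO()
    for record in records:
        pickle.dump(record, buf)
    return buf.getvalue()


def iter_deserialized(blob, decoded_log):
    """Yield records lazily; each record is decoded only when the consumer asks for it."""
    stream = io.BytesIO(blob)
    while True:
        try:
            record = pickle.load(stream)
        except EOFError:
            return  # no more serialized records in the buffer
        decoded_log.append(record)  # track what has been materialized so far
        yield record


blob = serialize_partition(range(5))
decoded = []
it = iter_deserialized(blob, decoded)

# Pull only two records; the remaining three are still bytes at this point.
first_two = [next(it), next(it)]
decoded_after_two = list(decoded)

# Draining the iterator decodes the rest.
rest = list(it)
```

After the two `next` calls, only two records have been materialized as objects; the rest remain serialized until the iterator advances. A transformation that consumes its input iterator record by record would see the same behaviour in this toy model.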

RE: MEMORY_ONLY_SER question

2014-11-04 Thread Shao, Saisai
the objects once for all. Thanks Jerry From: Mohit Jaggi [mailto:mohitja...@gmail.com] Sent: Wednesday, November 05, 2014 2:01 PM To: Tathagata Das Cc: user@spark.apache.org Subject: Re: MEMORY_ONLY_SER question I used the word streaming but I did not mean to refer to Spark Streaming. I meant