In general, most use cases don't need the RDD to be replicated in memory
multiple times; doing so would be a rare exception. If recomputing a lost
partition is really expensive (time consuming), or if the use case is
extremely time sensitive, then replicating it in memory may be worthwhile.
Otherwise, you can safely rely on the RDD lineage graph to re-create a
lost partition if it gets discarded from memory.
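
To make the lineage point concrete, here is a toy sketch in plain Python.
It is not Spark's API: ToyRDD, evict, and source_partition are hypothetical
names made up for illustration. It only shows the idea that a derived
dataset remembers how each partition was computed, so a partition dropped
from memory can be rebuilt on demand without keeping a second copy.

```python
# Toy model of lineage-based recovery. NOT Spark's API; every name here
# (ToyRDD, evict, source_partition) is hypothetical and for illustration only.

class ToyRDD:
    """A dataset split into partitions that remembers how to compute each one."""

    def __init__(self, num_partitions, compute):
        self.num_partitions = num_partitions
        self._compute = compute   # lineage: partition index -> list of records
        self._cache = {}          # in-memory copies of computed partitions
        self.computations = 0     # how many times a partition was (re)built

    def map(self, fn):
        # The child stores a reference to its parent, not a copy of the data.
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in self.partition(i)])

    def partition(self, i):
        # Serve from memory if present, otherwise rebuild from lineage.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
            self.computations += 1
        return self._cache[i]

    def evict(self, i):
        # Simulate a partition being discarded from memory.
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions)
                for x in self.partition(i)]


def source_partition(i):
    # Stand-in for reading partition i from stable storage.
    return list(range(i * 3, i * 3 + 3))


base = ToyRDD(2, source_partition)
squares = base.map(lambda x: x * x)

first = squares.collect()    # computes both partitions
squares.evict(1)             # "lose" one partition
second = squares.collect()   # only the lost partition is recomputed
```

After the eviction, only the one missing partition is rebuilt and the
result is unchanged. In Spark itself, in-memory replication is something
you opt into through a replicated storage level such as MEMORY_ONLY_2
passed to persist(); the default levels keep a single copy and fall back
on lineage.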

As for extracting better parallelism from a replicated RDD, that really
depends on what sort of transformations and operations you're running
against the RDD. But again, generally speaking, you shouldn't need to
replicate it.

On Wed, Dec 3, 2014 at 11:54 PM, rapelly kartheek <kartheek.m...@gmail.com>
wrote:

> Hi,
>
> I was just thinking about necessity for rdd replication. One category
> could be something like large number of threads requiring same rdd. Even
> though, a single rdd can be shared by multiple threads belonging to "same
> application" , I believe we can extract better parallelism  if the rdd is
> replicated, am I right?.
>
> I am eager to know if there are any real life applications or any other
> scenarios which force rdd to be replicated. Can someone please throw some
> light on "necessity for rdd replication".
>
> Thank you
>
>
