Re: RDD replication in Spark

2014-08-27 Thread Cheng Lian
You may start from here

.


On Mon, Aug 25, 2014 at 9:05 PM, rapelly kartheek 
wrote:

> Hi,
>
> I've tried several of the options available for persist(), including RDD
> replication. I have gone through the classes involved in caching and
> storing RDDs at different levels. The StorageLevel class plays a pivotal
> role by recording whether to use memory or disk, and whether to replicate
> the RDD on multiple nodes.
> The LocationIterator class iterates over the preferred machines one by one
> for each partition that is replicated. I have a rough idea of CoalescedRDD.
> Please correct me if I am wrong.
>
> But I am looking for the code that chooses the resources on which the RDDs
> are replicated. Can someone please tell me how replication takes place and
> how the resources for replication are chosen? I just want to know where I
> should look to understand how replication happens.
>
>
>
> Thank you so much!!!
>
> regards
>
> -Karthik
>


RDD replication in Spark

2014-08-25 Thread rapelly kartheek
Hi,

I've tried several of the options available for persist(), including RDD
replication. I have gone through the classes involved in caching and
storing RDDs at different levels. The StorageLevel class plays a pivotal
role by recording whether to use memory or disk, and whether to replicate
the RDD on multiple nodes.
The LocationIterator class iterates over the preferred machines one by one
for each partition that is replicated. I have a rough idea of CoalescedRDD.
Please correct me if I am wrong.
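
For reference, a minimal sketch of the kind of persist() call with
replication described above. The object name, app name, local[2] master
and the sample data are assumptions made only for illustration; on a single
local JVM there is no second node to hold the replica, so this only shows how
the replication factor is requested and recorded in StorageLevel, not where
the replicas end up.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver program, not taken from the thread.
object ReplicatedPersistSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local[2] for a quick try-out; real replication needs a
    // cluster with at least two executors.
    val conf = new SparkConf()
      .setAppName("replicated-persist-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000, 4)

      // Predefined level: keep partitions in memory and ask for 2 replicas.
      rdd.persist(StorageLevel.MEMORY_ONLY_2)

      // The level attached to the RDD records the memory/disk/replication flags.
      val level = rdd.getStorageLevel
      println(s"useMemory=${level.useMemory} useDisk=${level.useDisk} " +
        s"replication=${level.replication}")

      // A custom level can also be built from the individual flags
      // (useDisk, useMemory, deserialized, replication).
      val custom = StorageLevel(true, true, false, 3)
      println(s"custom replication=${custom.replication}")

      // persist() is lazy: blocks are stored only once an action
      // computes the partitions.
      rdd.count()
    } finally {
      sc.stop()
    }
  }
}
```

The storage level of an RDD can be set only once, which is why the custom
level above is constructed but not applied to the same RDD.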

But I am looking for the code that chooses the resources on which the RDDs
are replicated. Can someone please tell me how replication takes place and
how the resources for replication are chosen? I just want to know where I
should look to understand how replication happens.



Thank you so much!!!

regards

-Karthik