Effects of persist(XYZ_2)

2015-02-25 Thread Marius Soutier
Hi, just a quick question about calling persist with the _2 option. Is the 2x replication only useful for fault tolerance, or will it also increase job speed by avoiding network transfers, assuming I’m doing joins or other shuffle operations? Thanks

Re: Effects of persist(XYZ_2)

2015-02-25 Thread Sean Owen
If you mean, can both copies of the blocks be used for computations? Yes, they can.

On Wed, Feb 25, 2015 at 10:36 AM, Marius Soutier mps@gmail.com wrote: Hi, just a quick question about calling persist with the _2 option. Is the 2x replication only useful for fault tolerance, or will it

Re: Effects of persist(XYZ_2)

2015-02-25 Thread Marius Soutier
Yes. Effectively, could it avoid network transfers? Or, put differently, would an option like persist(MEMORY_ALL) improve job speed by caching an RDD on every worker?

On 25.02.2015, at 11:42, Sean Owen so...@cloudera.com wrote: If you mean, can both copies of the blocks be used for
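
A rough way to reason about the question in this thread is a toy model, not Spark itself. The sketch below assumes each cached block has `replication` copies on distinct workers and that a task lands on a worker chosen uniformly at random. The function name `local_read_probability` and the 10-worker cluster are hypothetical, chosen only for illustration. Spark's real scheduler prefers workers that already hold a copy, so actual locality is likely better than this estimate, and the model covers only reads of cached blocks, not shuffle fetches.

```python
from fractions import Fraction


def local_read_probability(num_workers: int, replication: int) -> Fraction:
    """Chance that a randomly placed task finds a cached copy on its own worker.

    Toy model (an assumption, not Spark's scheduler): each block has
    `replication` copies on distinct workers, and the task runs on one of
    `num_workers` workers picked uniformly at random.
    """
    if not 1 <= replication <= num_workers:
        raise ValueError("replication must be between 1 and num_workers")
    return Fraction(replication, num_workers)


if __name__ == "__main__":
    # Hypothetical 10-worker cluster: 1 copy, 2 copies (the _2 levels),
    # and a copy on every worker (the MEMORY_ALL idea from the thread).
    for r in (1, 2, 10):
        print(r, local_read_probability(10, r))
```

Under these assumptions, going from one copy to two doubles the chance of a local read, and only a copy on every worker guarantees it.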