Hi,
just a quick question about calling persist with one of the _2 storage levels
(e.g. MEMORY_ONLY_2). Is the 2x replication only useful for fault tolerance, or
will it also increase job speed by avoiding network transfers, assuming I'm
doing joins or other shuffle operations?
Thanks
If you mean, can both copies of the blocks be used for computations? Yes,
they can.
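
For reference, a minimal sketch of what persisting with a _2 level looks like in Scala. The RDD names, the sample data, and the local[2] master are made up for illustration and are not from this thread. As far as I understand it, the second replica gives the scheduler another executor on which a task can read the cached partition locally, but it does not change how the RDD is partitioned, so a join will still shuffle unless both sides already share a partitioner.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ReplicatedPersistSketch {
  def main(args: Array[String]): Unit = {
    // local[2] is an assumption for trying this out; replication only has an
    // effect on a cluster with more than one executor.
    val conf = new SparkConf()
      .setAppName("replicated-persist-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical pair RDDs standing in for the data being joined.
    val users  = sc.parallelize(Seq((1, "ann"), (2, "bob"), (3, "cy")))
    val orders = sc.parallelize(Seq((1, 9.5), (1, 3.0), (3, 7.25)))

    // Ask Spark to keep each cached partition of `users` in memory on two nodes.
    users.persist(StorageLevel.MEMORY_ONLY_2)

    // The first action materializes the cache; the join itself still shuffles.
    val joined = users.join(orders)
    println(joined.count())

    users.unpersist()
    sc.stop()
  }
}
```

Other replicated levels such as StorageLevel.MEMORY_AND_DISK_2 are used the same way.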
On Wed, Feb 25, 2015 at 10:36 AM, Marius Soutier mps@gmail.com wrote:
Yes. Effectively, could it avoid network transfers? Or put differently, would
an option like persist(MEMORY_ALL) improve job speed by caching an RDD on every
worker?
On 25.02.2015, at 11:42, Sean Owen so...@cloudera.com wrote: