Yes. In effect, could it avoid network transfers? Put differently, would
a hypothetical option like persist(MEMORY_ALL) improve job speed by caching
an RDD on every worker?
On 25.02.2015, at 11:42, Sean Owen so...@cloudera.com wrote:
If you mean, can both copies of the blocks be used for computations?
Yes, they can.
On Wed, Feb 25, 2015 at 10:36 AM, Marius Soutier mps@gmail.com wrote:
Hi,
just a quick question about calling persist with one of the _2 storage
levels. Is the 2x replication only useful for fault tolerance, or will it
also increase job speed by avoiding network transfers, assuming I’m doing
joins or other shuffle operations?
Thanks
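
For readers following along, here is a minimal sketch of what persisting with
a _2 storage level before a join might look like. The object name, the sample
data, and the local[2] master are assumptions made up for illustration and do
not come from the thread. It only shows how the replicated level is requested;
it does not demonstrate any speedup, and the join still shuffles.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; not from the original thread.
object ReplicatedPersistSketch {
  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    // With a single local executor there is no peer to hold the second copy.
    val conf = new SparkConf()
      .setAppName("replicated-persist-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample pair RDDs.
      val left = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
      val right = sc.parallelize(Seq(1 -> 10, 2 -> 20))

      // Request in-memory caching with each block replicated on two nodes.
      left.persist(StorageLevel.MEMORY_ONLY_2)

      // The join is a shuffle operation; the cached blocks of `left` are
      // materialized the first time an action runs.
      // On Spark versions before 1.3, pair-RDD operations such as join also
      // need: import org.apache.spark.SparkContext._
      val joined = left.join(right)
      joined.collect().foreach(println)

      // The replication factor recorded on the storage level.
      println(left.getStorageLevel.replication)
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster, the Storage tab of the Spark UI should list the cached RDD
with a replicated storage level.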
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org