Yes. In effect, could it avoid network transfers? Put differently, would an option like persist(MEMORY_ALL), which would cache an RDD on every worker, improve job speed?
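
To make the question concrete, here is a toy model, not Spark's actual scheduler. It assumes each cached block lives on a fixed number of distinct workers and that a task lands on a uniformly random worker. The function name, the cluster size, and the uniform-placement assumption are all hypothetical. MEMORY_ALL is likewise just my shorthand for "a copy on every worker", not an existing storage level.

```python
from fractions import Fraction


def local_read_fraction(num_workers: int, replication: int) -> Fraction:
    """Chance that a task on a uniformly random worker finds a cached copy there.

    Toy assumption: each cached block lives on `replication` distinct workers,
    and the task is equally likely to land on any worker.
    """
    if num_workers < 1 or replication < 1:
        raise ValueError("num_workers and replication must be at least 1")
    # A block cannot have more distinct copies than there are workers.
    copies = min(replication, num_workers)
    return Fraction(copies, num_workers)


if __name__ == "__main__":
    workers = 10  # hypothetical cluster size
    levels = [("1x", 1), ("2x (_2 levels)", 2), ("every worker", workers)]
    for label, replication in levels:
        print(label, local_read_fraction(workers, replication))
```

Under these assumptions the local fraction grows linearly with the replication factor and only reaches 1 when every worker holds a copy. A real scheduler that prefers workers already holding a copy would presumably do better than this at low replication.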

> On 25.02.2015, at 11:42, Sean Owen <so...@cloudera.com> wrote:
>
> If you mean, can both copies of the blocks be used for computations?
> yes they can.
>
> On Wed, Feb 25, 2015 at 10:36 AM, Marius Soutier <mps....@gmail.com> wrote:
>> Hi,
>>
>> just a quick question about calling persist with the _2 option. Is the 2x
>> replication only useful for fault tolerance, or will it also increase job
>> speed by avoiding network transfers? Assuming I’m doing joins or other
>> shuffle operations.
>>
>> Thanks
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org