Yes, that is what I meant. In effect, could the second copy avoid network
transfers? Put differently, would a hypothetical option like
persist(MEMORY_ALL) improve job speed by caching the RDD on every worker?
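
For reference, a minimal sketch of how replicated persistence can be requested
today. It assumes a cluster master supplied via spark-submit; the data set,
the object name, and the replication factor of 4 are made up for illustration.
It only shows how to ask for extra copies. Whether those copies actually
reduce shuffle traffic is the open question in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ReplicatedPersistSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: the master URL is provided by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("replicated-persist-sketch"))

    // Hypothetical key/value data set, for illustration only.
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

    // Built-in "_2" level: deserialized, in memory, each block kept on two nodes.
    val twoCopies = pairs.persist(StorageLevel.MEMORY_ONLY_2)

    // Custom level with a higher replication factor (4 is an arbitrary choice).
    // Arguments: useDisk, useMemory, useOffHeap, deserialized, replication.
    // A new RDD is derived first, because a storage level cannot be changed
    // once it has been assigned to an RDD.
    val fourCopies = pairs.map(kv => kv)
      .persist(StorageLevel(false, true, false, true, 4))

    // Actions force evaluation, so the blocks get cached and replicated.
    println(twoCopies.count())
    println(fourCopies.count())

    sc.stop()
  }
}
```

There is no built-in MEMORY_ALL level; a custom StorageLevel with a
replication factor close to the number of workers would be the nearest
approximation, at the cost of holding that many copies in memory.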

> On 25.02.2015, at 11:42, Sean Owen <so...@cloudera.com> wrote:
> 
> If you mean, can both copies of the blocks be used for computations?
> Yes, they can.
> 
> On Wed, Feb 25, 2015 at 10:36 AM, Marius Soutier <mps....@gmail.com> wrote:
>> Hi,
>> 
>> just a quick question about calling persist with a _2 storage level
>> (e.g. MEMORY_ONLY_2). Is the 2x replication only useful for fault
>> tolerance, or will it also increase job speed by avoiding network
>> transfers, assuming I’m doing joins or other shuffle operations?
>> 
>> Thanks
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 


