Re: RDD immutability

Marco Tue, 19 Jan 2016 04:45:08 -0800

It depends on what you mean by "write access". RDDs are immutable, so
you can't really change them. When you apply a map, filter, or groupBy
transformation, you create a new RDD derived from the original one.
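
To make the point about not modifying the input concrete, here is a minimal
sketch in plain Python (no Spark involved). It assumes the records are mutable
dicts and uses a plain list as a stand-in for the elements of an RDD; the
names double_in_place and double_copy are hypothetical and only serve the
illustration.

```python
import copy

# Stand-in for the elements of an RDD: a plain list of mutable records.
source = [{"id": 1, "score": 10}, {"id": 2, "score": 20}]


def double_in_place(record):
    # Anti-pattern: mutates the element that was handed to the function.
    record["score"] *= 2
    return record


def double_copy(record):
    # Preferred: build a new record and leave the input untouched.
    return {**record, "score": record["score"] * 2}


# Mapping with the copying function leaves the input as it was.
safe_input = copy.deepcopy(source)
safe_output = list(map(double_copy, safe_input))

# Mapping with the mutating function silently changes the input as well.
unsafe_input = copy.deepcopy(source)
unsafe_output = list(map(double_in_place, unsafe_input))

print(safe_input == source)    # input preserved
print(unsafe_input == source)  # input was modified
```

The same discipline applies to functions passed to map or mapValues: return a
new object instead of changing the one you receive. Whether an in-place change
is actually visible elsewhere can depend on how the data is stored and on the
language binding, so it is safest never to rely on it.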


Kind regards,
Marco

2016-01-19 13:27 GMT+01:00 Dave <dave.davo...@gmail.com>:

> Hi Marco,
>
> Yes, that answers my question. I just wanted to be sure, as the API gave me
> write access to the immutable data, which means it's up to the developer to
> know not to modify the input parameters for these APIs.
>
> Thanks for the response.
> Dave.
>
>
> On 19/01/16 12:25, Marco wrote:
>
> Hello,
>
> RDDs are immutable by design. The reasons, to quote Sean Owen in this
> answer ( https://www.quora.com/Why-is-a-spark-RDD-immutable ), are the
> following:
>
>> Immutability rules out a big set of potential problems due to updates from
>> multiple threads at once. Immutable data is definitely safe to share across
>> processes.
>>
>> They're not just immutable but a deterministic function of their input.
>> This plus immutability also means the RDD's parts can be recreated at any
>> time. This makes caching, sharing and replication easy.
>> These are significant design wins, at the cost of having to copy data
>> rather than mutate it in place. Generally, that's a decent tradeoff to
>> make: gaining the fault tolerance and correctness with no developer effort
>> is worth spending memory and CPU on, since the latter are cheap.
>> A corollary: immutable data can as easily live in memory as on disk. This
>> makes it reasonable to easily move operations that hit disk to instead use
>> data in memory, and again, adding memory is much easier than adding I/O
>> bandwidth.
>> Of course, an RDD isn't really a collection of data, but just a recipe
>> for making data from other data. It is not literally computed by
>> materializing every RDD completely. That is, a lot of the "copy" can be
>> optimized away too.
>
>
> I hope it answers your question.
>
> Kind regards,
> Marco
>
> 2016-01-19 13:14 GMT+01:00 ddav <dave.davo...@gmail.com>:
>
>> Hi,
>>
>> Certain APIs (map, mapValues) give the developer access to the data
>> stored in RDDs.
>> Am I correct in saying that these APIs must never modify the data, but
>> must always return a new object with a copy of the data if the data needs
>> to be updated for the returned RDD?
>>
>> Thanks,
>> Dave.
