Hello,

RDDs are immutable by design. The reasons, to quote Sean Owen in this answer (https://www.quora.com/Why-is-a-spark-RDD-immutable), are the following:
> Immutability rules out a big set of potential problems due to updates from
> multiple threads at once. Immutable data is definitely safe to share across
> processes.
>
> They're not just immutable but a deterministic function of their input. This
> plus immutability also means the RDD's parts can be recreated at any time.
> This makes caching, sharing and replication easy.
>
> These are significant design wins, at the cost of having to copy data rather
> than mutate it in place. Generally, that's a decent tradeoff to make: gaining
> the fault tolerance and correctness with no developer effort is worth
> spending memory and CPU on, since the latter are cheap.
>
> A corollary: immutable data can as easily live in memory as on disk. This
> makes it reasonable to easily move operations that hit disk to instead use
> data in memory, and again, adding memory is much easier than adding I/O
> bandwidth.
>
> Of course, an RDD isn't really a collection of data, but just a recipe for
> making data from other data. It is not literally computed by materializing
> every RDD completely. That is, a lot of the "copy" can be optimized away too.

I hope it answers your question. A short sketch illustrating the point follows the quoted message below.

Kind regards,
Marco

2016-01-19 13:14 GMT+01:00 ddav <dave.davo...@gmail.com>:
> Hi,
>
> Certain APIs (map, mapValues) give the developer access to the data stored
> in RDDs.
> Am I correct in saying that these APIs must never modify the data, but
> always return a new object with a copy of the data if the data needs to be
> updated for the returned RDD?
>
> Thanks,
> Dave.
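To make the answer to Dave's question concrete: a function passed to map or mapValues should never mutate the record it receives; it should build and return a new value, and Spark itself returns a new RDD rather than changing the original. A minimal Scala sketch of this, assuming a local-mode SparkContext (the object and value names here are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    object RddImmutabilityDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("rdd-immutability-demo")
          .setMaster("local[*]") // local mode, only for this sketch
        val sc = new SparkContext(conf)

        // An RDD of key/value pairs. Per Sean Owen's point, this is a
        // recipe for data, not a materialized collection; it is only
        // computed when an action (like collect) runs.
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))

        // mapValues leaves `pairs` untouched and returns a NEW RDD whose
        // lineage records "apply this function to the values of `pairs`".
        // The function builds a new value instead of mutating its input.
        val incremented = pairs.mapValues(v => v + 1)

        // Both RDDs remain independently usable; `pairs` is unchanged.
        println(pairs.collect().mkString(", "))       // (a,1), (b,2)
        println(incremented.collect().mkString(", ")) // (a,2), (b,3)

        sc.stop()
      }
    }

If the function instead mutated a shared mutable object in place, results could be corrupted, particularly when the parent RDD is cached and its partitions are reused; returning fresh values keeps each RDD a deterministic function of its input, which is what makes recomputation after a failure safe.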