Hello,

RDDs are immutable by design. The reasons, to quote Sean Owen in this answer (https://www.quora.com/Why-is-a-spark-RDD-immutable), are the following:
> Immutability rules out a big set of potential problems due to updates from
> multiple threads at once. Immutable data is definitely safe to share across
> processes.
>
> They're not just immutable but a deterministic function of their input. This
> plus immutability also means the RDD's parts can be recreated at any time.
> This makes caching, sharing and replication easy.
>
> These are significant design wins, at the cost of having to copy data rather
> than mutate it in place. Generally, that's a decent tradeoff to make: gaining
> the fault tolerance and correctness with no developer effort is worth
> spending memory and CPU on, since the latter are cheap.
>
> A corollary: immutable data can as easily live in memory as on disk. This
> makes it reasonable to easily move operations that hit disk to instead use
> data in memory, and again, adding memory is much easier than adding I/O
> bandwidth.
>
> Of course, an RDD isn't really a collection of data, but just a recipe for
> making data from other data. It is not literally computed by materializing
> every RDD completely. That is, a lot of the "copy" can be optimized away too.

I hope it answers your question. A short sketch illustrating the point follows the quoted message below.

Kind regards,
Marco

2016-01-19 13:14 GMT+01:00 ddav <dave.davo...@gmail.com>:
> Hi,
>
> Certain APIs (map, mapValues) give the developer access to the data stored
> in RDDs.
> Am I correct in saying that these APIs must never modify the data, but
> always return a new object with a copy of the data if the data needs to be
> updated for the returned RDD?
>
> Thanks,
> Dave.
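To make the answer to Dave's question concrete: a function passed to map or mapValues should never mutate the record it receives; it should build and return a new value, and Spark itself returns a new RDD rather than changing the original. A minimal Scala sketch of this, assuming a local-mode SparkContext (the object and value names here are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    object RddImmutabilityDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("rdd-immutability-demo")
          .setMaster("local[*]") // local mode, only for this sketch
        val sc = new SparkContext(conf)

        // An RDD of key/value pairs. Per Sean Owen's point, this is a
        // recipe for data, not a materialized collection; it is only
        // computed when an action (like collect) runs.
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))

        // mapValues leaves `pairs` untouched and returns a NEW RDD whose
        // lineage records "apply this function to the values of `pairs`".
        // The function builds a new value instead of mutating its input.
        val incremented = pairs.mapValues(v => v + 1)

        // Both RDDs remain independently usable; `pairs` is unchanged.
        println(pairs.collect().mkString(", "))       // (a,1), (b,2)
        println(incremented.collect().mkString(", ")) // (a,2), (b,3)

        sc.stop()
      }
    }

If the function instead mutated a shared mutable object in place, results could be corrupted, particularly when the parent RDD is cached and its partitions are reused; returning fresh values keeps each RDD a deterministic function of its input, which is what makes recomputation after a failure safe.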