It depends on what you mean by "write access". RDDs are immutable, so you can't change one in place. When you apply a transformation such as map, filter or groupBy, you create a new RDD derived from the original one.
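
To make the distinction concrete, here is a minimal plain-Python sketch. It is not Spark code: `Account`, `map_records`, `add_bonus_in_place` and `add_bonus_copy` are made-up names, and a plain list stands in for the records of a parent RDD held in memory. The assumption it illustrates is that the function you pass to map receives the actual record objects, so nothing stops you from mutating them, even though doing so can silently change the parent data.

```python
from dataclasses import dataclass, replace


@dataclass
class Account:
    """Hypothetical record type standing in for an element of an RDD."""
    owner: str
    balance: int


def map_records(records, fn):
    """Stand-in for a map transformation: builds a new list from fn's results."""
    return [fn(r) for r in records]


def add_bonus_in_place(acc):
    # Unsafe: modifies the input record, so the parent data changes too.
    acc.balance += 10
    return acc


def add_bonus_copy(acc):
    # Safe: returns a new record and leaves the input untouched.
    return replace(acc, balance=acc.balance + 10)


parent = [Account("ann", 100), Account("bob", 50)]

# Copying keeps the parent as it was.
safe = map_records(parent, add_bonus_copy)
parent_after_safe = [a.balance for a in parent]

# Mutating in place leaks the update back into the parent.
unsafe = map_records(parent, add_bonus_in_place)
parent_after_unsafe = [a.balance for a in parent]

print(parent_after_safe, parent_after_unsafe)
```

In the copying case the parent balances are unchanged, while in the in-place case the parent now carries the bonus as well. In the sketch that is only a surprise; with real RDDs it would also break the assumption that a partition can be recomputed from its inputs and give the same result.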
Kind regards,
Marco

2016-01-19 13:27 GMT+01:00 Dave <dave.davo...@gmail.com>:

> Hi Marco,
>
> Yes, that answers my question. I just wanted to be sure, as the API gave me
> write access to the immutable data, which means it's up to the developer to
> know not to modify the input parameters for these APIs.
>
> Thanks for the response.
> Dave.
>
> On 19/01/16 12:25, Marco wrote:
>
> Hello,
>
> RDDs are immutable by design. The reasons, to quote Sean Owen in this
> answer ( https://www.quora.com/Why-is-a-spark-RDD-immutable ), are the
> following:
>
>> Immutability rules out a big set of potential problems due to updates from
>> multiple threads at once. Immutable data is definitely safe to share across
>> processes.
>>
>> They're not just immutable but a deterministic function of their input.
>> This plus immutability also means the RDD's parts can be recreated at any
>> time. This makes caching, sharing and replication easy.
>>
>> These are significant design wins, at the cost of having to copy data
>> rather than mutate it in place. Generally, that's a decent tradeoff to
>> make: gaining the fault tolerance and correctness with no developer effort
>> is worth spending memory and CPU on, since the latter are cheap.
>>
>> A corollary: immutable data can as easily live in memory as on disk. This
>> makes it reasonable to easily move operations that hit disk to instead use
>> data in memory, and again, adding memory is much easier than adding I/O
>> bandwidth.
>>
>> Of course, an RDD isn't really a collection of data, but just a recipe
>> for making data from other data. It is not literally computed by
>> materializing every RDD completely. That is, a lot of the "copy" can be
>> optimized away too.
>
> I hope it answers your question.
>
> Kind regards,
> Marco
>
> 2016-01-19 13:14 GMT+01:00 ddav <dave.davo...@gmail.com>:
>
>> Hi,
>>
>> Certain APIs (map, mapValues) give the developer access to the data
>> stored in RDDs.
>> Am I correct in saying that these APIs must never modify the data but
>> always return a new object with a copy of the data if the data needs to be
>> updated for the returned RDD?
>>
>> Thanks,
>> Dave.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-immutablility-tp26007.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.