Hi Marco,

Yes, that answers my question. I just wanted to be sure, as the API gave me write access to the immutable data, which means it's up to the developer to know not to modify the input parameters for these APIs.
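
To make the hazard concrete, here is a minimal sketch in plain Python rather than real Spark code. A list stands in for a cached parent RDD and the built-in map() stands in for RDD.map; the names mutating_update and copying_update are made up for illustration. Whether a real Spark job hands the function the very same object depends on how the data is stored, so treat this only as an illustration of why the function should copy rather than mutate.

```python
# Plain-Python stand-in: a list plays the role of a cached parent RDD and
# the built-in map() plays the role of RDD.map. No Spark is involved.

def mutating_update(record):
    # Anti-pattern: changes the input object that the parent still holds.
    record["count"] += 1
    return record

def copying_update(record):
    # Preferred: build a new object and leave the input untouched.
    return {**record, "count": record["count"] + 1}

parent_a = [{"key": "a", "count": 1}, {"key": "b", "count": 2}]
child_a = list(map(mutating_update, parent_a))

parent_b = [{"key": "a", "count": 1}, {"key": "b", "count": 2}]
child_b = list(map(copying_update, parent_b))

print([r["count"] for r in parent_a])  # parent was silently changed
print([r["count"] for r in parent_b])  # parent is unchanged
```

In the first case the "parent" no longer holds its original values, so anything recomputed from it later would see different input.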

Thanks for the response.
Dave.

On 19/01/16 12:25, Marco wrote:
Hello,

RDDs are immutable by design. The reasons, to quote Sean Owen in this answer (https://www.quora.com/Why-is-a-spark-RDD-immutable), are the following:

    Immutability rules out a big set of potential problems due to
    updates from multiple threads at once. Immutable data is
    definitely safe to share across processes.

    They're not just immutable but a deterministic function of their
    input. This plus immutability also means the RDD's parts can be
    recreated at any time. This makes caching, sharing and replication
    easy.

    These are significant design wins, at the cost of having to copy
    data rather than mutate it in place. Generally, that's a decent
    tradeoff to make: gaining the fault tolerance and correctness with
    no developer effort is worth spending memory and CPU on, since the
    latter are cheap.

    A corollary: immutable data can as easily live in memory as on
    disk. This makes it reasonable to easily move operations that hit
    disk to instead use data in memory, and again, adding memory is
    much easier than adding I/O bandwidth.

    Of course, an RDD isn't really a collection of data, but just a
    recipe for making data from other data. It is not literally
    computed by materializing every RDD completely. That is, a lot of
    the "copy" can be optimized away too.


I hope this answers your question.

Kind regards,
Marco

2016-01-19 13:14 GMT+01:00 ddav <dave.davo...@gmail.com>:

    Hi,

    Certain APIs (map, mapValues) give the developer access to the data
    stored in RDDs. Am I correct in saying that functions passed to these
    APIs must never modify the data, but must always return a new object
    with a copy of the data if the data needs to be updated for the
    returned RDD?

    Thanks,
    Dave.



    --
    View this message in context:
    http://apache-spark-user-list.1001560.n3.nabble.com/RDD-immutablility-tp26007.html
    Sent from the Apache Spark User List mailing list archive at Nabble.com.
