First question: If you save your modified RDD like this: points.foreach(p=>p.y = another_value).collect() or points.foreach(p=>p.y = another_value).saveAsTextFile(...) the modified RDD will be materialized and this will not use any work's memory. If you have more transformatins after the map(), the spark will pipelines all transformations and build a DAG. Very little memory will be used in this stage and the memory will be free soon. Only cache() will persist your RDD in memory for a long time. Second question: Once RDD be created, it can not be changed due to the immutable feature.You can only create a new RDD from the existing RDD or from file system.
2014-03-25 9:45 GMT+08:00 林武康 <vboylin1...@gmail.com>: > Hi hequn, a relative question, is that mean the memory usage will > doubled? And further more, if the compute function in a rdd is not > idempotent, rdd will changed during the job running, is that right? > ------------------------------ > 发件人: hequn cheng <chenghe...@gmail.com> > 发送时间: 2014/3/25 9:35 > 收件人: user@spark.apache.org > 主题: Re: RDD usage > > points.foreach(p=>p.y = another_value) will return a new modified RDD. > > > 2014-03-24 18:13 GMT+08:00 Chieh-Yen <r01944...@csie.ntu.edu.tw>: > >> Dear all, >> >> I have a question about the usage of RDD. >> I implemented a class called AppDataPoint, it looks like: >> >> case class AppDataPoint(input_y : Double, input_x : Array[Double]) >> extends Serializable { >> var y : Double = input_y >> var x : Array[Double] = input_x >> ...... >> } >> Furthermore, I created the RDD by the following function. >> >> def parsePoint(line: String): AppDataPoint = { >> /* Some related works for parsing */ >> ...... >> } >> >> Assume the RDD called "points": >> >> val lines = sc.textFile(inputPath, numPartition) >> var points = lines.map(parsePoint _).cache() >> >> The question is that, I tried to modify the value of this RDD, the >> operation is: >> >> points.foreach(p=>p.y = another_value) >> >> The operation is workable. >> There doesn't have any warning or error message showed by the system and >> the results are right. >> I wonder that if the modification for RDD is a correct and in fact >> workable design. >> The usage web said that the RDD is immutable, is there any suggestion? >> >> Thanks a lot. >> >> Chieh-Yen Lin >> > >