First question:
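If you save your modified RDD like this:
points.map(p => { p.y = another_value; p }).collect() or
points.map(p => { p.y = another_value; p }).saveAsTextFile(...)
the modified RDD will be materialized as it is consumed, and it will not
hold on to any worker's memory. (Note that foreach() is an action that
returns Unit, so map() is needed to get a new RDD.)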
If you have more transformations after the map(), Spark will pipeline
all of them and build a DAG. Very little memory is used in this stage,
and it is freed soon afterwards.
Only cache() will persist your RDD in memory for a long time.
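
To illustrate the pipelining and cache() behaviour described above, here is
a minimal sketch, assuming a local master and a toy RDD of integers. The
object name PipelineSketch and the run() helper are hypothetical and are not
part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipelineSketch {
  // Builds a small pipeline and returns the collected result.
  def run(): Array[Int] = {
    val conf = new SparkConf().setAppName("pipeline-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(1 to 10, 2)

      // Transformations are lazy: nothing is computed here,
      // Spark only records the lineage (the DAG).
      val doubled = numbers.map(_ * 2)
      val multiplesOfThree = doubled.filter(_ % 3 == 0)

      // cache() only marks the RDD for persistence; the partitions are
      // stored when the first action computes them.
      multiplesOfThree.cache()

      // collect() is an action: map and filter run pipelined in one stage.
      multiplesOfThree.collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(","))
}
```

Without the cache() call, the intermediate RDDs would simply be recomputed
from the lineage the next time an action needs them.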
Second question:
Once an RDD has been created, it cannot be changed, because RDDs are
immutable. You can only create a new RDD from an existing RDD or from the
file system.
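
As a rough sketch of "create a new RDD from the existing RDD", the update can
be written as a map() that returns copies instead of mutating the elements in
place. The immutable case class Point is a hypothetical stand-in for
AppDataPoint, and the object name and run() helper are assumptions made for
this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical immutable stand-in for AppDataPoint (no var fields).
case class Point(y: Double, x: Array[Double])

object ImmutableUpdateSketch {
  // Returns the y values of the original RDD and of the derived RDD.
  def run(): (Array[Double], Array[Double]) = {
    val conf = new SparkConf().setAppName("immutable-update").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val points = sc.parallelize(
        Seq(Point(1.0, Array(0.5)), Point(2.0, Array(1.5))), 2).cache()

      // Derive a new RDD; copy() leaves the elements of `points` untouched.
      val updated = points.map(p => p.copy(y = 0.0))

      (points.map(_.y).collect(), updated.map(_.y).collect())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(before.mkString(",") + " / " + after.mkString(","))
  }
}
```

Because the original RDD is never modified, recomputing a lost partition
from the lineage gives the same data as before.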


2014-03-25 9:45 GMT+08:00 林武康 <vboylin1...@gmail.com>:

>  Hi hequn, a related question: does that mean the memory usage will be
> doubled? Furthermore, if the compute function of an RDD is not
> idempotent, the RDD will change while the job is running, is that right?
>  ------------------------------
> From: hequn cheng <chenghe...@gmail.com>
> Sent: 2014/3/25 9:35
> To: user@spark.apache.org
> Subject: Re: RDD usage
>
> points.map(p => { p.y = another_value; p }) will return a new, modified RDD.
>
>
> 2014-03-24 18:13 GMT+08:00 Chieh-Yen <r01944...@csie.ntu.edu.tw>:
>
>>  Dear all,
>>
>> I have a question about the usage of RDD.
>> I implemented a class called AppDataPoint, which looks like this:
>>
>> case class AppDataPoint(input_y : Double, input_x : Array[Double])
>> extends Serializable {
>>   var y : Double = input_y
>>   var x : Array[Double] = input_x
>>   ......
>> }
>> Furthermore, I created the RDD using the following function.
>>
>> def parsePoint(line: String): AppDataPoint = {
>>   /* Some related works for parsing */
>>   ......
>> }
>>
>> Assume the RDD is called "points":
>>
>> val lines = sc.textFile(inputPath, numPartition)
>> var points = lines.map(parsePoint _).cache()
>>
>> The question is this: I tried to modify the values in this RDD with the
>> following operation:
>>
>> points.foreach(p=>p.y = another_value)
>>
>> The operation works.
>> The system shows no warning or error message, and the results are
>> correct.
>> I wonder whether modifying an RDD like this is a correct and actually
>> workable design.
>> The documentation says that RDDs are immutable; is there any suggestion?
>>
>> Thanks a lot.
>>
>> Chieh-Yen Lin
>>
>
>
