Hi,

I see that this type of question has been asked before, but I'm still a
little confused about it in practice. For example, there are two ways I
could express a series of RDD transformations before an RDD action — which
way is faster?

Way 1:
val data = sc.textFile()
val data1 = data.map(x => f1(x))
val data2 = data1.map(x1 => f2(x1))
println(data2.count())

Way 2:
val data = sc.textFile()
val data2 = data.map(x => f2(f1(x)))
println(data2.count())

Since Spark doesn't materialize RDD transformations, I assume the two
ways are equivalent?
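(For intuition, Spark's lazy transformations behave much like Scala's own lazy collections in this respect. A minimal plain-Scala sketch — no Spark needed, and `f1`/`f2` are hypothetical stand-ins for the transformations above — shows that two chained map()s and one map() of the composed function build the same per-element pipeline and produce the same result:)

```scala
// Sketch: chaining two lazy map()s vs. mapping the composed function.
// f1 and f2 are hypothetical stand-ins for the email's transformations.
object LazyChainSketch {
  def f1(x: String): String = x.trim    // first transformation
  def f2(x: String): Int    = x.length  // second transformation

  val input = List("a ", " bb", "  ccc")

  // Way 1: two chained transformations; Iterators are lazy, so
  // nothing runs until the pipeline is consumed (here, by toList).
  def way1: List[Int] = input.iterator.map(f1).map(f2).toList

  // Way 2: one transformation applying the composed function.
  def way2: List[Int] = input.iterator.map(x => f2(f1(x))).toList
}
```

Calling `LazyChainSketch.way1` and `LazyChainSketch.way2` yields the same list, since both pipelines apply `f2(f1(x))` element by element.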

I'm asking because the memory of my cluster is very limited, and I don't
want to cache an RDD at a very early stage. Is it true that if I cache an
RDD early and it takes up space, I then need to unpersist it before I
cache another one in order to save memory?
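(For reference, the pattern I have in mind is something like the sketch below — it assumes a live SparkContext `sc`, and `f1`/`f2` are hypothetical transformations as above:)

```scala
// Sketch, assuming a running SparkContext `sc` and an input path.
val data1 = sc.textFile("hdfs://...").map(f1).cache()
println(data1.count())   // first action materializes and caches data1

val data2 = data1.map(f2).cache()
data1.unpersist()        // release data1's blocks before relying on data2
println(data2.count())
```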

Thanks a lot!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/lazy-evaluation-of-RDD-transformation-tp15811.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
