Hi, I see that this type of question has been asked before, but I'm still a little confused about it in practice. For example, there are two ways I could apply a series of RDD transformations before an RDD action. Which way is faster?
Way 1:

    val data = sc.textFile()
    val data1 = data.map(x => f1(x))
    val data2 = data1.map(x1 => f2(x1))
    println(data2.count())

Way 2:

    val data = sc.textFile()
    val data2 = data.map(x => f2(f1(x)))
    println(data2.count())

Since Spark doesn't materialize RDD transformations, I assume the two ways are equivalent? I ask because the memory of my cluster is very limited and I don't want to cache an RDD at a very early stage. Is it true that if I cache an RDD early and it takes up space, I need to unpersist it before I cache another one in order to save memory? Thanks a lot!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/lazy-evaluation-of-RDD-transformation-tp15811.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
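P.S. To make the caching question concrete, here is a sketch of the pattern I have in mind (the input path and the functions f1/f2 are placeholders, not real code from my job):

```scala
import org.apache.spark.storage.StorageLevel

// Build the RDD lazily; nothing runs until an action is called.
val data2 = sc.textFile("hdfs://...").map(x => f2(f1(x)))

// Mark it for caching in memory.
data2.persist(StorageLevel.MEMORY_ONLY)

println(data2.count())   // first action computes data2 and fills the cache
println(data2.count())   // later actions reuse the cached partitions

// Release the cached partitions before caching a different RDD,
// so the limited cluster memory is not held by stale data.
data2.unpersist()
```

Is unpersist() at the end the right way to give the memory back before caching the next RDD?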