In my application, the data parts within an RDD partition are related, so I need to perform operations between them.
For example, RDD T1 has several partitions, and each partition has three parts: A, B, and C. I then transform T1 into T2. After the transform, T2 should also have three parts per partition, D, E, and F, where D = A+B, E = A+C, and F = B+C. As far as I know, Spark only supports operations that traverse the RDD and call a function on each element. How can I do such a transform?

In Hadoop, I copy the data in each partition into a user-defined buffer, perform whatever operations I like in the buffer, and finally call output.collect() to emit the data. But how can I construct a new RDD with distributed partitions in Spark? makeRDD only distributes a local Scala collection to form an RDD.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Only-TraversableOnce-tp3873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
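A minimal sketch of the kind of thing being asked about, using Spark's `mapPartitions`, which hands each partition's elements to your function as an `Iterator` so you can buffer them and emit derived records (much like the Hadoop buffer pattern described above). The part type (`Int`), the example data, and the assumption that every partition holds exactly three parts A, B, C are hypothetical stand-ins for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCombine {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-combine").setMaster("local[2]"))

    // Two partitions, each assumed to contain its three parts A, B, C in order.
    val t1 = sc.parallelize(Seq(1, 2, 3, 10, 20, 30), numSlices = 2)

    // Buffer the whole partition, then emit the derived parts:
    // D = A+B, E = A+C, F = B+C. The result stays a distributed RDD
    // with the same partitioning; no data is pulled to the driver.
    val t2 = t1.mapPartitions { iter =>
      val Seq(a, b, c) = iter.toSeq
      Iterator(a + b, a + c, b + c)
    }

    println(t2.collect().mkString(","))
    sc.stop()
  }
}
```

Because `mapPartitions` returns an `Iterator`, the number and shape of output elements per partition is entirely up to the function, which is what makes this transform expressible without `makeRDD` or a driver-side collection.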