Re: Only TraversableOnce?
Thank you, it works. After my operation over p, I return p.toIterator, because mapPartitions expects an iterator return type. Is that right?

rdd.mapPartitions { D => { val p = D.toArray; ...; p.toIterator } }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Only-TraversableOnce-tp3873p4043.html
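A minimal self-contained sketch of this pattern, assuming a local Spark context; the app name, master, sample numbers, and the doubling step are only illustrative placeholders for the real workload:

import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative local setup; app name and master are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("mapPartitions-sketch").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 12, numSlices = 3)

    val result = rdd.mapPartitions { iter =>
      val p = iter.toArray      // buffer the whole partition in memory
      val out = p.map(_ * 2)    // ... any operation over p (doubling is a stand-in) ...
      out.toIterator            // mapPartitions expects an Iterator back
    }

    result.collect().foreach(println)
    sc.stop()
  }
}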
Re: Only TraversableOnce?
Yeah, that should be right.

--
Nan Zhu

On Wednesday, April 9, 2014 at 8:54 PM, wxhsdp wrote:

> Thank you, it works. After my operation over p, I return p.toIterator, because mapPartitions expects an iterator return type. Is that right?
>
> rdd.mapPartitions { D => { val p = D.toArray; ...; p.toIterator } }
Re: Only TraversableOnce?
So the data structure looks like this: D consists of D1, D2, D3 (where DX is a partition), and DX consists of d1, d2, d3 (where dx is a "part" in your context)? And what you want is to transform DX into (d1 + d2, d1 + d3, d2 + d3)?

Best,

--
Nan Zhu

On Tuesday, April 8, 2014 at 8:09 AM, wxhsdp wrote:

> In my application, data parts inside an RDD partition have relations, so I need to do some operations between them. For example, RDD T1 has several partitions, and each partition has three parts A, B and C. I then transform T1 to T2; after the transform, T2 also has three parts D, E and F, where D = A+B, E = A+C, F = B+C. As far as I know, Spark only supports operations that traverse the RDD and call a function for each element. How can I do such a transform? In Hadoop I copy the data in each partition to a user-defined buffer, do any operations I like in the buffer, and finally call output.collect() to emit the data. But how can I construct a new RDD with distributed partitions in Spark? makeRDD only distributes a local Scala collection to form an RDD.
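One way to express that transform, assuming a SparkContext sc is already in scope and each partition's elements are the "parts" d1, d2, d3 (the sample numbers below are only illustrative): each partition buffers its parts and emits every pairwise sum d_i + d_j.

// Two partitions, (1, 2, 3) and (10, 20, 30), standing in for the parts A, B, C.
val t1 = sc.parallelize(Seq(1, 2, 3, 10, 20, 30), numSlices = 2)

val t2 = t1.mapPartitions { iter =>
  val parts = iter.toArray                 // buffer this partition's parts
  val pairSums = for {
    i <- parts.indices
    j <- (i + 1) until parts.length
  } yield parts(i) + parts(j)              // d_i + d_j, i.e. A+B, A+C, B+C
  pairSums.iterator
}

t2.collect()  // Array(3, 4, 5, 30, 40, 50) for the sample data above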
Re: Only TraversableOnce?
Yes. How can I do this conveniently? I can use filter, but that would create so many RDDs and it's not concise.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Only-TraversableOnce-tp3873p3875.html
Re: Only TraversableOnce?
If that’s the case, I think mapPartitions is what you need, but it seems you have to load the whole partition into memory with toArray:

rdd.mapPartitions { D => { val p = D.toArray; ... } }

--
Nan Zhu

On Tuesday, April 8, 2014 at 8:40 AM, wxhsdp wrote:

> Yes. How can I do this conveniently? I can use filter, but that would create so many RDDs and it's not concise.
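A short sketch of why the toArray buffering matters, again assuming a SparkContext sc is in scope: the partition arrives as an Iterator, which can only be traversed once, so buffering it lets the function make more than one pass (here, normalizing each element by its partition's sum as a stand-in operation).

val rdd = sc.parallelize(1 to 6, numSlices = 2)

val normalized = rdd.mapPartitions { iter =>
  val p = iter.toArray           // the partition Iterator is TraversableOnce; buffer it
  val total = p.sum.toDouble     // first pass over the buffered partition
  p.map(_ / total).iterator      // second pass, possible only because of the buffer
}

normalized.collect()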
Re: Only TraversableOnce?
Thank you for your help! Let me have a try.

Nan Zhu wrote:

> If that’s the case, I think mapPartitions is what you need, but it seems you have to load the whole partition into memory with toArray:
>
> rdd.mapPartitions { D => { val p = D.toArray; ... } }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Only-TraversableOnce-tp3873p3877.html