Re: Finding previous and next element in a sorted RDD
Using mapPartitions, you could get the neighbors within a partition, but doing this for the complete dataset is much harder, because the first and last elements of each partition have their neighbors in other partitions.

On Fri, Aug 22, 2014 at 11:24 AM, cjwang c...@cjwang.us wrote:
> It would be nice if an RDD that was massaged by OrderedRDDFunctions could know its neighbors.
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12664.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
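The per-partition part mentioned above can be sketched as a plain function over an iterator. This is only an illustration: `Neighbors` and `withNeighbors` are hypothetical names, not part of Spark or of this thread, and the function deliberately ignores neighbors that live in other partitions.

```scala
object Neighbors {
  // Hypothetical helper: pairs every element of ONE partition's iterator
  // with its previous and next element inside that same partition.
  // Elements at the partition edges get None for the missing neighbor.
  def withNeighbors[T](it: Iterator[T]): Iterator[(Option[T], T, Option[T])] = {
    val none: Option[T] = None
    // Pad both ends so the first and last elements also get a window of 3.
    val padded = Iterator.single(none) ++ it.map(Some(_)) ++ Iterator.single(none)
    // withPartial(false) drops the short window an empty partition would produce.
    padded.sliding(3).withPartial(false).map(w => (w(0), w(1).get, w(2)))
  }
}

object NeighborsDemo extends App {
  // Prints (None,10,Some(20)), (Some(10),20,Some(30)), (Some(20),30,None)
  Neighbors.withNeighbors(Iterator(10, 20, 30)).foreach(println)
}
```

On an RDD this would be applied as `sortedRdd.mapPartitions(it => Neighbors.withNeighbors(it))`, where `sortedRdd` is assumed to be the already sorted input. The None values at each partition edge are exactly the gaps that make the whole-dataset case harder.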
Re: Finding previous and next element in a sorted RDD
There's no way to avoid a shuffle, because the first and last elements of each partition have to be combined with elements from the neighboring partitions, but I wonder if there is a way to do a minimal shuffle.

On Thu, Aug 21, 2014 at 6:13 PM, cjwang c...@cjwang.us wrote:
> One way is to do zipWithIndex on the RDD. Then use the index as a key. Add or subtract 1 for previous or next element. Then use cogroup or join to bind them together.
>
>   val idx = input.zipWithIndex
>   val previous = idx.map(x => (x._2+1, x._1))
>   val current = idx.map(x => (x._2, x._1))
>   val next = idx.map(x => (x._2-1, x._1))
>   val joined = current leftOuterJoin previous leftOuterJoin next
>
> Code looks clean to me, but I feel uneasy about the performance of join.
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
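One possible way to keep the data movement minimal, sketched here under assumptions, is to move only the edge elements: collect the first and last element of every partition to the driver, broadcast that small array, and then slide a window of 3 inside each partition. Strictly speaking this uses a collect and a broadcast rather than a shuffle. `BoundaryNeighbors` and `withNeighbors` are hypothetical names, the input is assumed to be sorted already, and Spark is assumed to be on the classpath.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BoundaryNeighbors {
  // Hypothetical helper. Assumes `sorted` is already in the desired order,
  // so every element of partition i precedes every element of partition i+1.
  def withNeighbors[T: ClassTag](sorted: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    // Job 1: bring only each partition's (first, last) pair to the driver.
    // An empty partition contributes (None, None).
    val edges: Array[(Option[T], Option[T])] = sorted
      .mapPartitions[(Option[T], Option[T])] { it =>
        if (it.isEmpty) Iterator.single((None, None))
        else {
          val first = it.next()
          var last = first
          while (it.hasNext) last = it.next()
          Iterator.single((Some(first), Some(last)))
        }
      }
      .collect() // results arrive in partition order
    val edgesBc = sorted.sparkContext.broadcast(edges)

    // Job 2: window of 3 inside each partition, borrowing the two missing
    // neighbors from the nearest non-empty partitions on either side.
    sorted.mapPartitionsWithIndex { (i, it) =>
      val e = edgesBc.value
      val prev: Option[T] = e.take(i).reverse.collectFirst { case (_, Some(last)) => last }
      val next: Option[T] = e.drop(i + 1).collectFirst { case (Some(first), _) => first }
      (Iterator.single(prev) ++ it.map(Some(_)) ++ Iterator.single(next))
        .sliding(3).withPartial(false)
        .map(w => (w(0), w(1).get, w(2)))
    }
  }
}

object BoundaryNeighborsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("neighbors"))
    try {
      val sorted = sc.parallelize(1 to 6, numSlices = 3)
      // Prints six triples, from (None,1,Some(2)) to (Some(5),6,None)
      BoundaryNeighbors.withNeighbors(sorted).collect().foreach(println)
    } finally sc.stop()
  }
}
```

The sketch evaluates `sorted` twice (once per job), so caching it first may be worthwhile when it is expensive to recompute.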
Re: Finding previous and next element in a sorted RDD
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know its neighbors.
Finding previous and next element in a sorted RDD
I have an RDD containing elements sorted in a certain order. I would like to map over the elements knowing the values of their respective previous and next elements.

With a regular List, I used to do this (input is a List below):

  // The first of the previous measures and the last of the next measures are dummies
  val dummy = new Measure()
  val ml = (dummy :: input) :+ dummy
  // Take 3 elements at a time. The 1st is the previous, the middle is the current, and
  // the last is the next.
  ml.iterator.sliding(3).map(tri => produceMeasure(tri.head, tri.tail.head, tri.last)).toList

Now, in RDD, how do I do that (elegantly I hope)? I thought about zipping an RDD with a shifted copy of itself, but there are some messy partition issues with zip (both RDDs must have the same number of partitions and the same number of elements in each partition).

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621.html
Re: Finding previous and next element in a sorted RDD
One way is to do zipWithIndex on the RDD. Then use the index as a key. Add or subtract 1 for previous or next element. Then use cogroup or join to bind them together.

  val idx = input.zipWithIndex
  val previous = idx.map(x => (x._2+1, x._1))
  val current = idx.map(x => (x._2, x._1))
  val next = idx.map(x => (x._2-1, x._1))
  val joined = current leftOuterJoin previous leftOuterJoin next

Code looks clean to me, but I feel uneasy about the performance of join.
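For reference, the join approach above can be wrapped in a function that also unpacks the nested join result and restores the original order, since a join does not keep its output sorted by key. This is a sketch only: `JoinNeighbors` and `neighborsByJoin` are hypothetical names, and Spark is assumed to be on the classpath (very old Spark releases also need `import org.apache.spark.SparkContext._` for the pair-RDD operations).

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object JoinNeighbors {
  // Hypothetical wrapper around the zipWithIndex + leftOuterJoin idea.
  def neighborsByJoin[T: ClassTag](input: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    val idx = input.zipWithIndex()                     // (element, index)
    val current  = idx.map { case (x, i) => (i, x) }
    val previous = idx.map { case (x, i) => (i + 1, x) } // element i is "previous" for key i+1
    val next     = idx.map { case (x, i) => (i - 1, x) } // element i is "next" for key i-1

    current
      .leftOuterJoin(previous) // (i, (cur, Option[prev]))
      .leftOuterJoin(next)     // (i, ((cur, Option[prev]), Option[next]))
      .sortByKey()             // the joins do not preserve the index order
      .values
      .map { case ((cur, prev), nxt) => (prev, cur, nxt) }
  }
}
```

The first element gets None as its previous neighbor and the last element gets None as its next neighbor, which plays the role of the dummy Measure in the List version. The two joins and the sort each shuffle the full dataset, which is the performance concern raised above.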