Re: Finding previous and next element in a sorted RDD
Using mapPartitions, you could get the neighbors within a partition, but if you think about it, it's much more difficult to accomplish this for the complete dataset, since the first and last elements of each partition have their neighbors in other partitions.

On Fri, Aug 22, 2014 at 11:24 AM, cjwang wrote:
> It would be nice if an RDD that was massaged by OrderedRDDFunctions could
> know its "neighbors".
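The within-partition half of this can be sketched as follows. This is only an illustration: `withNeighbors` is a hypothetical helper, not a Spark API, and the two hand-written sequences stand in for two partitions of a sorted RDD so that the example runs without Spark.

```scala
object WithinPartition {
  // Hypothetical helper (not part of Spark): pairs every element of one
  // partition with its previous and next element inside that same partition.
  def withNeighbors[T](it: Iterator[T]): Iterator[(Option[T], T, Option[T])] = {
    // Pad with None so the first and last elements still get a full window.
    val padded = Iterator[Option[T]](None) ++ it.map(x => Some(x)) ++ Iterator(None)
    padded.sliding(3).withPartial(false).collect {
      case Seq(prev, Some(cur), next) => (prev, cur, next)
    }
  }

  def main(args: Array[String]): Unit = {
    // Two sequences standing in for two partitions of a sorted RDD.
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))
    partitions.foreach(p => println(withNeighbors(p.iterator).toList))
  }
}
```

With Spark, the same function could be applied as sorted.mapPartitions(it => WithinPartition.withNeighbors(it)). The None values it produces at each partition edge (here, the "next" of 3 and the "previous" of 4) are exactly the gap that makes the whole-dataset case harder.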
Re: Finding previous and next element in a sorted RDD
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know its "neighbors".

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12664.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Finding previous and next element in a sorted RDD
There's no way to avoid a shuffle, because the first and last elements of each partition have to be combined with elements from the neighboring partitions, but I wonder if there is a way to do a minimal shuffle.

On Thu, Aug 21, 2014 at 6:13 PM, cjwang wrote:
> One way is to do zipWithIndex on the RDD. Then use the index as a key. Add
> or subtract 1 for previous or next element. Then use cogroup or join to
> bind them together.
>
> val idx = input.zipWithIndex
> val previous = idx.map(x => (x._2+1, x._1))
> val current = idx.map(x => (x._2, x._1))
> val next = idx.map(x => (x._2-1, x._1))
>
> val joined = current leftOuterJoin previous leftOuterJoin next
>
> Code looks clean to me, but I feel uneasy about the performance of join.
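On the minimal-shuffle question, one possible approach (not proposed in this thread, so treat it as a sketch under assumptions) is to move only the boundary elements: collect the first and last element of every partition to the driver, broadcast that small table, and then fill in the edges in a second mapPartitionsWithIndex pass. The object and method names below are hypothetical, and the sketch assumes the RDD is already sorted and that one (first, last) pair per partition fits comfortably on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object BoundaryStitch {
  // Hypothetical helper: `sorted` is assumed to be globally sorted already.
  def withNeighbors[T: ClassTag](sorted: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    // Pass 1: one small record per non-empty partition: (index, (first, last)).
    val edges: Map[Int, (T, T)] = sorted.mapPartitionsWithIndex[(Int, (T, T))] { (i, it) =>
      if (it.isEmpty) Iterator.empty
      else {
        val first = it.next()
        var last = first
        while (it.hasNext) last = it.next()
        Iterator((i, (first, last)))
      }
    }.collect().toMap
    val bc = sorted.sparkContext.broadcast(edges)

    // Pass 2: pad each partition with the last element of the nearest
    // non-empty partition before it and the first element of the nearest
    // non-empty partition after it, then slide a window of three.
    sorted.mapPartitionsWithIndex { (i, it) =>
      val e = bc.value
      val before = e.keys.filter(_ < i)
      val after = e.keys.filter(_ > i)
      val prevEdge: Option[T] = if (before.isEmpty) None else Some(e(before.max)._2)
      val nextEdge: Option[T] = if (after.isEmpty) None else Some(e(after.min)._1)
      val padded = Iterator[Option[T]](prevEdge) ++ it.map(x => Some(x)) ++ Iterator(nextEdge)
      padded.sliding(3).withPartial(false).collect {
        case Seq(prev, Some(cur), next) => (prev, cur, next)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("boundary-stitch"))
    try {
      val sorted = sc.parallelize(1 to 6, 3).cache()
      withNeighbors(sorted).collect().foreach(println)
    } finally sc.stop()
  }
}
```

This replaces the shuffle with a small collect to the driver, at the price of traversing the sorted RDD twice, so caching it first (as in main) is probably worthwhile.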
Re: Finding previous and next element in a sorted RDD
One way is to do zipWithIndex on the RDD. Then use the index as a key. Add or subtract 1 for previous or next element. Then use cogroup or join to bind them together.

val idx = input.zipWithIndex
val previous = idx.map(x => (x._2+1, x._1))
val current = idx.map(x => (x._2, x._1))
val next = idx.map(x => (x._2-1, x._1))

val joined = current leftOuterJoin previous leftOuterJoin next

Code looks clean to me, but I feel uneasy about the performance of join.

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
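For reference, the joined result is nested as (index, ((current, Option[previous]), Option[next])). A minimal sketch of unpacking it into flat (previous, current, next) triples follows; the wrapper name neighborsByJoin and the sample data are assumptions added for illustration, and the final sortByKey is only there to return the rows in index order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag
// On Spark versions before 1.3, the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object JoinNeighbors {
  // Hypothetical wrapper around the zipWithIndex + leftOuterJoin approach.
  def neighborsByJoin[T: ClassTag](input: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    val idx = input.zipWithIndex()
    val previous = idx.map { case (v, i) => (i + 1, v) } // element i is "previous" for i + 1
    val current  = idx.map { case (v, i) => (i, v) }
    val next     = idx.map { case (v, i) => (i - 1, v) } // element i is "next" for i - 1
    current.leftOuterJoin(previous).leftOuterJoin(next)
      .sortByKey()
      .map { case (_, ((cur, prev), nxt)) => (prev, cur, nxt) }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-neighbors"))
    try {
      neighborsByJoin(sc.parallelize(Seq("a", "b", "c"), 2)).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Each leftOuterJoin here generally shuffles all three keyed copies of the data, which is presumably the cost behind the unease about join performance.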