Re: Finding previous and next element in a sorted RDD

2014-08-23 Thread Victor Tso-Guillen
Using mapPartitions, you could get the neighbors within a partition, but
it's much harder to do this for the complete dataset: the first and last
elements of each partition have their neighbors in adjacent partitions.
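
A minimal sketch of the within-partition part, assuming the RDD is already sorted. The names withLocalNeighbors and localNeighbors are made up for illustration and are not Spark API. Elements at the edge of a partition get None on the side that would cross the boundary, which is exactly the gap that remains for the complete dataset.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: pair each element of one partition with the element
// before and after it. Partition edges get None on the outer side.
def withLocalNeighbors[T](it: Iterator[T]): Iterator[(Option[T], T, Option[T])] = {
  // Pad both ends with None so every real element sits in the middle of a window.
  val padded = Iterator.single(Option.empty[T]) ++ it.map(Some(_)) ++
    Iterator.single(Option.empty[T])
  padded.sliding(3).withPartial(false).collect {
    case Seq(prev, Some(cur), next) => (prev, cur, next)
  }
}

// Apply the helper to every partition independently; no data moves between partitions.
def localNeighbors[T](sorted: RDD[T]): RDD[(Option[T], T, Option[T])] =
  sorted.mapPartitions(it => withLocalNeighbors(it), preservesPartitioning = true)
```

An empty partition simply produces no output, because the padded iterator is then too short to form a window of three.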


On Fri, Aug 22, 2014 at 11:24 AM, cjwang  wrote:

> It would be nice if an RDD that was massaged by OrderedRDDFunctions could
> know its "neighbors".
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12664.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Finding previous and next element in a sorted RDD

2014-08-22 Thread cjwang
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know
its "neighbors".







Re: Finding previous and next element in a sorted RDD

2014-08-21 Thread Evan Chan
There's no way to avoid a shuffle, because the first and last elements
of each partition have to be combined with elements from the
neighboring partitions, but I wonder if there is a way to do a minimal
shuffle.
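
One possible way to keep the data movement small, offered as a sketch rather than a tested recipe: collect only the first and last element of each partition to the driver, broadcast that small map, and stitch the boundaries in a second pass. The names stitch and withNeighbors are hypothetical. This assumes the RDD is already sorted, that T is serializable, and that the number of partitions is small enough for the map to fit on the driver. It reads the RDD twice, so caching the sorted RDD first is probably worthwhile.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper: like a sliding window over one partition, but the
// outer neighbors come from the adjacent partitions instead of being None.
def stitch[T](prev: Option[T], it: Iterator[T], next: Option[T])
    : Iterator[(Option[T], T, Option[T])] = {
  val padded = Iterator.single(prev) ++ it.map(Some(_)) ++ Iterator.single(next)
  padded.sliding(3).withPartial(false).collect {
    case Seq(p, Some(cur), n) => (p, cur, n)
  }
}

def withNeighbors[T: ClassTag](sorted: RDD[T]): RDD[(Option[T], T, Option[T])] = {
  // Pass 1: one small record per non-empty partition, (index -> (first, last)).
  val edges: Map[Int, (T, T)] = sorted
    .mapPartitionsWithIndex { (idx, it) =>
      if (it.hasNext) {
        val first = it.next()
        var last = first
        while (it.hasNext) last = it.next()
        Iterator.single(idx -> (first, last))
      } else Iterator.empty
    }
    .collect()
    .toMap

  val bc = sorted.sparkContext.broadcast(edges)

  // Pass 2: the nearest non-empty partition on each side supplies the
  // neighbor that lies across the partition boundary.
  sorted.mapPartitionsWithIndex { (idx, it) =>
    val e = bc.value
    val before = e.keys.filter(_ < idx)
    val after = e.keys.filter(_ > idx)
    val prevEdge = if (before.isEmpty) None else Some(e(before.max)._2)
    val nextEdge = if (after.isEmpty) None else Some(e(after.min)._1)
    stitch(prevEdge, it, nextEdge)
  }
}
```

Only two elements per partition leave the executors here, so the cost is an extra pass over the data rather than a shuffle of the whole dataset.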

On Thu, Aug 21, 2014 at 6:13 PM, cjwang  wrote:
> One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
> or subtract 1 for previous or next element.  Then use cogroup or join to
> bind them together.
>
> val idx = input.zipWithIndex
> val previous = idx.map(x => (x._2+1, x._1))
> val current = idx.map(x => (x._2, x._1))
> val next = idx.map(x => (x._2-1, x._1))
>
> val joined = current leftOuterJoin previous leftOuterJoin next
>
> Code looks clean to me, but I feel uneasy about the performance of join.
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Finding previous and next element in a sorted RDD

2014-08-21 Thread cjwang
One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
or subtract 1 for previous or next element.  Then use cogroup or join to
bind them together.

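// Pair each element with its position in the RDD: (element, index).
val idx = input.zipWithIndex
// Key element i by i+1, so that joining on key k yields the element before k.
val previous = idx.map(x => (x._2+1, x._1))
val current = idx.map(x => (x._2, x._1))
// Key element i by i-1, so that joining on key k yields the element after k.
val next = idx.map(x => (x._2-1, x._1))

// Result: (index, ((current, Option[previous]), Option[next]))
val joined = current leftOuterJoin previous leftOuterJoin next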

Code looks clean to me, but I feel uneasy about the performance of join.  
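
For reference, here is the same idea wrapped as a function that flattens the nested join result into (previous, current, next) triples. This is only a sketch: neighborsByJoin is a made-up name, the cache() and sortByKey() calls are my additions, and the sort costs one more shuffle on top of the two joins. On Spark releases before 1.3 it also needs `import org.apache.spark.SparkContext._` for the pair-RDD implicits.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical wrapper around the zipWithIndex + join approach.
def neighborsByJoin[T: ClassTag](input: RDD[T]): RDD[(Option[T], T, Option[T])] = {
  // idx is used three times below, so keep it in memory.
  val idx = input.zipWithIndex().cache()
  val current = idx.map { case (v, i) => (i, v) }
  val previous = idx.map { case (v, i) => (i + 1, v) } // element i shows up at key i+1
  val next = idx.map { case (v, i) => (i - 1, v) }     // element i shows up at key i-1

  current.leftOuterJoin(previous).leftOuterJoin(next)
    .sortByKey() // the joins do not keep the original order; restore it by index
    .map { case (_, ((cur, prev), nxt)) => (prev, cur, nxt) }
}
```

The first element gets None as its previous neighbor and the last element gets None as its next neighbor, because leftOuterJoin finds no match at those keys.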


